How to combine two files in Alteryx

I am learning Alteryx and have run into my first issue. I have an Excel file that I am using as a source. The file has two sheets with the same data, but the second sheet does not have headers.
I wanted to see if there is a way to combine the two sheets into one within Alteryx, using column position instead of headers, since the second sheet does not have them. Any help is very much appreciated.

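Yes. Both the Join (https://help.alteryx.com/20213/designer/join-tool) and Union (https://help.alteryx.com/20213/designer/union-tool) tools can match by position instead of by field name: the Join tool has a "Join by Record Position" option and the Union tool has an "Auto Config by Position" option. For stacking two sheets by column position, the Union option is the one you want. See the links for details.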

You have to input the file twice, once for each sheet.
For the second sheet, make sure to check the "First Row Contains Data" option.
Then you can use the Union tool --> Auto Config by Position --> check "Set a specific order". See image links below.
First Row Contains Data
Union Tool Configuration
Sheet 1 Example Input
Sheet 2 Example Input
Output
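Outside Alteryx, the same "stack by column position" idea can be sketched in a few lines of Python. This is only an illustration of the logic, not what the Union tool does internally; the function name and the sample rows are made up for the example.

```python
def union_by_position(sheet_with_headers, sheet_without_headers):
    """Stack two row lists by column position.

    The first row of sheet_with_headers is treated as the header row;
    every row of sheet_without_headers is treated as data.
    """
    header, *first_rows = sheet_with_headers
    combined = [list(header)]
    for row in first_rows + list(sheet_without_headers):
        # Truncate long rows and pad short ones so columns stay aligned.
        aligned = list(row)[: len(header)]
        aligned += [None] * (len(header) - len(aligned))
        combined.append(aligned)
    return combined


# Hypothetical sample data: sheet 1 has headers, sheet 2 does not.
sheet1 = [["ID", "Name"], [1, "Ann"], [2, "Bob"]]
sheet2 = [[3, "Cy"], [4, "Di"]]
result = union_by_position(sheet1, sheet2)
```

The headers come from the first sheet only, and the second sheet's values land in whichever column sits at the same position.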

Related

Limit with creating a drop-down list dependent on a selection in excel

I have an Excel file with two sheets. The second sheet (Report) contains data validation cells based on the first sheet (Data). On the second sheet, the drop-down list shown in "Select XXX" depends on the selection in "Generate Report". When "Generate Report" is set to anything beyond the first five items in its list, "Select XXX" displays Year as the default list (no problem with this) via the code ...INDIRECT("Year").... The problem is that Excel does not let me add more code (it seems I hit a limit). The question is: how can I change this formula to accommodate every option in "Generate Report"? Or is there another method I could use?
The data validation source code for the drop-down list is =IF($B$4=Data!$Q$5,INDIRECT("Client"), IF($B$4=Data!$Q$6,INDIRECT("Month"), IF($B$4=Data!$Q$7,INDIRECT("Product_Service"), IF($B$4=Data!$Q$8,INDIRECT("Sector"), IF($B$4=Data!$Q$9,INDIRECT("Trans_Type"),INDIRECT("Year"))))))
Please, see the sample file at https://drive.google.com/file/d/1VKkGHjlJzLQqx4J9kyd_bCKG4r0Q7HkG/view?usp=sharing
What you could do is put the range names in column R, and VLOOKUP them:
=IFERROR(INDIRECT(VLOOKUP($B$4,Data!$Q$5:$R$9,2,FALSE)),INDIRECT("Year"))
You could then have as many item lists as you wish.
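The reason this scales is that a lookup table replaces the chain of nested IFs. Here is the same idea as a rough Python sketch; the report labels and list contents are invented placeholders standing in for Data!Q5:R9 and the named ranges, not values from the sample file.

```python
# Hypothetical stand-in for Data!Q5:R9: report label -> named range.
LOOKUP = {
    "By client": "Client",
    "By month": "Month",
    "By product/service": "Product_Service",
    "By sector": "Sector",
    "By transaction type": "Trans_Type",
}

# Hypothetical contents of the named ranges.
NAMED_RANGES = {
    "Client": ["Acme", "Globex"],
    "Month": ["Jan", "Feb"],
    "Product_Service": ["Audit", "Tax"],
    "Sector": ["Public", "Private"],
    "Trans_Type": ["Cash", "Credit"],
    "Year": [2020, 2021],
}


def dropdown_items(selection):
    """Mimic IFERROR(INDIRECT(VLOOKUP(...)), INDIRECT("Year"))."""
    range_name = LOOKUP.get(selection)  # VLOOKUP with an exact match
    # Any failed lookup falls back to the Year list, like IFERROR.
    return NAMED_RANGES.get(range_name, NAMED_RANGES["Year"])
```

Adding another report option then means adding one row to the table rather than another nested IF.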

Power Query reference certain row

I am doing a query from a folder with many Excel files which all have the same structure. I want to reference a certain row which is always in the same place (row no. 5) in the same sheet in all of the Excel files.
How can I do that? There is no reference point, like a certain word, that I could filter for; I just need row no. 5. The row is sometimes empty, sometimes partially filled, and sometimes completely filled in. I need it in all three states.
Can anyone help me?
Thanks!
= Table.FromRecords(List.Transform(Folder.Files("YourFolderPath")[Content],each Excel.Workbook(_){0}[Data]{4}))
Maybe this approach will help.
I'm assuming you selected your query's source folder path and clicked Combine & Edit...
and picked a Sample file Parameter sheet and clicked OK to combine files...
So you'd see your appended worksheets result...something like this below. Note that my different workbooks in this example contain your different conditions for the 5th rows--empty, partially or completely filled.
All I think you really need to do from here is to add an index in the "Transform Sample File from..." query--in this example, it is the "Transform Sample File from Test" query since Test is what my folder was named and therefore what my query name was defaulted as.
Just select the "Transform Sample File from..." query, then Add Column > Index Column.
Then, when you click back on your main query...in this example...Test, you will see the index numbers, which you can use along with the Source.Name values for easy reference.
For instance, you could filter for Index value 4 (row 5 of each worksheet) to see only those rows.
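For anyone who wants to see the index idea in isolation, here is a small Python sketch of "take the row at index 4 from every file". It is not Power Query; the file names and rows are made up, and each workbook is assumed to be already loaded as a list of rows.

```python
def fifth_rows(workbooks, index=4):
    """Return {source name: row at `index`} for each table (0-based index)."""
    picked = {}
    for name, rows in workbooks.items():
        # A table with fewer than index + 1 rows yields None instead of failing.
        picked[name] = rows[index] if len(rows) > index else None
    return picked


# Hypothetical data: rows 1-4 are filler, row 5 is full, partial, or empty.
filler = [["r1"], ["r2"], ["r3"], ["r4"]]
workbooks = {
    "full.xlsx": filler + [["x", "y"]],
    "partial.xlsx": filler + [[None, "y"]],
    "empty.xlsx": filler + [[None, None]],
}
rows_by_file = fifth_rows(workbooks)
```

Because the row is selected by position, it comes back whether it is empty, partially filled, or completely filled.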

Kettle (spoon) - get filename for excel output from field in the excel field in input

I'm trying to process an Excel file. I need to generate an Excel file for each row, and as the filename I need to use one of the fields in the row.
The Excel Output step doesn't have an "Accept filename from field" option, and I can't figure out how to achieve this.
thanks
You need to copy the rows into memory and then loop over them to generate multiple files, so break your solution into two parts. First, read all the rows from the Excel Input step into a "Copy rows to Result" step. Then, in the next transformation, use the value passed from those result rows as a variable for the file name.
Please check the two links:
SO Similar Question: Pentaho : How to split single Excel file to multiple excel sheet output
Blog : https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Hope this helps :)
The issue is that the step is mostly made for outputting the rows to a single file, not making a file for each row.
This isn't the most elegant solution but I do think it will work. From your transformation you can call a sub-transformation (Mapping) and send a variable to it containing the filename. The sub-transformation can simply do one thing: write the file, and it should work fine. Make sense?
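To make the "one file per row, named from a field" loop concrete, here is a minimal Python sketch of the same pattern. It is not Kettle and it writes CSV rather than Excel to stay within the standard library; the field names and sample rows are invented for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_one_file_per_row(rows, name_field, out_dir):
    """Write each row to its own CSV file, named after row[name_field]."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        target = out_dir / f"{row[name_field]}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(row))
            writer.writeheader()
            writer.writerow(row)
        written.append(target)
    return written


# Hypothetical input rows; "customer" supplies the file name.
rows = [
    {"customer": "acme", "amount": 10},
    {"customer": "globex", "amount": 25},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_one_file_per_row(rows, "customer", tmp)
    names = sorted(p.name for p in paths)
    first_file_lines = paths[0].read_text(encoding="utf-8").splitlines()
```

The outer loop here plays the role of the job that runs the second transformation once per result row.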

Export different sized tables to same excel tab in SAS

I have two tables A and B that have a different number of columns, with absolutely no match between the columns names but one differentiator (let's call it ID).
I'm writing a macro in SAS that outputs an Excel file such that:
each sheet in the workbook corresponds to an ID;
within each sheet, I have:
content of table A
empty line
content of table B
The problem is that I can't append rows of data in SAS because the columns do not match.
Any thoughts?
Thanks for your help!
You can use DDE for that: the Dynamic Data Exchange protocol. Basically, it simulates a user's commands and clicks on various menus and buttons in Excel (and also in Word and some other applications) or, more exactly, it issues commands in the now-obsolete Excel version 4 macro language (X4ML).
So, using DDE, a SAS program can launch Excel, open or create a workbook, create tabs (worksheets), put your data into a specified range of cells, format any single cell or range, etc.
Here's a good intro into this topic:
http://www2.sas.com/proceedings/sugi26/p011-26.pdf
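Whatever tool ends up writing the cells, the target layout itself is simple to describe: per ID, the rows of table A, one blank row, then the rows of table B. The Python sketch below only builds that layout in memory. It is neither SAS nor DDE, and the column names and sample records are hypothetical.

```python
def rows_for(records, columns):
    """Header row followed by one row per record, in column order."""
    return [list(columns)] + [[rec[c] for c in columns] for rec in records]


def layouts_by_id(table_a, cols_a, table_b, cols_b, id_col="ID"):
    """Return {ID: sheet rows}, each sheet = table A, blank row, table B."""
    ids = sorted({r[id_col] for r in table_a} | {r[id_col] for r in table_b})
    sheets = {}
    for key in ids:
        part_a = [r for r in table_a if r[id_col] == key]
        part_b = [r for r in table_b if r[id_col] == key]
        # The empty list is the blank separator line between the two tables.
        sheets[key] = rows_for(part_a, cols_a) + [[]] + rows_for(part_b, cols_b)
    return sheets


# Hypothetical tables that share only the ID column.
table_a = [{"ID": 1, "x": 10}, {"ID": 2, "x": 20}]
table_b = [{"ID": 1, "y": "a", "z": "b"}]
sheets = layouts_by_id(table_a, ["ID", "x"], table_b, ["ID", "y", "z"])
```

Since each table keeps its own header row inside the sheet, the two tables never need matching columns.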

Skipping rows when importing Excel into SQL using SSIS 2008

I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually in row 3. I can use an Excel Source to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
have a look:
the links have more details, but I've included some text from the pages (just in case the links go dead)
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
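As a rough illustration of that "number the rows, then split on the number" logic (plain Python, not SSIS; the function name and sample rows are made up):

```python
def skip_leading_rows(rows, skip=5):
    """Number rows from 1 and keep only those whose number is above `skip`."""
    numbered = enumerate(rows, start=1)  # acts like an identity column
    return [row for number, row in numbered if number > skip]


# Hypothetical sheet: five description rows, then the real data.
sheet = ["desc1", "desc2", "desc3", "desc4", "desc5", "data1", "data2"]
data_rows = skip_leading_rows(sheet, skip=5)
```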
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's a drop-down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed the records I needed. Something like this:
SELECT F4, F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't contain any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
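The effect of that WHERE clause can be sketched in plain Python as a filter over rows. This is only an illustration of the logic, assuming each row is a dict keyed by the default F-style column names; the sample values are made up.

```python
def keep_data_rows(rows, header_name="TheHeaderFieldName"):
    """Keep F4, F5, F6 for rows where F4 is present and is not the header text."""
    kept = []
    for row in rows:
        f4 = row.get("F4")
        # Mirrors: WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')
        if f4 is not None and f4 != header_name:
            kept.append({name: row.get(name) for name in ("F4", "F5", "F6")})
    return kept


# Hypothetical sheet: a junk row, the header row, then two data rows.
sheet = [
    {"F1": "Report title", "F4": None, "F5": None, "F6": None},
    {"F4": "TheHeaderFieldName", "F5": "Date", "F6": "Cust"},
    {"F4": "271356", "F5": "3/3/10", "F6": "010572"},
    {"F4": "280353", "F5": "3/5/10", "F6": "022114"},
]
clean = keep_data_rows(sheet)
```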
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts). They need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, and not repeat the headers multiple times. Most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they will be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; a file with the wrong format will fail and be returned to them to fix.
If that doesn't work, may I suggest that you open the file, delete the first two rows and save a text file in a data flow. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel and anything you can do to get the file in a different format will make life easier in the long run.
"My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time."
Not entirely correct.
SSIS forces you to use the format, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
