Excel: Turn Chunks of Text Into Columns

I need to find a programmatic way to do the following. Basically, I have a text file containing the following:
Of course, there is real data in there, and there are several thousand different "chunks" like this. First, we would like to collapse Job Title 1 and Job Title 2 into a single line. Then, we need to import this into Excel in row format, such as:

I've tackled a similar problem at my job, and the data can actually be manipulated purely in Excel. Your first step is just to get it into Excel: open it in Notepad, copy, then paste into Excel, or try opening it in Excel directly. The trick is to use the blank row as a trigger for a set of formulas that you simply fill down. When that's done, copy all the results to a new worksheet and sort them so that all the blank rows get thrown away. Next you'll have to hack away at your data and merge the various results, but that won't be too hard, because you should get 3 different kinds of rows (those with 1, 2, or 3 jobs); just fix each kind with its own formula. Hopefully this gives you an idea of how to get started:
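As a scripted alternative to the fill-down formulas, the same "blank row as a trigger" idea can be sketched in Python. This is a minimal sketch under assumptions: the original sample text was not preserved, so the field layout (a name line followed by two job-title lines) is hypothetical, and `chunks_to_rows`, `merge_adjacent` and `rows_to_csv` are illustrative helpers, not part of any library. The resulting CSV text can be saved to a file and opened in Excel.

```python
import csv
import io


def chunks_to_rows(lines):
    """Group non-blank lines into chunks; a blank line ends the current chunk."""
    rows, current = [], []
    for line in lines:
        text = line.strip()
        if text:
            current.append(text)
        elif current:
            rows.append(current)
            current = []
    if current:  # the file may not end with a blank line
        rows.append(current)
    return rows


def merge_adjacent(row, index, sep=" "):
    """Collapse row[index] and row[index + 1] into one field (hypothetical: two job-title lines)."""
    if index + 1 >= len(row):
        return list(row)  # nothing to merge
    return row[:index] + [row[index] + sep + row[index + 1]] + row[index + 2:]


def rows_to_csv(rows):
    """Serialize rows as CSV text that Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical sample: name, job title line 1, job title line 2, blank separator.
sample = ["Jane Doe", "Regional Sales", "Manager", "", "John Roe", "Senior", "Clerk", ""]
rows = [merge_adjacent(r, 1) for r in chunks_to_rows(sample)]
print(rows_to_csv(rows))
```

Chunks with a different number of lines (the 1-, 2- and 3-job cases mentioned in the answer) would need their own merge rule, just as they need their own formula in the Excel approach.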

Related

How to use VBA to combine rows that are prematurely broken in a CSV?

I have a CSV of purchase information, but some lines seem to have too many commas, or extra line breaks get added in, so one record is split across several rows. In this case, row 1 is the header, rows 2-5 should be merged into one, and the blank row 6 shouldn't exist.
Here's what the data looks like in Excel's "Get Data" feature before it is loaded:
I think the solution provided here is close to what I need, but as someone who has never written a macro, I was wondering if someone could provide a more specific solution:
https://superuser.com/questions/395126/how-to-combine-values-from-multiple-rows-into-a-single-row-in-excel
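One way to rejoin records outside Excel, before importing, is to keep concatenating physical lines until a record has as many fields as the header. The Python sketch below assumes that no value contains a quoted comma, that a complete record never ends with a comma, and that the pieces of a broken record can simply be concatenated; `rejoin_broken_records` and the sample data are hypothetical.

```python
def rejoin_broken_records(lines):
    """Rejoin CSV lines that were split by stray line breaks.

    Assumptions (hypothetical, adjust to the real file):
    - the first line is the header, and its comma count is the target;
    - values contain no quoted commas;
    - a complete record never ends with a comma (its last field is non-empty);
    - a broken record is repaired by concatenating its pieces directly.
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    header, rest = lines[0], lines[1:]
    expected_commas = header.count(",")
    records, buffer = [header], ""
    for ln in rest:
        if not ln.strip():
            continue  # drop blank rows such as row 6
        buffer += ln
        if buffer.count(",") >= expected_commas and not buffer.endswith(","):
            records.append(buffer)
            buffer = ""
    if buffer:
        records.append(buffer)  # leftover fragment, kept so nothing is lost silently
    return records


broken = [
    "Date,Item,Qty,Price",
    "2020-01-01,Wid",
    "get,",
    "3,",
    "9.99",
    "",
    "2020-01-02,Bolt,1,0.50",
]
for record in rejoin_broken_records(broken):
    print(record)
```

If the broken rows are actually quoted fields that contain line breaks, Python's `csv` module already reads them as single records when the file is opened with `newline=''`, and no rejoining is needed.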

VBA Excel - Print a fixed set of columns on different pages

I have been given a very ugly Excel document. Every time, I need to print the first 4 columns, then loop through each of the remaining columns so that they print one at a time, one per sheet, as if the first 4 columns were frozen and each of the others printed in turn.
I have tried a number of different approaches, from hiding and unhiding columns to looping and using active cells, but I can't seem to find the most efficient way to do it.
The first four columns contain standard headings, and each subsequent column relates to a specific object. I want to press a button that runs VBA to print the heading columns together with each object column on its own page, so that every object gets its own page with the headings and can go into a ring binder file. That way each object can be viewed quickly on its own, without the data of the other objects that you see when viewing the whole spreadsheet.
I am sure there is a simple way to do this and I am overcomplicating it!
Any help welcome.
Thanks
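No answer was captured for this question. The underlying loop (fixed heading columns plus one object column per page) can be sketched in Python as a way to check the logic; `pages_per_object` and the sample table are hypothetical. In VBA, one common pattern for the same loop is to hide every object column except the current one (via `EntireColumn.Hidden`), call `PrintOut` on the sheet, and then move on to the next column.

```python
def pages_per_object(table, n_fixed=4):
    """Build one page per object column: the n_fixed heading columns plus that column.

    `table` is a list of rows (lists); row 0 holds the column titles.
    Returns a list of (object_title, page_rows) pairs.
    """
    if not table:
        return []
    width = max(len(row) for row in table)
    pages = []
    for col in range(n_fixed, width):
        # Short rows get an empty cell so every page has the same shape.
        page = [row[:n_fixed] + [row[col] if col < len(row) else ""] for row in table]
        pages.append((page[0][-1], page))
    return pages


# Hypothetical sheet: 4 heading columns, then one column per object.
table = [
    ["Item", "Unit", "Min", "Max", "Pump A", "Pump B"],
    ["Flow", "l/s", "0", "20", "12", "15"],
]
pages = pages_per_object(table)
for title, page in pages:
    print(title, page)
```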

Why is my Excel spreadsheet created from a CSV treating everything as text?

I wrote a Python script to generate some data into a CSV file. The data looks something like the following:
12/10/2015 1 0:05:38 0:09:18 0:00:24 0:15:20
5/11/2016 1 0:39:07 3:22:09 0:00:08 4:01:24
7/27/2016 1 0:00:00 0:37:42 0:02:12 0:39:54
8/4/2016 1 0:00:00 0:00:29 0:00:35 0:01:04
10/3/2016 1 0:05:51 0:50:46 0:00:17 0:56:54
The data I am interested in analyzing is in the form h:mm:ss, but the formulas I write to sum it don't work. I found that ISTEXT(CELLNUM) returns TRUE, so Excel is clearly treating the data as text, even if I manually reformat the cells as h:mm:ss. I must be overlooking something simple, because there must be an easier way to do this than going through a manual process every time I open a CSV in Excel and save it as a spreadsheet. How can I open this CSV in Excel and save it as a spreadsheet in a way that lets me set up formulas to sum the times? I might end up creating a lot of these CSV files, so I need a way to do it that is fast. What am I missing? Why doesn't simply selecting all of the cells and reformatting them work?
The best answer is the one posted here by jeeped:
When you have pasted data from an external source (e.g. web pages are
horrific for this) into a worksheet and numbers, dates and/or times
come in as textual representations rather than true numbers, dates
and/or times, usually the quickest method is to select the column and
choose Data ► Text to Columns ► Fixed Width ► Finish. This forces
Excel to reevaluate the text values and should revert the
pseudo-numbers into their true numerical values.
It's strange that Excel can't figure this out, or provide a way to handle it as the data is imported. It can handle dates during import, but not times. However, the fact that I can so easily fix the time values one column at a time after saving as an .xlsx file makes me wonder why Microsoft never made it easier to specify the column types when bringing the data in the first time. Instead, I had to search the internet for hours to find a solution that takes just a minute or two. Weird. Other answers cover other types of data, where you can use Paste Special to add a number to the existing data, but those solutions do not seem to work for times.
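If the generating script can be changed, another option (not from the thread) is to sum in Python directly, or to write durations as numbers. Excel stores a time as a fraction of a 24-hour day, so a cell holding 0.5 and formatted as `[h]:mm:ss` displays 12:00:00. The sketch below parses the h:mm:ss values from the sample data; in every sample row the last column equals the sum of the three before it, which the code checks for the first row. The helper names are illustrative.

```python
def hms_to_seconds(text):
    """Parse an h:mm:ss string such as '3:22:09' into whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Format seconds back as h:mm:ss (hours may exceed 24)."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def excel_day_fraction(text):
    """Excel represents a duration as a fraction of a 24-hour day."""
    return hms_to_seconds(text) / 86400


row = "12/10/2015 1 0:05:38 0:09:18 0:00:24 0:15:20".split()
total = sum(hms_to_seconds(t) for t in row[2:5])
print(seconds_to_hms(total), row[5])  # computed sum next to the file's last column
print(excel_day_fraction("12:00:00"))
```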

Creating a new row when there is more than one column

How would you force Excel to create a new row wherever there is an additional column (i.e. more than one), as seen below?
I'm currently doing this manually and it is quite time consuming.
I recently had to do this exact thing, and came up with two solutions:
1) Export as CSV, run several search-and-replace operations on the data with a text tool, re-import into Excel, then use a formula to create the duplicate user numbers for rows whose first cell is empty.
2) Write 15 lines of PHP code to turn the CSV into a new CSV.
The second is MUCH more effective and flexible, but it requires a basic coding environment.
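The answer's second option used PHP; the sketch below swaps in Python for the same idea (turn the CSV into a new CSV). Because the example layout was not preserved, it assumes the first column holds the user number and every additional non-empty column should become its own row with that user number repeated; `one_value_per_row` and the sample data are hypothetical.

```python
import csv
import io


def one_value_per_row(rows):
    """Emit one output row per extra column, repeating the first column (the user number)."""
    out = []
    for row in rows:
        if not row:
            continue
        key, values = row[0], [v for v in row[1:] if v.strip()]
        if not values:
            out.append([key, ""])  # keep users that have no extra values
        for value in values:
            out.append([key, value])
    return out


source = "1001,alpha,beta\n1002,gamma\n1003,delta,epsilon,zeta\n"
rows = one_value_per_row(csv.reader(io.StringIO(source)))
for r in rows:
    print(r)
```

Writing `rows` back out with `csv.writer` gives a file Excel can open with one value per row.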

Skipping rows when importing Excel into SQL using SSIS 2008

I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually in row 3. I can use an Excel Source to import the data, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
Have a look:
The links have more details, but I've included some text from the pages (just in case the links go dead):
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
an Excel file?
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
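The row-number filter from this answer can be illustrated outside SSIS with a small Python sketch: number the rows from 1, as an identity column would, and keep only rows above a cutoff. Using the "March Orders" layout from the question, skipping 2 rows leaves the header row (row 3) as the first row. `skip_leading_rows` is a hypothetical helper; inside SSIS the same test would live in a Conditional Split expression.

```python
def skip_leading_rows(rows, n_skip=5):
    """Number rows from 1 (like an identity column) and keep those numbered above n_skip."""
    return [row for number, row in enumerate(rows, start=1) if number > n_skip]


# Layout taken from the question: title row, empty row, then headers in row 3.
sheet = [
    ["March Orders"],
    [],
    ["Week", "Order #", "Date", "Cust #"],
    ["3.1", "271356", "3/3/10", "010572"],
    ["3.1", "280353", "3/5/10", "022114"],
]
data = skip_leading_rows(sheet, n_skip=2)
print(data[0])
```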
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible, when importing data from Excel into a DB table, to skip the first 6 rows, for example?
Also, the Excel data is divided into sections with headers. Is it possible, for example, to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
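The two options in this answer, deleting rows whose number divides evenly by 12 and dropping rows that match some header logic, can be sketched in Python as below. Both helpers are hypothetical, and the header test (first cell equals "Week") is an assumption based on the question's sample headers.

```python
def drop_every_nth(rows, n=12):
    """Drop rows whose 1-based number divides evenly by n (identity seed/increment 1/1)."""
    return [row for number, row in enumerate(rows, start=1) if number % n != 0]


def drop_header_rows(rows, is_header):
    """Drop every row that matches the 'header logic' predicate, wherever it appears."""
    return [row for row in rows if not is_header(row)]


def looks_like_header(row):
    """Hypothetical header logic: the first cell holds the first column title."""
    return bool(row) and row[0] == "Week"


sheet = [
    ["Week", "Order #", "Date", "Cust #"],
    ["3.1", "271356", "3/3/10", "010572"],
    ["Week", "Order #", "Date", "Cust #"],  # repeated section header
    ["3.1", "280353", "3/5/10", "022114"],
]
print(drop_header_rows(sheet, looks_like_header))
```

The header-logic version is usually safer than the every-12th-row version, because it does not break if a section gains or loses a row.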
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this:
SELECT F4, F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't contain any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
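The effect of that SQL Command can be mimicked in Python to see which rows survive: keep rows where F4 (the fourth column) is not NULL and is not the header text, then project F4 through F6. In this sketch `None` stands in for NULL, `TheHeaderFieldName` is the placeholder from the quoted query, and the sample rows are made up.

```python
def select_f4_to_f6(rows, header_value="TheHeaderFieldName"):
    """Mimic: SELECT F4, F5, F6 ... WHERE (F4 IS NOT NULL) AND (F4 <> header_value)."""
    kept = []
    for row in rows:
        f4 = row[3] if len(row) > 3 else None  # a missing cell behaves like NULL here
        if f4 is not None and f4 != header_value:
            kept.append(row[3:6])  # project F4, F5, F6
    return kept


# Made-up sheet: junk in the leading rows, header in row 3, data below.
rows = [
    ["Monthly report", None, None, None, None, None],
    [None, "prepared by", None, None, None, None],
    [None, None, None, "TheHeaderFieldName", "Qty", "Price"],
    [None, None, None, "A1", 10, 2.5],
    [None, None, None, "A2", 11, 3.0],
]
print(select_f4_to_f6(rows))
```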
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts), and they need to be educated that Excel files must start with the column headers, must have no blank lines in the middle of the data and no repeated headers, and, most important of all, must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the spreadsheet. Incidentally, we push really hard never to receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate us). They must also know that any changes they make to the spreadsheet format will require a change to the import package and that they will be charged for that development work (assuming these are outside clients and not internal ones). Such changes must be communicated in advance and developer time scheduled; otherwise, a file in the wrong format will fail and be returned to them to fix.
If that doesn't work, may I suggest that you open the file, delete the first two rows, and save it as a text file in one data flow, then write a second data flow that processes the text file. SSIS does a lousy job of supporting Excel, and anything you can do to get the file into a different format will make life easier in the long run.
"My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time."
Not entirely correct.
SSIS forces you to use the format, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields, and you can validate the data however you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
The sky is the limit.
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
