CSV - need comma at the start when first field is not present - csv generated from excel - excel

I am using Excel to generate comma-separated values. I have a parser that parses the CSV data and inserts it into a database.
The issue I face is, the first field in my csv is not mandatory. When the first field is null, the generated csv has no comma BEFORE the second field and for the parser the second field becomes the first field.
When the first field is null, I expect the data to be like below.
,SECOND_FIELD, THIRD_FIELD
I have tried:
Putting a space in the first field. In this case I will have to change my parser.
Putting a static header. The comma then appears as expected in the rows below when the first field is null, but a change to the parser will be required.
Putting a comma in the first field, but this is exported as ",". :-)
Can someone suggest some solutions or workarounds?
Thanks

Quick workaround: Why don't you check how many values are present? If one is missing, assume it's the first one.
EDIT:
I've found this question that may help you. In a nutshell: apply any formatting to the range of cells you are using so Excel doesn't skip any of them when exporting. Also, I think that if you can swap the first column (optional) with any other one (required), it will work, too.
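The quick workaround above could be sketched in Python roughly as follows. This is only an illustration, assuming a complete row has three fields and that the first field is the only one that can be missing; `pad_missing_first_field` and `EXPECTED_FIELDS` are hypothetical names, not part of any existing parser.

```python
import csv
import io

EXPECTED_FIELDS = 3  # assumption: every complete row has three fields


def pad_missing_first_field(row, expected=EXPECTED_FIELDS):
    """Prepend an empty first field if exactly one field is missing."""
    if len(row) == expected - 1:
        return [""] + row
    return row


# Made-up sample: the second line lost its leading comma during export.
sample = "A,B,C\nB2,C2\n"
rows = [pad_missing_first_field(r) for r in csv.reader(io.StringIO(sample))]
print(rows)
```

Note that this only works while the first field is the single optional one; if any other field could also be dropped, the count alone no longer tells you which one is missing.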

Related

Remove Leading Spaces in Excel Address List

I have numerous files where the address field is a single line of text, for the most part separated by commas. My first step is to use the 'Replace' function in Excel to replace the commas with a carriage return. This turns an address from a single line into multiple lines.
The issue I'm looking for help with is that, when I complete the steps above, a leading space often remains in every row from the second row onwards. I would like to know the best way to remove the leading spaces in these rows and keep the multi-line address format.
I have tried using TRIM, however this returns the address to a single line.
To show the pre- and post-transformation data I've added an image below, as I can't seem to get the format to show correctly in this post. Because my profile is new I also can't embed the image, so there is a link below showing the pre- and post-transformation data and the leading space issue I'm seeking help with.
Pre and Post Example
Thanks,
Steve
As #Anonymous mentioned in a comment, replace both the comma and the space in one go with the SUBSTITUTE() formula, and apply the Wrap Text format to the resulting cell.
=SUBSTITUTE(A2,", ",CHAR(10))
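If the addresses are being cleaned outside Excel, the same substitution might look like the Python sketch below. It goes slightly further than the formula by accepting any amount of whitespace after each comma; `address_to_lines` is a hypothetical helper name and the sample address is made up.

```python
import re


def address_to_lines(address):
    """Replace each comma plus any following whitespace with a line feed."""
    # "\n" is the same character that Excel's CHAR(10) produces.
    return re.sub(r",\s*", "\n", address.strip())


print(address_to_lines("1 Main Street, Springfield,  IL 62701"))
```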

Pentaho Kettle - Loading excel with almost blank rows

I receive an Excel file from an uncontrolled source. It contains a row with all the fields filled in, followed by several rows where every field is blank except one (always the same one, a comment).
The comments belong to the ID of the "row with data".
I would like to create a new field "COMENTARY AGREGATED" with the concatenation of all the comments that belong to the ID, but I don't know how to do it. As far as I know, you can't interact with the order of the rows, as they are treated as independent. Am I right that this is impossible to do inside Kettle, and that I should resort to a VB macro in Excel as a preprocessing step?
Thanks for your time
You can use a group by step, group by all fields except the comment one, and on aggregations choose “concatenate values separated by” and use a whitespace as value for the concatenation ( or nothing if you prefer).
The excel input can’t do all that on its own.
For now I've made a little progress.
I found that in the Excel input step, in the Fields tab, the Repeat column can be set to Y, and if so, it fills the blank rows with the previous value.
I still don't know how to aggregate the others, but it's a step in the right direction, I guess.
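To make the combined logic concrete, here is a rough Python sketch of what the Repeat setting followed by a Group by step would compute. It is not Kettle code; `aggregate_comments` and the sample rows are hypothetical, and it assumes each comment-only row belongs to the nearest data row above it.

```python
def aggregate_comments(rows, separator=" "):
    """Merge comment-only rows into the preceding data row.

    rows: list of (row_id, comment) tuples; row_id is None on comment-only rows.
    Returns a dict mapping each ID to its concatenated comments.
    """
    aggregated = {}
    current_id = None
    for row_id, comment in rows:
        if row_id is not None:
            current_id = row_id  # like Repeat = Y: remember the last ID seen
            aggregated.setdefault(current_id, [])
        if current_id is None:
            continue  # comment rows before any data row have no owner
        if comment:
            aggregated[current_id].append(comment)
    return {key: separator.join(parts) for key, parts in aggregated.items()}


rows = [(1, "first"), (None, "second"), (None, "third"), (2, "only")]
print(aggregate_comments(rows))
```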

Rapidminer - Spliting rows that has values in wrong type

I have a data set of 8 million rows in a tab-delimited txt file without quotes.
Five of the 14 columns hold date values in dd.MM.yyyy format.
Problem 1
I am trying to import the file. In the "Format your columns" step, if I set the type of those columns to "date", it gives errors and all cells in those columns turn into "?".
So I selected "polynominal" and planned to convert the attribute type to date later.
Problem 2 (the real one)
I imported the data and added a "Nominal to Date" operator. When I ran it, I got an error at line 14.899:
Cannot parse date: Unparseable date: "0"
I found the line and saw that the columns were separated wrongly. There was a tab character inside a string in an earlier cell, so the values moved one cell to the right. And this row was not the only one that was shifted.
I want to split off the rows that have values of the wrong data type in specified attributes. So I can't correct them manually.
How can I do that in RapidMiner?
Or are there any other ideas for sorting these problems out?
so most likely you need to adjust the date formatting in this pull-down menu:
To be honest, I usually just import as polynominal and then convert to date in my process. It's easier and reproducible.
You appear to have a broken input file.
The best solution, obviously, is to fix the process that generates the data. Escape or replace tab characters and write the dates in an unambiguous format such as the ISO date format.
Assuming that you can't fix the data, you should probably write a robust parser program yourself. A generic parser such as RapidMiner's won't be able to fix every problem.
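A pre-processing pass of that kind might look like the Python sketch below. It is only a starting point under stated assumptions: 14 columns per row as in the question, dates in dd.MM.yyyy, and a hypothetical date column index (`DATE_COLUMNS`), since the real positions of the five date columns are not given. Rows with the wrong column count or an unparseable date are set aside instead of being repaired.

```python
from datetime import datetime

EXPECTED_COLUMNS = 14     # from the question: 14 tab-separated columns
DATE_COLUMNS = [2]        # hypothetical index; the real date positions are not given
DATE_FORMAT = "%d.%m.%Y"  # dd.MM.yyyy


def is_valid(fields):
    """True if the row has the right column count and parseable dates."""
    if len(fields) != EXPECTED_COLUMNS:
        return False
    for index in DATE_COLUMNS:
        try:
            datetime.strptime(fields[index], DATE_FORMAT)
        except ValueError:
            return False
    return True


def split_rows(lines):
    """Separate lines into (good, bad) lists of field lists."""
    good, bad = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        (good if is_valid(fields) else bad).append(fields)
    return good, bad


# Made-up sample rows: one clean, one shifted by a stray tab, one with a bad date.
clean = "\t".join(["a", "b", "01.02.2020"] + ["c"] * 11)
shifted = "\t".join(["a", "stray", "b", "01.02.2020"] + ["c"] * 11)
bad_date = "\t".join(["a", "b", "0"] + ["c"] * 11)

good, bad = split_rows([clean, shifted, bad_date])
print(len(good), len(bad))
```

For the real 8-million-row file, the lines would be read from the file one at a time and the two groups written to separate output files, so that only the clean file is imported.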

How do I import data from a .xlsx file to Filemaker Pro if the "matching" field has trailing zeros?

I am importing data from an Excel file into a Filemaker Pro database (FMP 12.0 v5 for Mac). I am using the imported data to "Update matching records in found set". However, the field that I am using to match occasionally contains trailing zeros.
When importing, FMP does not match the fields correctly, because it ignores the trailing zeros.
To explain further: the field in the database is a calculated text field, "courseID.personID", determined by concatenating the numerical "courseID" and "personID" fields (with a dot in between them). The field in my Excel file is formed similarly, using Excel formulae. Some "personID" values end in a zero, e.g. 120, and so courseID.personID becomes something like "123.120". I am matching the Excel field to the FMP field.
I first noticed this was happening, and was very careful to go back to Excel and make a new file (to start fresh), select all cells and set format to Text. Then, I did a Paste Special from my original data, and selected Paste as Values. All the cells in the courseID.personID column gave a "number stored as text error", with the option to convert the text to numbers. I selected the option to ignore the error, to leave all the data stored as text, with the intention of preserving the trailing zeros.
Alas, the issue persists. So, does anyone have any ideas of how to force Excel to format and communicate the proper values? Or, is it an issue of making FMP interpret the data properly, maybe by adjusting field types?
the field in the database is a calculated text field, "courseID.personID", determined by concatenating the numerical "courseID" and "personID" fields (with a dot in between them). The field in my Excel file is formed similarly, using Excel formulae.
Come to think of it, the simplest solution would be to eliminate the calculation fields and use the original values for the import:

Prevent comma-separated list of numbers being interpreted as single large value

The original value was 33266500,332665100,332665200,332665300 and the cell should show exactly that, but what I see as the cell value in Excel is 3.32665E+34.
So I want to convert it back into the original string. I found the Format function on Google and used it like this:
format(3.32665E+34,"standard")
which gives 332,6650,033,266,510,000,000,000
How can I parse it or get back the original string? I believe Format is a VBA function.
Excel has a 15 digit precision limit. If the numbers are already shown like this when you access the file, there is no way to get the number back - you have already lost some digits. VBA code and formulas will not help you.
If this is not the case, you can add a single quote ' mark before the number to store it as text. This will ensure Excel does not try to treat it as a number and thus lose precision.
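The loss of digits can be illustrated with a short Python sketch. Python's float is used here only as a stand-in for a numeric cell (both are double-precision values); it is not a reproduction of Excel's exact behaviour.

```python
# The question's value with the commas removed, as it would be read as a number.
original = "33266500332665100332665200332665300"

as_number = float(original)             # treated as a number: precision is limited
print(format(as_number, ".15g"))        # shown with 15 significant digits
print(int(as_number) == int(original))  # False: the trailing digits are gone

# Stored as text, nothing is lost and the four values can still be separated.
as_text = "33266500,332665100,332665200,332665300"
print(as_text.split(","))
```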
If you want the value kept exactly, store the data as a string, not as a number. The data type you are using simply doesn't have the ability to do what you are asking it to do.
If you're starting with an Excel file that has already been created then you've already lost the information: Excel has tried to understand what it was given and its best guess has turned out to be wrong. All you can do (if you can't get the source data) is go back to the creator of the Excel file and tell them what's wrong.
If you're starting with, say, a text file that you're importing, then the news is much better:
If you're importing manually using the Text Import Wizard, then at "Step 3 of 3" you need to set "Column Data Format" for the problem field to "Text".
If you're using a macro, you'll need to specify a value for the TextFileColumnDataTypes property that does the same thing. The easiest way to get it right is to use the Macro Recorder.
If you want the four values in the string to be separate cells, then again, look at the Text Import Wizard settings: in Step 1 of 3 you need to set "Delimited" data type (usually the default) and in Step 2 make sure that "Comma" is checked.
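If the text file is being read by a script instead of the Text Import Wizard, the same principle applies: keep the field as text. A minimal Python sketch, assuming a made-up two-column file in which the list is a quoted field (the column names `id` and `codes` are hypothetical):

```python
import csv
import io

# Hypothetical file content: one quoted field holding the comma-separated list.
data = 'id,codes\n1,"33266500,332665100,332665200,332665300"\n'

for record in csv.DictReader(io.StringIO(data)):
    codes_text = record["codes"]   # kept as text, never converted to a number
    codes = codes_text.split(",")  # four separate values, still strings
    print(codes_text)
    print(codes)
```

The csv module returns every field as a string, so no digits are lost unless the script itself converts the value to a number.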
The value needs to be entered into the cell as a string. You need to make whatever it is that inserts the value precede the value with a '.
