RapidMiner - Splitting rows that have values of the wrong type for their attributes

I have a data set of 8 million rows in a tab-delimited .txt file without quotes.
5 of the 14 columns contain date values in dd.MM.yyyy format.
Problem 1
I am trying to import the file. In the "Format your columns" step, if I choose "date" as the type of those columns, I get errors and all cells in the columns turn into "?".
So I selected "polynominal" and planned to convert the attribute type to date later.
Problem 2 (the real one)
I imported the data and added a "Nominal to Date" operator. When I ran the process, I got an error at line 14,899:
Cannot parse date: Unparseable date: "0"
I found the line and saw that the columns were split incorrectly: a text value in a preceding cell contained a tab character, so the values shifted one cell to the right. This was not the only row that shifted.
I want to split off the rows whose values have the wrong data type for specified attributes, so that I can correct them manually.
How can I do that in RapidMiner?
Or are there any other ideas for solving these problems?

So most likely you need to adjust the date format in the import wizard's date format drop-down so that it matches your data (dd.MM.yyyy).
To be honest, I usually just import as polynominal and then convert to date in my process. It's easier and reproducible.

You appear to have a broken input file.
The best solution, obviously, is to fix the process that generates the data: escape or replace tab characters, and write dates in an unambiguous format such as ISO 8601 (yyyy-MM-dd).
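If you do control the export, the two fixes suggested above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and it assumes that replacing embedded tabs and line breaks with a space is acceptable for your text fields (proper escaping or quoting is the alternative).

```python
from datetime import date


def clean_text(value: str) -> str:
    """Replace characters that would break an unquoted, tab-delimited row."""
    for ch in ("\t", "\r", "\n"):
        value = value.replace(ch, " ")
    return value


def format_field(value) -> str:
    """Write dates as ISO 8601 and everything else as cleaned text."""
    # datetime is a subclass of date, so datetimes are also written in ISO
    # form (e.g. 1991-03-08T12:30:00).
    if isinstance(value, date):
        return value.isoformat()
    return clean_text(str(value))


def to_tsv_line(values) -> str:
    """Join one record's values into a tab-delimited line."""
    return "\t".join(format_field(v) for v in values) + "\n"
```

For example, to_tsv_line(["a\tb", date(1991, 3, 8), 42]) produces a line with exactly three fields, the middle one being 1991-03-08.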
Assuming that you can't fix the data generation, you should probably write a robust parser yourself. A generic parser such as RapidMiner's won't be able to fix every problem.
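A minimal sketch of such a parser in Python, which splits the file into rows that look correct and rows that need manual attention. The column count comes from the question; the zero-based positions in DATE_COLUMNS are hypothetical placeholders, and the file is assumed to be UTF-8. A row is rejected if it has the wrong number of fields (the symptom of a stray tab) or if any date column fails to parse as dd.MM.yyyy.

```python
from datetime import datetime

# Assumptions (adjust to your file): 14 columns, dates in dd.MM.yyyy.
# The date column positions below are hypothetical placeholders.
EXPECTED_COLUMNS = 14
DATE_COLUMNS = (3, 4, 7, 8, 12)
DATE_FORMAT = "%d.%m.%Y"


def is_valid_row(fields):
    """Return True if the row has the expected shape and parseable dates."""
    if len(fields) != EXPECTED_COLUMNS:
        # A tab inside a text value produces an extra column.
        return False
    for i in DATE_COLUMNS:
        try:
            datetime.strptime(fields[i], DATE_FORMAT)
        except ValueError:
            # e.g. "0", as in the "Unparseable date" error
            return False
    return True


def split_lines(lines):
    """Split raw tab-delimited lines into (good, bad) lists."""
    good, bad = [], []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        (good if is_valid_row(fields) else bad).append(line)
    return good, bad


def split_file(src_path, good_path, bad_path, encoding="utf-8"):
    """Stream a large file, writing valid and invalid rows to separate files."""
    with open(src_path, encoding=encoding) as src, \
         open(good_path, "w", encoding=encoding) as good, \
         open(bad_path, "w", encoding=encoding) as bad:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            (good if is_valid_row(fields) else bad).write(line)
```

Usage might look like split_file("data.txt", "good.txt", "bad.txt") (hypothetical file names). split_file reads line by line, so the 8 million rows never have to fit in memory. You can then import good.txt directly and fix the much smaller bad.txt by hand.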

Related

Excel converts numbers imported from CSV to text

When I import the data from a CSV file, I cannot work with it because Excel treats the numbers as text. When I try to sum them or compute the average, I get 0 or an error because there are no numbers. The type changes when I delete the dot '.' in a cell and type it again: that converts the value to a number and then it works. But I don't want to change thousands of values this way. How can I convert them all at once?
Thanks for every answer.
Try setting the import options properly: don't import the column with the Text format; select General as the column data format instead.

Type conversion failure in Access 2013

When importing data from a text file (CSV) into MS Access, I get a "Type conversion failure" error for one field. The field contains dates in the format "yyyy-mm-dd hh:nn:ss", and Access simply refuses to recognise them, inserting #Num! or leaving the data blank. The CSV file is huge (8 million rows) and cannot be opened in Excel to edit the date format. I have no problems with any other fields. Is there any way to avoid this error?
Use the Advanced... button at the field specification step of the import and try these settings:
Date Order should be YMD, because your dates have the year first, followed by the month and the day.
The date delimiter for your CSV is a dash (-), while the time delimiter should be the default colon (:). Make sure the Four Digit Years checkbox is checked, and I would also check the Leading Zeros in Dates checkbox, since your months and days are in mm and dd format respectively (i.e. they begin with 0 if they are a single digit).
If there are still problematic dates in your CSV, that is a separate problem that won't be easy to tackle. You may have to correct those dates manually in the CSV before importing it, or import the date as text and then create a new column that converts the text dates to date fields (and fix any problematic dates there).
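Since the file is too large to open in Excel, a short script can at least locate the problematic dates before import. A minimal Python sketch, assuming a comma-delimited UTF-8 file; DATE_COLUMN is a hypothetical zero-based index of the failing field. Note that a header row would be reported as record 1 unless you skip it first.

```python
import csv
from datetime import datetime

DATE_COLUMN = 5  # hypothetical zero-based index of the problem field
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:nn:ss


def bad_date_rows(rows, column=DATE_COLUMN):
    """Yield (record_number, value) for empty or unparseable date fields."""
    for number, row in enumerate(rows, start=1):
        # Short rows are treated as having an empty date field.
        value = row[column].strip() if column < len(row) else ""
        try:
            datetime.strptime(value, DATE_FORMAT)
        except ValueError:
            yield number, value


def report(path, column=DATE_COLUMN):
    """Print every record whose date field would fail to convert."""
    with open(path, newline="", encoding="utf-8") as f:
        for number, value in bad_date_rows(csv.reader(f), column):
            print(f"record {number}: {value!r}")
```

The numbers are CSV record numbers, which match physical line numbers only if no quoted field spans several lines.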
There is nothing wrong with the date format, but some records may be empty or contain invalid entries.
Or you may have missed specifying the separators and format for the date field during the import.
If you still have no luck, link the file and specify Text as the type of that field. Then create a select query that uses the linked file as its source and uses CDate to convert the text dates to true date values.
When done, change the query to an append or make-table query to import your data.

Format multiple date entries as strings

I have an Excel file storing a thousand rows of dates. Each date seems to be (auto)formatted as a Date. The (PHP Excel) parser I'm using (I really can't update it or use another one) parses this into a string containing the number of days since 1900.
Is there a way to format the values in Excel as plain text, e.g. "08.03.1991", so that this file is parsed correctly?
I could add a leading apostrophe ("'08.03.1991"), but I need an (Excel-based) one-step solution for all thousand rows.
Remark: Since this is a user's file, I can't just write a simple VBA script to handle it; there will be new files in the future, and the user needs to be able to solve this on their own.
I admit I am not quite sure what you have and what you want, but it may be worth trying: select the column of dates, apply Text to Columns with Tab as the delimiter, and in step 3 of 3 select Text.
You could use the TEXT function like this:
=TEXT(A1,"dd.mm.yyyy")
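To see what the parser is actually returning, it helps to know how Excel's 1900 date system numbers days. The Python sketch below (serial_to_text is a hypothetical name) reproduces what TEXT(A1,"dd.mm.yyyy") displays. It assumes the 1900 date system (not the 1904 system) and serials from 61 upward. Excel treats 1900 as a leap year, so for those serials the effective epoch is 1899-12-30.

```python
from datetime import date, timedelta

# In Excel's 1900 date system serial 1 is 1900-01-01, but Excel also counts
# a non-existent 1900-02-29 (serial 60). From serial 61 (1900-03-01) on, the
# serial equals the number of days since 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_text(serial: int) -> str:
    """Convert an Excel 1900-system serial date (>= 61) to dd.mm.yyyy."""
    if serial < 61:
        raise ValueError("serials before 1900-03-01 need special handling")
    return (EXCEL_EPOCH + timedelta(days=serial)).strftime("%d.%m.%Y")
```

For example, the serial 33305 corresponds to 08.03.1991, the date used in the question.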

Fixing MS Excel date/time format

A reporting service generates a CSV file in which certain columns (oddly enough) have a mixed date/time format: some rows contain datetimes expressed as m/d/y, others as d.m.y.
When I apply =TYPE(), it returns either 1 or 2 (Excel recognizes each value either as a number, i.e. an Excel timestamp, or as text).
How can I convert any wrongly formatted date/time into a "normal" format that can be used and that keeps the data consistent?
I am thinking of two solutions at the moment:
1. Somehow process the odd data with existing Excel functions.
2. Ask for the report to be generated correctly from the very beginning and avoid this hassle in the first place.
Thanks
Certainly your second option is the way to go in the medium-to-long term. But if you need a solution now, and if you have access to a text editor that supports Perl-compatible regular expressions (like Notepad++, UltraEdit, EditPad Pro etc.), you can use the following regex:
(^|,)([0-9]+)/([0-9]+)/([0-9]+)(?=,|$)
to search for all dates in the format m/d/y, surrounded by commas (or at the start/end of the line).
Replace that with
\1\3.\2.\4
and you'll get the dates in the format d.m.y.
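If you would rather script the replacement than run it in an editor, or want to test the pattern first, the same regex works in Python's re module. Python's engine is not PCRE, but it supports the lookahead and the \1-style backreferences used here. re.MULTILINE makes ^ and $ match at every line boundary, as they do in those editors. The function name is hypothetical.

```python
import re

# The regex from the answer: an m/d/y date preceded by start-of-line or a
# comma, and followed (lookahead, not consumed) by a comma or end-of-line.
MDY = re.compile(r"(^|,)([0-9]+)/([0-9]+)/([0-9]+)(?=,|$)", re.MULTILINE)


def mdy_to_dmy(text: str) -> str:
    """Rewrite every comma-delimited m/d/y field in the text as d.m.y."""
    return MDY.sub(r"\1\3.\2.\4", text)
```

Because the trailing comma is only matched by the lookahead, two adjacent date fields such as 1/2/2000,3/4/2000 are both converted in a single pass.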
If you can't get the data changed, then you may have to resort to another column that translates the dates (assuming the date you want to change is in A1):
=IF(ISERR(DATEVALUE(A1)),DATE(VALUE(RIGHT(A1,LEN(A1)-FIND(".",A1,4))),VALUE(MID(A1,FIND(".",A1)+1,2)),VALUE(LEFT(A1,FIND(".",A1)-1))),DATEVALUE(A1))
It tests whether it can read the text as a date. If that fails, it chops up the string and converts the pieces to a date; otherwise, it reads the date directly. Either way, it should give you a date you can use.
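The same "try one reading, fall back to the other" logic, sketched in Python for comparison. Unlike DATEVALUE, which depends on the machine's regional settings, this version pins both formats explicitly. It assumes four-digit years and date-only values (no time part), and parse_mixed is a hypothetical name.

```python
from datetime import date, datetime


def parse_mixed(text: str) -> date:
    """Parse a value written either as m/d/y or as d.m.y (four-digit year)."""
    text = text.strip()
    for fmt in ("%m/%d/%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognised date: {text!r}")
```

Since the two formats use different separators, a value can match at most one of them, so the order of the attempts does not affect the result.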

Prevent comma-separated list of numbers being interpreted as single large value

33266500,332665100,332665200,332665300 was the original value, and the cell should look like this: 33266500,332665100,332665200,332665300. But what I see as the cell value in Excel is 3.32665E+34.
So I want to convert it back into the original string. I found the Format function on Google and used it like this:
format(3.32665E+34,"standard")
which gives 332,6650,033,266,510,000,000,000
How can I parse it or get back the original string? I believe Format is the function to use in VBA.
Excel has a 15-digit precision limit. If the numbers are already shown like this when you open the file, there is no way to get the number back: you have already lost some digits. VBA code and formulas will not help you.
If this is not the case, you can add a single quote (') before the number to store it as text. This ensures Excel does not try to treat it as a number and thus lose precision.
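The loss is easy to reproduce outside Excel. Like Python's float, Excel stores numbers as IEEE 754 double-precision values. The short Python illustration below shows that once the commas are read as thousands separators, the resulting 35-digit integer cannot survive a round trip through a double. Keeping the value as text loses nothing.

```python
# The four numbers from the question; with the commas read as thousands
# separators they form one 35-digit integer.
original = "33266500,332665100,332665200,332665300"
as_integer = int(original.replace(",", ""))

# A double carries only about 15-17 significant decimal digits.
as_double = float(as_integer)

print(f"{as_double:.5E}")            # 3.32665E+34, as Excel displays it
print(int(as_double) == as_integer)  # False: the low-order digits are gone
print(original.split(","))           # kept as text, all four values survive
```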
If you want the value kept exactly, store the data as a string, not as a number. The data type you are using simply doesn't have the ability to do what you are asking it to do.
If you're starting with an Excel file that has already been created, then you've already lost the information: Excel has tried to understand what it was given, and its best guess has turned out to be wrong. All you can do (if you can't get the source data) is go back to the creator of the Excel file and tell them what's wrong.
If you're starting with, say, a text file that you're importing, then the news is much better:
If you're importing manually using the Text Import Wizard, then at "Step 3 of 3" you need to set "Column Data Format" for the problem field to "Text".
If you're using a macro, you'll need to specify a value for the TextFileColumnDataTypes property that does the same thing. The easiest way to get it right is to use the Macro Recorder.
If you want the four values in the string to be separate cells, then again, look at the Text Import Wizard settings: in Step 1 of 3 you need to set "Delimited" data type (usually the default) and in Step 2 make sure that "Comma" is checked.
The value needs to be entered into the cell as a string. Whatever inserts the value needs to precede it with a '.
