ssconvert - How to specify a specific field format - logstash

I'm looking for a way to specify a field format when converting an .xlsx file to CSV using the ssconvert tool on Linux.
I need this because I have a floating-point field that comes out as a string after conversion.
For example, 99,8923 becomes "99,8923", while other values in the same field are parsed correctly: 54 stays 54, as expected.
The problem is the double quotes added around the number. When Logstash parses this value, it becomes 998923, even though I specify in Logstash that the field is a float.
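A likely cause, which is an assumption the question does not confirm, is that the value is exported with a comma as the decimal separator. Because the comma is also the CSV field delimiter, the writer has to quote that field, and a consumer that treats the comma as a thousands separator ends up with 998923. The Python sketch below uses only the standard `csv` module and a hypothetical `to_float` helper to reproduce the quoting and to show one way of reading the value back as a float.

```python
import csv
import io

# Hypothetical rows as they might look after export with a decimal comma.
rows = [["id", "value"], ["1", "99,8923"], ["2", "54"]]

# csv.writer uses QUOTE_MINIMAL by default: only fields containing the
# delimiter get quoted, so "99,8923" is wrapped in quotes and "54" is not.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Treating the comma as a thousands separator yields the reported 998923.
misread = int("99,8923".replace(",", ""))

def to_float(field: str) -> float:
    """Hypothetical helper: read a decimal-comma string such as '99,8923'."""
    return float(field.replace(",", "."))

reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header row
values = [to_float(value) for _id, value in reader]

print(text)
print(misread)  # 998923
print(values)   # [99.8923, 54.0]
```

Note that `to_float` assumes the field never contains a thousands separator as well; values like "1.000,5" would need a different rule.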
Thanks for your help.

Related

Rapidminer - Splitting rows that have values of the wrong type

I have a data set of 8 million rows in a tab-delimited .txt file without quotes.
Five of the 14 columns contain date values in dd.MM.yyyy format.
Problem 1
I am trying to import the file. In the "Format your columns" step, if I set the type of those columns to "date", it gives errors and all cells in those columns turn into "?".
So I selected "polynominal" and planned to convert the attribute type to date later.
Problem 2 (the real one)
I imported the data and added a "Nominal to Date" operator. When I run the process, I get an error at line 14.899:
Cannot parse date: Unparseable date: "0"
I found the line and saw that the columns were split incorrectly: a string in an earlier cell contained a tab character, so the following values moved one cell to the right. This row was not the only one that shifted.
I want to separate out the rows that have values of the wrong data type for the specified attributes, so that I can correct them manually.
How can I do that in RapidMiner?
Or are there any other ideas for solving these problems?
So most likely you need to adjust the date format in the date format pull-down menu of the import wizard.
To be honest, I usually just import as polynominal and then convert to date in my process. It's easier and reproducible.
You appear to have a broken input file.
The best solution, obviously, is to fix the process that generates the data: escape or replace tab characters, and write the dates in an unambiguous format such as the ISO 8601 date format (yyyy-MM-dd).
Assuming that you can't fix the data, you should probably write a robust parser program yourself. A generic parser such as RapidMiner's won't be able to fix every problem.
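A robust pre-parser of the kind suggested above could be sketched in Python as follows. It flags a row as suspect when it has the wrong number of tab-separated fields (for example, a stray tab inside a text cell) or when a date column does not parse as dd.MM.yyyy. The function names and the column indices in the usage lines are hypothetical; for the real file you would pass 14 as the expected column count and the indices of its five date columns.

```python
import csv
from datetime import datetime

def is_date(value: str) -> bool:
    """True if value parses as dd.MM.yyyy, e.g. '31.12.2019'."""
    try:
        datetime.strptime(value, "%d.%m.%Y")
        return True
    except ValueError:
        return False

def split_rows(lines, expected_columns, date_columns):
    """Separate tab-delimited rows into well-formed and suspect ones.

    Returns two lists of (row_number, fields) tuples so the suspect rows
    can be located and corrected by hand.
    """
    good, bad = [], []
    # QUOTE_NONE: the file has no quotes, so every tab counts as a separator.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row_no, row in enumerate(reader, start=1):
        ok = len(row) == expected_columns and all(is_date(row[i]) for i in date_columns)
        (good if ok else bad).append((row_no, row))
    return good, bad

# Tiny hypothetical sample: the second row has a stray tab, so its date
# has shifted one cell to the right.
sample = [
    "A\t01.02.2019\tfoo",
    "B\tbar\tbaz\t01.02.2019",
]
good, bad = split_rows(sample, expected_columns=3, date_columns=[1])
print(len(good), len(bad))  # 1 1
```

For the real file, open it with `open(path, newline="")` and pass the file object as `lines`; the suspect rows can then be written to a separate file for manual correction.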

Retrieve Format of a NotesViewColumn

Is there a way to find out which format applies to a NotesViewColumn? I can see that there are a number of format attributes e.g. DateFmt, TimeDateFmt, NumberFormat etc, but what I can't see is a way to identify which one of them applies.
Combine your current column's value type with NotesViewColumn's format attributes.
If your current column value is of type number then use
NotesViewColumn's NumberFormat, NumberAttrib, NumberDigits format.
If your current column value is of type NotesDateTime then use
NotesViewColumn's DateFmt, TimeDateFmt, TimeFmt, TimeZoneFmt format.
Assuming you are able to read the NSF design, you could rely on DXL export to get the information you need. Try the DXL utility under Tools - DXL Utilities - Viewer.
As you will see, the piece of information you need is nested inside every <column> node.
For number columns there is a <numberformat> node.
For time columns there is a <datetimeformat> node above <numberformat>.
For text columns there is nothing.
If you need to analyse a number of views programmatically, the NotesDXLExporter class is available.
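Once a view has been exported to DXL, the rule described above (a <datetimeformat> node means a time column, a <numberformat> node alone means a number column, neither means text) can be applied mechanically. The Python sketch below uses the standard `xml.etree.ElementTree` module; the sample XML is a hand-written, heavily trimmed stand-in for a real export, and `column_kinds` is a hypothetical helper. It ignores XML namespaces so it does not depend on the exact namespace the export declares.

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.lotus.com/dxl}column' -> 'column'."""
    return tag.rsplit("}", 1)[-1]

def column_kinds(dxl_text: str) -> list:
    """Classify each <column> of an exported view by the format node it contains."""
    kinds = []
    root = ET.fromstring(dxl_text)
    for node in root.iter():
        if local_name(node.tag) != "column":
            continue
        inner = {local_name(child.tag) for child in node.iter()}
        if "datetimeformat" in inner:
            kinds.append("datetime")  # time columns carry <datetimeformat> (plus <numberformat>)
        elif "numberformat" in inner:
            kinds.append("number")
        else:
            kinds.append("text")      # text columns carry neither node
    return kinds

# Hypothetical, trimmed-down DXL for a view with three columns.
sample = """<view xmlns="http://www.lotus.com/dxl">
  <column><datetimeformat/><numberformat/></column>
  <column><numberformat/></column>
  <column/>
</view>"""
print(column_kinds(sample))  # ['datetime', 'number', 'text']
```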

Converting xlsx into csv

I'm wondering if there is a way to convert an .xlsx file into .csv while preserving every value exactly as it is.
I have a column that for some rows has values like 0738794E5, and when I convert it through "Save As", the value turns into 7.39E+10. I understand that some values containing an "E" are turned into scientific notation, but this conversion is of no use to me because that "E" doesn't stand for an exponent.
Is there a setting to preserve the values the way they are, i.e. as text/strings?
One option is to create an additional (or replacement) column that has the target values either enclosed in double quotes or prepended by an alpha character.
The quotes or alpha character will guarantee that the problem values come in as text. When the csv file is opened, the quotes or alpha will still be there, so you would need to use a string operation (MID or RIGHT, probably) to recover the original string values.
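The round trip described in this answer can be sketched in Python with the standard `csv` module. The marker letter `A` and the `protect`/`recover` helpers are hypothetical; `recover` does what the MID or RIGHT formula would do in the spreadsheet.

```python
import csv
import io

PREFIX = "A"  # hypothetical marker letter added in front of every problem value

def protect(value: str) -> str:
    """Prepend the marker so a spreadsheet can't read the value as a number."""
    return PREFIX + value

def recover(value: str) -> str:
    """Drop the marker; same idea as =MID(A1,2,LEN(A1)) or =RIGHT(A1,LEN(A1)-1)."""
    return value[len(PREFIX):] if value.startswith(PREFIX) else value

codes = ["0738794E5", "12345"]

buf = io.StringIO()
writer = csv.writer(buf)
for code in codes:
    writer.writerow([protect(code)])

restored = [recover(row[0]) for row in csv.reader(io.StringIO(buf.getvalue()))]
print(restored)  # ['0738794E5', '12345']
```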
It turned out my dilemma wasn't real; it only appeared to be.
When I convert the .xlsx into .csv and open the .csv, it shows the improperly-converted values.
However, when I run my application, read from the csv, and output what's been read, I get the values contained within the .xlsx just like I wanted.
I'm not sure how/why this is the way it is but it works now.
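One plausible explanation, consistent with what is described here but not confirmed in the thread, is that the CSV file itself contains the original text 0738794E5, and only the spreadsheet re-interprets it as a number when the CSV is opened again. A CSV reader that does no type guessing returns the text unchanged. The Python sketch below (hypothetical file content) contrasts that with what a type-guessing reader would do.

```python
import csv
import io

# Hypothetical CSV content as written by "Save As" when the cell held the text 0738794E5.
raw = "code\n0738794E5\n"

rows = list(csv.reader(io.StringIO(raw)))
value = rows[1][0]

print(value)         # the csv module never converts types: 0738794E5
print(float(value))  # what a type-guessing reader does: 73879400000.0
```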

Sybase: get a specific string from a binary column

On Sybase, I have a table containing a binary column.
Using convert(varchar(16384), convert(binary(16384), T1.TEXT)) as Text, I can convert the data it contains to a string.
Now my question: I need to extract from this field a new string containing only specific words. How can I do it?
Let me take an example.
Suppose that in one row the field contains the string "Output of this activity are txt files: the file orange.txt, the file black.txt and eventually the file red.txt". In the output of my query I want the field as "orange.txt, black.txt, red.txt".
Is it possible to do it?
Thanks
You can't do this. This is because neither the BINARY nor the TEXT datatypes under Sybase allow sub-string searching or regular expression processing.
When you are storing character data, VARCHAR or UNIVARCHAR are always the better options. TEXT as a type should only ever be used if your TEXT fields are larger than your Sybase configured page size.
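Whatever the server-side limits, once the converted string has been fetched into client code, the transformation from the question's example is simple to express. A Python sketch, assuming (hypothetically, from the example alone) that the wanted names are runs of letters, digits, underscores or hyphens ending in `.txt`:

```python
import re

# Assumed pattern: word characters or hyphens followed by ".txt".
TXT_FILE = re.compile(r"[\w-]+\.txt\b")

def txt_files(text: str) -> str:
    """Return the .txt file names found in text, joined with ', '."""
    return ", ".join(TXT_FILE.findall(text))

row = ("Output of this activity are txt files: the file orange.txt, "
       "the file black.txt and eventually the file red.txt")
print(txt_files(row))  # orange.txt, black.txt, red.txt
```

The bare word "txt" in "txt files" is not matched because the pattern requires a dot before the extension.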

Fixing MS Excel date time format

A reporting service generates a CSV file in which certain columns (oddly enough) have mixed date/time formats: some rows contain datetimes expressed as m/d/y, others as d.m.y.
When applying =TYPE(), it returns either 1 (Excel recognizes a number, i.e. an Excel timestamp) or 2 (Excel recognizes text).
How can I convert any kind of wrong date-time format into a "normal" format that can be used and ensure some consistency of data?
I am thinking of 2 solutions at this moment :
I should somehow process the odd data with existing Excel functions.
I should ask for the report to be generated correctly from the very beginning and avoid this hassle in the first place.
Thanks
Certainly your second option is the way to go in the medium-to-long term. But if you need a solution now, and if you have access to a text editor that supports Perl-compatible regular expressions (like Notepad++, UltraEdit, EditPad Pro etc.), you can use the following regex:
(^|,)([0-9]+)/([0-9]+)/([0-9]+)(?=,|$)
to search for all dates in the format m/d/y, surrounded by commas (or at the start/end of the line).
Replace that with
\1\3.\2.\4
and you'll get the dates in the format d.m.y.
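The same search-and-replace can be scripted instead of done in an editor. Python's `re` module accepts this pattern and the `\1\3.\2.\4` replacement unchanged; `re.MULTILINE` makes `^` and `$` match at every line. The helper name and sample text are hypothetical.

```python
import re

# A m/d/y date at the start of a line or after a comma, followed by a comma or
# the end of the line (lookahead, so the comma stays available for the next match).
MDY = re.compile(r"(^|,)([0-9]+)/([0-9]+)/([0-9]+)(?=,|$)", re.MULTILINE)

def mdy_to_dmy(csv_text: str) -> str:
    """Rewrite every m/d/y field as d.m.y, leaving other fields untouched."""
    return MDY.sub(r"\1\3.\2.\4", csv_text)

sample = "1/31/2020,foo\nbar,2.3.2021\nx,12/5/2019,3/4/2021"
print(mdy_to_dmy(sample))
# 31.1.2020,foo
# bar,2.3.2021
# x,5.12.2019,4.3.2021
```

Read the file in text mode so that Windows line endings arrive as `\n`; otherwise a trailing `\r` would stop `$` from matching after the last field.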
If you can't get the data changed, you may have to resort to a helper column that translates the dates (this assumes the date you want to change is in A1):
=IF(ISERR(DATEVALUE(A1)),DATE(VALUE(RIGHT(A1,LEN(A1)-FIND(".",A1,4))),VALUE(MID(A1,FIND(".",A1)+1,2)),VALUE(LEFT(A1,FIND(".",A1)-1))),DATEVALUE(A1))
It tests whether it can read the text as a date. If that fails, it chops up the string and converts the pieces to a date; otherwise it reads the date directly. Either way, it should give you a date you can use.
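The formula's logic can be mirrored in Python to check how individual values will be interpreted. This sketch assumes that DATEVALUE reads m/d/y (a US locale) and that years have four digits; `to_date` is a hypothetical helper, and a value matching neither format raises `ValueError`.

```python
from datetime import date, datetime

def to_date(text: str) -> date:
    """Try m/d/y first; otherwise split d.m.y at the dots."""
    try:
        # Plays the role of DATEVALUE(A1), assuming a US (m/d/y) locale.
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        # Plays the role of the LEFT/MID/RIGHT branch: day.month.year.
        day, month, year = (int(part) for part in text.strip().split("."))
        return date(year, month, day)

for raw in ["1/31/2020", "31.1.2020", "05.12.2019"]:
    print(raw, "->", to_date(raw).isoformat())
# 1/31/2020 -> 2020-01-31
# 31.1.2020 -> 2020-01-31
# 05.12.2019 -> 2019-12-05
```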
