How to convert xls format to arff format? - excel

I have an .xls data file that I want to use in Weka.
I tried to convert it with the Weka converter as described in the Weka tutorial, but after saving the file as .csv I don't know how to tag the attributes or data.
Is there another way to convert .xls to ARFF?

You must work with CSV files; they can be converted directly on the command line:
java -cp "path/to/weka.jar" weka.core.converters.CSVLoader file.csv > file.arff
Make sure that numeric values are not wrapped in quotation marks (").
If you use non-standard characters, save the file as UTF-8 and add -Dfile.encoding=utf-8 when you start Weka.
Nominal and string data are converted automatically.
If you still have problems, please post part of your CSV file.
What do you mean by "how to tag the attribute or data"?
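
For readers unsure what "tagging" attributes means in practice, here is a minimal Python sketch of what an ARFF file contains: a `@relation` line, one `@attribute` declaration per column, and a `@data` section. It is not Weka's CSVLoader and does not reproduce its exact output. The function names (`csv_to_arff`, `quote`, `is_number`) are hypothetical, and the sketch assumes a header row, rectangular data, and that every non-numeric column should be declared nominal.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def quote(value):
    """Quote a name or label containing characters ARFF treats specially."""
    if value == "" or any(ch in value for ch in " ,'\"%{}\t"):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value


def csv_to_arff(csv_text, relation="data"):
    """Convert CSV text (first row = header) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + quote(relation), ""]
    for col, name in enumerate(header):
        # Empty cells and "?" count as missing and are ignored for typing.
        values = [row[col] for row in data if row[col] not in ("", "?")]
        if all(is_number(v) for v in values):
            attr_type = "numeric"
        else:
            # Non-numeric column: declare it nominal with its distinct labels.
            labels = sorted(set(values))
            attr_type = "{" + ",".join(quote(v) for v in labels) + "}"
        lines.append("@attribute " + quote(name) + " " + attr_type)
    lines += ["", "@data"]
    for row in data:
        # "?" is the ARFF marker for a missing value.
        cells = ("?" if v == "" else v if is_number(v) else quote(v) for v in row)
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"
```

Calling `csv_to_arff("age,color\n31,red\n,blue\n")` yields an `@attribute age numeric` line, an `@attribute color {blue,red}` line, and two data rows, the second with `?` for the missing age.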

The following steps should solve your problem:
Save your .xls file in .csv format.
Open the Weka GUI Chooser and click Tools in the top menu bar.
Click ArffViewer.
In the file dialog, choose the file type to load, such as *.csv or *.data.
Open the .csv file to view the data and values.
Name the file with the .arff extension.
Save the file.

Related

WEKA shows my first variable from the ARFF File starts with the characters 

In WEKA I tried to use an ARFF file as a test set for a model after classifying my training data (under the Classify tab), but received the following error: "Train and test set are not compatible."
I opened both the original training and test CSV files in Excel, and they looked the same to me. I opened the CSV files in Notepad++ and they also looked the same. However, when I opened the test ARFF in WEKA, I found strange characters at the beginning of the first attribute name.
Why are the strange characters there and how do I remove them? I need the training and test ARFF files to be compatible for classification.
Thank you in advance.
If you used Excel to edit the test file:
Check whether you saved the file in the format "CSV UTF-8 (Comma delimited) (*.csv)" rather than "CSV (Comma delimited) (*.csv)". Note that the latter format does not include "UTF-8".
If so, re-save the test CSV file in Excel in the latter format (without UTF-8) and make a new ARFF file from it. Excel's CSV UTF-8 format writes a byte-order mark (BOM) at the start of the file, which is the likely source of the stray characters. This time, WEKA should not show them at the beginning of the first attribute.
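
If re-saving from Excel is inconvenient, the same cleanup can be scripted. The sketch below assumes the stray characters come from a UTF-8 byte-order mark: the three bytes EF BB BF, which display as "ï»¿" when read as Latin-1. It copies the file without that prefix. The function name `strip_utf8_bom` and the file names in the usage note are hypothetical.

```python
import codecs


def strip_utf8_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    had_bom = data.startswith(codecs.BOM_UTF8)  # the bytes EF BB BF
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return had_bom
```

For example, `strip_utf8_bom("test.csv", "test_clean.csv")` writes a cleaned copy that can then be converted to ARFF. Only the prefix is removed; the rest of the file is left byte-for-byte unchanged.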

export data with chinese to CSV

I am trying to export data containing Chinese (or other non-English characters, for that matter) from ui-grid to PDF or CSV. However, the exported text is all garbled. Here is the Plunker link:
http://plnkr.co/edit/ZR34lhm3LUNmUrj7Vg23?p=preview
I understand that for the PDF export to work I need the correct cmap for the font and character set in use, but why is the CSV export not working? I have even tried exporterOlderExcelCompatibility: false, but that didn't help either.
Did you try with:
exporterOlderExcelCompatibility: true
(false is the default).
I had a problem exporting umlaut characters to CSV, and I fixed it by setting exporterOlderExcelCompatibility: true in the gridOptions configuration.
The issue you are facing with the CSV is most likely caused by opening the file in Excel. I do not see the issue with OpenOffice.
For Excel 2013, these steps should work:
Launch Excel, and go to the Data tab.
Click the From Text button.
Navigate to the exported file, and click open.
Set the encoding to UTF-8 if it wasn't auto-selected.
Click Next.
Change the delimiter to comma.
Click Finish.
Choose where you want the input, or just click OK.
At this point, you should have the CSV displaying correctly.
With the CSV exporter, CSV-JS, I do not see any option to set the encoding beforehand.
If you have a small file: open your CSV in Google Sheets and save it as CSV again. It will work.
If you have a large file: try OpenOffice (free and easy to download). Open the file with OpenOffice Calc and save it as UTF-8.
On Windows, the default encoding for Chinese text is GBK; on Ubuntu it is UTF-8. Make sure you use the right encoding for your system.
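
The re-encoding step can also be done without a spreadsheet program. The following Python sketch reads a text file in one encoding and writes it back as UTF-8 with a byte-order mark (Python's "utf-8-sig" codec), which Excel generally uses as its cue to treat a CSV as UTF-8. The function name `reencode_text_file` is hypothetical, and the source encoding has to be known in advance; the sketch does not detect it.

```python
def reencode_text_file(src_path, dst_path,
                       src_encoding="utf-8", dst_encoding="utf-8-sig"):
    """Rewrite a text file in a different encoding, keeping line endings.

    With the default dst_encoding ("utf-8-sig") the output starts with a
    UTF-8 BOM.
    """
    # newline="" disables newline translation so CRLF/LF are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Avoid writing a second BOM if the source already carried one.
    if text.startswith("\ufeff"):
        text = text[1:]
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

For a GBK-encoded export, `reencode_text_file("export.csv", "export_utf8.csv", src_encoding="gbk")` would be the call; both file names are placeholders.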

Forwarding the results of text processing commands to certain locations in an .ods file

I am looking for an efficient way to import the data from a bunch of text files into an .ods file. I have no problem in processing the text files with commands like grep and sed, however, I do not know if it is possible to redirect the results of these commands into a certain location in an ods file.
The .ods file format is basically an xml file format. In the case of .fods it is straight xml. In the case of .ods it is zipped xml. So directly inserting content from text files will likely require some xml tools. I'm using Ubuntu and found xml2/2xml could be useful for converting between xml and xml-path-style text. (sudo apt-get install xml2)
So you will have to do the following:
unzip the .ods file - the cell data will be in a file called content.xml
xml2 < content.xml to get raw text out of the xml
Edit the raw text with your content
Convert the edited raw text back to xml using 2xml
Rezip up the previously unzipped .ods, including your edited content
This may be quite an involved/cumbersome process. Alternatively I'd suggest simply saving your .ods file as a .csv file instead and directly editing the comma-separated-values.
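
As a rough illustration of the unzip, edit, rezip cycle, here is a Python sketch that uses the standard zipfile module instead of unzip and xml2/2xml. The function name `rewrite_ods_content` and the `transform` callback are hypothetical; the callback receives content.xml as a string and returns the edited string. The sketch assumes the source and destination paths differ, and it keeps the "mimetype" entry uncompressed and in its original position, which OpenDocument packages expect.

```python
import zipfile


def rewrite_ods_content(src_path, dst_path, transform):
    """Copy an .ods archive, passing content.xml through transform(str) -> str."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w") as dst:
        # infolist() preserves the original entry order, so "mimetype"
        # stays first if it was first in the source archive.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            if info.filename == "mimetype":
                compression = zipfile.ZIP_STORED  # must stay uncompressed
            else:
                compression = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=compression)
```

A call such as `rewrite_ods_content("in.ods", "out.ods", lambda xml: xml.replace("PLACEHOLDER", "42"))` would substitute a marker string already present in a cell. Inserting values at arbitrary cell positions requires real XML handling inside `transform`.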

How to make file to be visualized in Excel

I've got a Groovy script that grabs the data I want out of a large text file. I want to make a tab-separated datafile with this script and then use the file for an Excel visualization. What is the best file type to create, and how do I create it in Groovy?
Thanks!
Why not just write it out as tab separated file as you say?
Or you could use Apache POI and HSSF with a groovy builder or something...
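
To show how little is involved in the tab-separated route, here is a minimal sketch. It is written in Python rather than Groovy, so treat it as an outline of the logic and not as a drop-in for the script in the question. The function name `write_tsv` is hypothetical.

```python
import csv


def write_tsv(path, header, rows):
    """Write a header and data rows to a tab-separated text file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_tsv("out.tsv", ["name", "value"], [["a", 1], ["b", 2]])` produces three lines with one tab between the fields. The csv module quotes a field only when it contains a tab, a quote character, or a line break.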

.dat file how to create one based on excel document

I have a .csv file in my MATLAB folder with 38 columns and about 48 thousand entries. I was hoping to use the findcluster GUI, but it only accepts .dat files.
How do I create a .dat file in MATLAB, or more specifically, how do I convert the .csv file into a .dat file that can be used by the MATLAB fcm clustering tool?
example of csv:
how would I go about creating a data file for this kind of information?
The only documentation I could find about the file format was
The data set must have the extension .dat. For example, to load the data set,
clusterdemo.dat, type findcluster('clusterdemo.dat').
I checked clusterdemo.dat and found that the data is stored in ASCII format. Therefore, try
a = csvread('data.csv');
save 'data.dat' a -ASCII
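
If the conversion has to happen outside MATLAB, the same idea (numeric CSV in, whitespace-delimited ASCII out) can be sketched in Python. This is an approximation under stated assumptions, not a guaranteed byte-for-byte match of MATLAB's -ASCII output: every cell must be numeric, as csvread also requires, and the function name `csv_to_dat` is hypothetical.

```python
import csv


def csv_to_dat(csv_path, dat_path, skip_header=False):
    """Write a numeric CSV as space-separated ASCII text, one row per line."""
    with open(csv_path, newline="") as src, open(dat_path, "w") as dst:
        reader = csv.reader(src)
        if skip_header:
            next(reader, None)  # drop a header row if the file has one
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # float() raises ValueError on non-numeric cells.
            dst.write(" ".join(f"{float(cell):.7e}" for cell in row) + "\n")
```

Calling `csv_to_dat("data.csv", "data.dat")` writes values in scientific notation with seven decimal places, for example `1.0000000e+00`.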
Just rename xxx.csv to xxx.dat. This worked for me.
You should try changing the extension. To do that, open the folder settings and, in the View options where hidden files are configured, uncheck the option that hides extensions for known file types. You can then change the extension of any file by renaming it.
This works because there really isn't such a thing as a 'dat' format. A 'dat' file is just a text file, and it could theoretically have any extension you want. It could also be delimited however you want or need; it all depends on what you are trying to achieve.
In other words, what are you going to use this file for?
If it's for use with another application, then the requirements of that application will probably dictate how it is delimited and structured.
Or you can simply save the file from Excel as .csv and change the extension afterwards.
It worked for me.
