My data are in Excel, so to convert them to LIBSVM format I first convert the Excel sheet to CSV and then follow the procedure on the LIBSVM website. Assuming the CSV file is SPECTF.train:
matlab> SPECTF = csvread('SPECTF.train'); % read a csv file
matlab> labels = SPECTF(:, 1); % labels from the 1st column
matlab> features = SPECTF(:, 2:end);
matlab> features_sparse = sparse(features); % features must be in a sparse matrix
matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);
Then I read it back using libsvmread(name).
Is there a shorter way to convert Excel data to LIBSVM format directly? Thanks.
I don't think there is a need to convert to CSV. You can use xlsread to read the data directly from the Excel file, then use libsvmwrite to write it in a form compatible with LIBSVM.
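A minimal sketch of that approach (assuming the same layout as above, with labels in the first column; the .xlsx filename is a placeholder):

```matlab
% Read the numeric data straight from the Excel file (no CSV step)
raw = xlsread('SPECTF.xlsx');            % placeholder filename
labels = raw(:, 1);                      % labels from the 1st column
features_sparse = sparse(raw(:, 2:end)); % features must be sparse
libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);
```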
% Exercise for a PD controller
clear;
clc;
close all;
%% Step 1: read in Excel sheet
filename = 'WineQuality';
data = readtable(filename,'PreserveVariableNames',true);
inputs = [data.type'; data.fixed_acidity'; data.citric_acid'; data.residual_sugar'; ...
    data.chlorides'; data.free_sulfur_dioxide'; data.total_sulfur_dioxide'; ...
    data.density'; data.pH'; data.sulphates'; data.alcohol'];
targets = data.Quality';
I can't get this to work; I keep getting the following error when I try to load the Excel sheet into MATLAB:
Error using vertcat
Inconsistent concatenation dimensions because a 1-by-3255 'double' array was converted to a 1-by-1 'cell'
array. Consider creating arrays of the same type before concatenating.
Error in Assignment2_s221395946 (line 11)
inputs = [data.type'; data.fixed_acidity'; data.citric_acid'; data.residual_sugar'; data.chlorides'; data.
Once I've done that, I need to feed the data into a feedforward network to train it on the quality ratings. Any help will be greatly appreciated.
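The error suggests that readtable brought the type column in as a cell array of text rather than numbers, so it cannot be concatenated with the double columns. One possible fix (a sketch, assuming type holds text labels such as 'red'/'white') is to convert it to numeric category codes first:

```matlab
% Convert the text column to numeric codes (1, 2, ...) so it can be
% concatenated with the double columns
type_numeric = double(categorical(data.type))';
% ...then use type_numeric in place of data.type' when building inputs
```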
I intend to build a model using sklearn to predict cuisines. However, one column in my data (Column B) gives me a ValueError: could not convert string to float: 'indian'.
Please help if you can.
csv file
You are probably trying to cast that column to a float somewhere in your code. If you're using sklearn, most of its classifiers accept string labels for the target column directly. If you want to map each string label to an integer yourself, you can do it like this:
label_mapper = dict(zip(set(df['Column B']), range(len(set(df['Column B'])))))
df['Column B'] = df['Column B'].apply(lambda x: label_mapper[x])
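As a self-contained sketch of the same idea (a plain list stands in for df['Column B'], and the unique labels are sorted so the mapping is deterministic):

```python
# Hypothetical cuisine labels standing in for df['Column B']
labels = ['indian', 'thai', 'indian', 'mexican']

# Map each distinct label to an integer; sorting makes the mapping stable
label_mapper = {label: idx for idx, label in enumerate(sorted(set(labels)))}
encoded = [label_mapper[x] for x in labels]
print(label_mapper)  # {'indian': 0, 'mexican': 1, 'thai': 2}
print(encoded)       # [0, 2, 0, 1]
```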
I am trying to access data from a CSV using Python. I am able to access entire columns of data values; however, I also want to access individual rows, using an indexed coordinate system where (0, 1) means column 0, row 1. So far I have this:
#Lukas Robin
#25.07.2021
import csv
with open("sun_data.csv") as sun_data:
    sunData = csv.reader(sun_data, delimiter=',')
    global data
    for data in sunData:
        print(data)
I don't normally use data tables or CSV, so this is a new area for me.
As mentioned in the comment, you could make the jump to pandas and spend a little time learning it. That would be a good investment if you plan to do much data analysis or work with data tables regularly.
If you just want to pull in a table of numbers and access it as you describe, you are perfectly fine using the csv package. Below is an example...
If your .csv file has a header row, you can simply add next(sun_data) before the loop to advance the iterator and let the header data fall on the floor...
import csv

f_in = 'data_table.csv'
data = []  # a container to hold the results

with open(f_in, 'r') as source:
    sun_data = csv.reader(source, delimiter=',')
    for row in sun_data:
        # convert the read-in values to float data types (or ints or ...)
        row = [float(t) for t in row]
        # append it to the data table
        data.append(row)

print(data[1][0])
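For comparison, the pandas version of the same row/column access is very short; this sketch uses an inline stand-in for the CSV file, and read_csv parses the numbers for you:

```python
import io

import pandas as pd

# Inline stand-in for a CSV file with a header row
csv_text = "alt,azimuth\n10.0,120.5\n20.0,130.0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.iloc[1, 0])  # row 1, column 0 -> 20.0
```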
I am writing a dataframe into a csv as follows:
appended_Dimension.to_csv("outputDimension.csv")
The dataframe is as follows:
Cameroun Rwanda Niger Zambia Mali Angola Ethiopia
ECON 0.056983 0.064422 0.047602 0.070119 0.048395 0.059233 0.085559
FOOD 0.058250 0.046348 0.048849 0.043527 0.049064 0.013157 0.081436
ENV 0.013906 0.004013 0.010519 0.001973 0.005360 0.023010 0.008469
HEA 0.041496 0.078403 0.040154 0.054466 0.029954 0.053007 0.061761
PERS 0.056687 0.021978 0.062655 0.056477 0.087056 0.089886 0.043747
The output is as follows:
I'd like to write the data in a float format so I can process it directly from the CSV. How can I do that, please?
You cannot keep native floats inside a CSV; it is a plain-text format, so every value is stored as a string. Load the data back from the CSV (which parses the numbers), perform the relevant operations, then save it again; you cannot manipulate the values while they sit in the file.
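A minimal sketch of that round trip with the standard csv module (an in-memory buffer stands in for the file; with pandas, pd.read_csv would do the float parsing for you):

```python
import csv
import io

# Write one row of floats; the csv module stores them as text
buf = io.StringIO()
csv.writer(buf).writerow(["ECON", 0.056983, 0.064422])

# Read it back: every field comes back as a string
buf.seek(0)
row = next(csv.reader(buf))
print(row)  # ['ECON', '0.056983', '0.064422']

# Convert the numeric fields back to floats before processing
values = [float(v) for v in row[1:]]
print(values)  # [0.056983, 0.064422]
```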
I have a sqlite3 database containing a lot of text data. I want to extract the text and convert it into a term-frequency matrix using CountVectorizer or HashingVectorizer. The approach I can think of is to use the fetchall method of sqlite3.Cursor.
The problem is that the dataset is too big. I am wondering if there is a way to extract features and convert to matrix iteratively?
import sqlite3
from sklearn.feature_extraction.text import CountVectorizer

# extract text data using 'fetchall'
conn = sqlite3.connect('text.db')
c = conn.cursor()
c_exe = c.execute("SELECT * FROM table")
text_tuple = c_exe.fetchall()
text = [item[0] for item in text_tuple]

# convert the text into a tf-matrix
vectorizer = CountVectorizer()
Y = vectorizer.fit_transform(text)

# if there's a way to do it iteratively, e.g. a 'modified_vectorizer'
for text in c_exe:
    Y = modified_vectorizer()
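One way to avoid materialising everything with fetchall is to pass a generator over the cursor to the vectorizer: fit_transform accepts any iterable of strings, so rows are pulled from the database one at a time. A sketch with an in-memory database (the table and column names here are made up for illustration):

```python
import sqlite3

# A tiny in-memory database standing in for text.db
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)",
                 [("the cat sat",), ("the dog ran",)])

def text_stream(connection):
    """Yield one document at a time instead of calling fetchall()."""
    for row in connection.execute("SELECT body FROM docs"):
        yield row[0]

# CountVectorizer and HashingVectorizer accept any iterable of strings:
# Y = CountVectorizer().fit_transform(text_stream(conn))
print(list(text_stream(conn)))  # ['the cat sat', 'the dog ran']
```

Note that CountVectorizer still has to hold its full vocabulary in memory; for very large corpora, the stateless HashingVectorizer is the usual choice.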