Convert a CSV file with text columns to LibSVM or SVMLight format by vectorization - svm

I have a CSV file with text columns (each column holds sentence features). How can I convert it to SVMLight or LibSVM format (a numerical format) using a vectorization such as bag of words?

You can use the Python script csv2libsvm.py to do the conversion. This is the same issue as "Converting CSV file to LIBSVM compatible data file using python".
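For the text columns themselves, the words have to be turned into numbers before anything can be written in LibSVM format. The following is a minimal sketch of a bag-of-words conversion using only the standard library. The column names "sentence" and "label", the sample rows, and the helper names tokenize and csv_text_to_libsvm are assumptions made for illustration; they do not come from the question or from csv2libsvm.py.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def csv_text_to_libsvm(csv_file, text_column, label_column):
    """Return LibSVM-formatted lines built from word counts of one text column."""
    rows = list(csv.DictReader(csv_file))

    # Build the vocabulary: each distinct word gets a 1-based feature index,
    # assigned in sorted order so the mapping is reproducible.
    words = sorted({w for row in rows for w in tokenize(row[text_column])})
    vocabulary = {word: i for i, word in enumerate(words, start=1)}

    lines = []
    for row in rows:
        counts = Counter(tokenize(row[text_column]))
        # LibSVM expects "label index:value ..." with indices in ascending order.
        features = sorted((vocabulary[w], n) for w, n in counts.items())
        pairs = " ".join(f"{index}:{count}" for index, count in features)
        lines.append(f"{row[label_column]} {pairs}".rstrip())
    return lines, vocabulary


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "label,sentence\n"
    "1,the cat sat\n"
    "0,the dog sat on the mat\n"
)
libsvm_lines, vocab = csv_text_to_libsvm(sample, "sentence", "label")
print("\n".join(libsvm_lines))
```

In practice, scikit-learn's CountVectorizer together with dump_svmlight_file covers the same steps and is probably the better choice for large files; the sketch above only shows what the conversion involves.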

Related

Why are my dictionary datetime values writing into the csv file differently than how they print?

As my title states, I am trying to take the values from my dictionaries and write them into a .csv file. They are datetime values stored in ISO format YYYY-MM-DD. However, in the file, after the first 5 or so lines they change from YYYY-MM-DD format to MM/DD/YYYY format. For reference, the data these dictionaries were gathered from has inconsistent formats (but that is handled). When I simply print the values, they all print in YYYY-MM-DD format.
Here is a generic version of my code:
import sys

# date_dict is built earlier in the script
with open(sys.argv[2], 'w') as file:
    for key in date_dict:
        print(date_dict[key])
        file.write('{}\n'.format(date_dict[key]))
How do I fix this so it is all in YYYY-MM-DD format?
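The snippet writes exactly the string it prints, so one possible explanation (an assumption, since the question does not say how the file is viewed) is that a spreadsheet program reformats the dates when it opens the .csv. Opening the file in a plain text editor would confirm that. To rule out the script itself, the dates can be formatted explicitly before writing. A minimal sketch, where write_iso_dates and the sample dictionary are hypothetical:

```python
import datetime
import io


def write_iso_dates(date_dict, out_file):
    """Write each value as YYYY-MM-DD, one value per line."""
    for key in date_dict:
        value = date_dict[key]
        # Format date and datetime objects explicitly; other values
        # (for example strings) are written unchanged.
        if isinstance(value, datetime.date):
            value = value.strftime("%Y-%m-%d")
        out_file.write("{}\n".format(value))


# Hypothetical sample values standing in for the real dictionary.
date_dict = {
    "a": datetime.date(2019, 5, 19),
    "b": datetime.datetime(2020, 1, 2, 13, 15),
}
buffer = io.StringIO()  # stands in for the file opened from sys.argv[2]
write_iso_dates(date_dict, buffer)
print(repr(buffer.getvalue()))
```

If the raw file already contains only YYYY-MM-DD values, the change is happening in the program used to view it, not in Python.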

Is the following data in ASCII format? If not, how would I use Python 3 to convert it into ASCII characters?

I have the following data and I want to use matplotlib to plot it, but I need to make sure my data is in ASCII format first. If it is, great; if not, how would I convert it into ASCII format using Python 3?
2019-05-19 13:15:15 CDT 3.4000000000e+01
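A quick way to check is sketched below, assuming the data has already been read into a Python str. str.isascii() is available from Python 3.7; the helper name to_ascii is hypothetical, and replacing unknown characters with "?" is just one possible policy.

```python
line = "2019-05-19 13:15:15 CDT 3.4000000000e+01"

# True when every character in the string is in the ASCII range (0-127).
is_ascii = line.isascii()
print(is_ascii)


def to_ascii(text):
    """Replace every non-ASCII character with '?' and return a str."""
    return text.encode("ascii", errors="replace").decode("ascii")


# The last whitespace-separated field is the value to plot.
value = float(line.split()[-1])
print(value)
```

The sample line contains only digits, letters, spaces and punctuation from the ASCII range, so no conversion is needed for it.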

datetime instead of str in read_excel with pandas

I have a dataset saved in an xls file.
In this dataset there are 4 columns that represent dates, in the format dd/mm/yyyy.
My problem is that when I read it in Python using the pandas function read_excel, all the columns are read as strings except one, which is read as datetime64[ns], even if I specify dtypes={column=str}. Why?
Dates in Excel are frequently stored as numbers, which allows you to do things like subtract them, even though they might be displayed as human-readable dates like dd/mm/yyyy. Pandas is handily taking those numbers and interpreting them as dates, which lets you deal with them more flexibly.
To turn them into strings, you can use the converters argument of pd.read_excel like so:
df = pd.read_excel(filename, converters={'name_of_date_column': lambda dt: dt.strftime('%d/%m/%Y')})
The strftime method lets you format dates however you like. Specifying a converter for your column lets you apply the function to the data as you read it in.

How to read text from an Excel file in Python pandas?

I am working on an Excel file with large text data. Two columns contain a lot of text, such as descriptions and job duties.
When I import my file in Python with df=pd.read_excel("form1.xlsx"), the columns with text data show as NaN.
How do I import all the text in the columns?
I want to do analysis on job title, description and job duties. Descriptions and job titles are long text. I have over 150 rows.
Try converting the file from .xlsx to .csv.
I had the same problem with text columns, so I tried converting to CSV (Comma Delimited) and it worked. Not very helpful, but worth a try.
You can pass a dictionary of column names and datatypes to read_excel with the dtype keyword:
col_dict = {'a': str, 'b': int}
pd.read_excel("form1.xls", dtype=col_dict)

CSV File format

I have created a CSV file with the following values:
10.25,10.35,10.45
One example of a variation: in German-language locales the decimal separator is , and the values are separated by ;
Apart from this, are there any other differences in the CSV format?
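To illustrate the German-style variant described above, here is a minimal sketch that reads semicolon-separated values with decimal commas using the standard csv module. The function name read_semicolon_csv and the sample line are assumptions; the sketch does not handle thousands separators or quoted text fields that contain commas.

```python
import csv
import io


def read_semicolon_csv(csv_file):
    """Parse rows separated by ';' whose numbers use ',' as the decimal mark."""
    reader = csv.reader(csv_file, delimiter=";")
    # Swap the decimal comma for a point so float() can parse each cell.
    return [[float(cell.replace(",", ".")) for cell in row] for row in reader]


# The values from the question, rewritten in the German-style variant.
sample = io.StringIO("10,25;10,35;10,45\n")
rows = read_semicolon_csv(sample)
print(rows)
```

Other aspects that commonly differ between CSV files include the quoting character, the line terminator, the text encoding, and whether a header row is present; the csv module exposes the first two through its dialect parameters.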
