How to load only column names from a CSV file (Pandas)? - python-3.x

I have a large CSV file and don't want to load it fully into memory. I only need the column names from this file. How can I load just those cleanly?

Try this:
pd.read_csv(file_name, nrows=1).columns.tolist()

If you pass nrows=0 to read_csv, it parses only the header row and loads no data rows:
In[8]:
import pandas as pd
import io
t="""a,b,c,d
0,1,2,3"""
pd.read_csv(io.StringIO(t), nrows=0)
Out[8]:
Empty DataFrame
Columns: [a, b, c, d]
Index: []
After that, accessing the .columns attribute gives you the column names:
In[10]:
pd.read_csv(io.StringIO(t), nrows=0).columns
Out[10]: Index(['a', 'b', 'c', 'd'], dtype='object')
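If pandas is not needed for anything else, the header can also be read with the standard library. The following is a minimal sketch of that alternative, not something taken from the answers above; the helper name read_header is made up for illustration, and it assumes the first row of the file is the header.

```python
import csv
import io


def read_header(file_obj):
    """Return the first CSV row (assumed to be the header) as a list.

    Only the first row is parsed; the rest of the file is never read.
    An empty input yields an empty list.
    """
    reader = csv.reader(file_obj)
    return next(reader, [])


# Stand-in for a real file; with a file on disk you would pass
# open(path, newline="") instead.
sample = io.StringIO("a,b,c,d\n0,1,2,3\n")
print(read_header(sample))  # ['a', 'b', 'c', 'd']
```

Because csv.reader is lazy, this touches only the first line regardless of file size.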

Related

Reading CSV column values and append to List in Python

I'd like to read a column from a CSV file and store those values in a list.
The CSV file currently looks like this:
Names
Tom
Ryan
John
The result that I'm looking for is
['Tom', 'Ryan', 'John']
Below is the code that I've written.
import csv
import pandas as pd
import time
# Declarations
UserNames = []
# Open a csv file using pandas
data_frame = pd.read_csv("analysts.csv", header=1, index_col=False)
names = data_frame.to_string(index=False)
# print(names)
# Iteration
for name in names:
    UserNames.append(name)
print(UserNames)
So far the result is as follows
['T', 'o', 'm', ' ', '\n', 'R', 'y', 'a', 'n', '\n', 'J', 'o', 'h', 'n']
Any help would be appreciated.
Thanks in advance
Instead of converting your DataFrame to a string, you can convert the column directly to a list like this:
import pandas as pd
df = pd.read_csv("analyst.csv", header=0)
names = df["Name"].to_list()
print(names)
Output: ['tom', 'tim', 'bob']
Csv File:
Name,
tom,
tim,
bob,
I was not sure what your CSV file really looks like, so you may have to adjust the arguments of the read_csv function.
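Applied to the data from the question, the same idea can be sketched as follows. The file is simulated with io.StringIO, and the column name "Names" is taken from the question's sample. The original loop produced single characters because iterating over a string yields one character at a time.

```python
import io

import pandas as pd

# Stand-in for the question's file: one "Names" column with a header row.
csv_text = "Names\nTom\nRyan\nJohn\n"

# header=0 (the default) treats the first line as the column name.
data_frame = pd.read_csv(io.StringIO(csv_text), header=0)

# Pull the column out as a plain Python list of values.
user_names = data_frame["Names"].tolist()
print(user_names)  # ['Tom', 'Ryan', 'John']
```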

Most efficient way to convert Python multidimensional list to CSV file?

I want to output a multidimensional list to a CSV file.
Currently, I am creating a new DataFrame object and converting that to CSV. I am aware of the csv module, but I can't figure out how to use it without manual input. The populate method allows the user to choose how many rows and columns they want. Basically, the data variable will usually be of the form [[x1, y1, z1], [x2, y2, z2], ...]. Any help is appreciated.
from populator import populate
from pandas import DataFrame
data = populate()
df = DataFrame(data)
df.to_csv('output.csv')
A CSV file is just text: the values in each row are joined by commas and the rows are separated by newlines. For simple values that contain no commas, quotes, or newlines, you can build it by hand like so:
data = [[1, 2, 4], ['A', 'AB', 2], ['P', 23, 4]]
data_string = '\n'.join(','.join(map(str, row)) for row in data)
with open('data.csv', 'w') as f:
    f.write(data_string)
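For values that may contain commas or quotes, the csv module mentioned in the question handles the escaping. This is a small sketch using csv.writer; the helper name rows_to_csv is hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def rows_to_csv(rows, file_obj):
    """Write a list of lists to file_obj, one inner list per CSV row."""
    writer = csv.writer(file_obj)
    writer.writerows(rows)


data = [[1, 2, 4], ['A', 'AB', 2], ['P', 23, 4]]

# In-memory buffer for demonstration; for a real file use
# open('output.csv', 'w', newline='') as recommended by the csv docs.
buffer = io.StringIO()
rows_to_csv(data, buffer)
print(buffer.getvalue())
```

Note that csv.writer ends each row with '\r\n' by default, which is why the file should be opened with newline=''.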

Automatic conversion of data from float to int while saving a DataFrame as a CSV file using to_csv in Python

I am trying to save my DataFrame, which has two columns, ID and Target, as a CSV file. I want the Target column to be of float datatype, i.e. 1.0, 0.0 instead of 1 or 0. I have created a DataFrame of these values and my Target column is of floating type. But when I save this DataFrame to a CSV file using to_csv(), the "Target" column in the CSV file is automatically converted to int datatype, i.e. 1 and 0. I did the following:
>>> submission["Target"] = submission["Target"].astype(float)
>>> submission.to_csv("test.csv", index=False)
Output csv file:
The expected output of Target column in csv file was supposed to be 1.0 or 0.0 but the actual output is 1 or 0
DataFrame.to_csv writes the values with the correct type. My assumption is that the program you use to display the CSV file auto-formats the cell content and thus does not display the .0s.
>>> import pandas as pd
>>> df = pd.DataFrame([{'id': 0, 'Target': 0.0}, {'id': 1, 'Target': 1.0}])
>>> fname = 'out.csv'
>>> df.to_csv(fname, index=False)
>>> with open(fname) as fh:
...     columns = fh.readline().strip().split(',')
...     rows = [dict(zip(columns, row.strip().split(','))) for row in fh.readlines()]
...
>>> print(rows)
[{'Target': '0.0', 'id': '0'}, {'Target': '1.0', 'id': '1'}]
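The same check can be done without writing a file, since to_csv returns the CSV text as a string when no path is given. The sketch below also shows the float_format parameter, which can pin the number of decimals if a specific layout is wanted; the "%.2f" format is only an example value.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1], "Target": [0.0, 1.0]})

# With no path argument, to_csv returns the CSV content as a string.
lines = df.to_csv(index=False).splitlines()
print(lines)  # ['id,Target', '0,0.0', '1,1.0']

# float_format applies only to float columns; the integer id column is untouched.
fixed = df.to_csv(index=False, float_format="%.2f").splitlines()
print(fixed)  # ['id,Target', '0,0.00', '1,1.00']
```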

Pandas puts all data from a row in one column

I have another problem with CSV. I am using pandas to remove duplicates from a CSV file. After doing so, I noticed that all the data has been put in one column (the preprocessed data was contained in 9 columns). How can I avoid that?
Here is the data sample:
39,43,197,311,112,88,47,36,Label_1
Here is the function:
import pandas as pd
def clear_duplicates():
    df = pd.read_csv("own_test.csv", sep="\n")
    df.drop_duplicates(subset=None, inplace=True)
    df.to_csv("own_test.csv", index=False)
Remove the sep argument, because the default separator in read_csv is already a comma:
def clear_duplicates():
    df = pd.read_csv("own_test.csv")
    df.drop_duplicates(inplace=True)
    df.to_csv("own_test.csv", index=False)
Maybe not as readable, but this one-liner works too:
pd.read_csv("own_test.csv").drop_duplicates().to_csv("own_test.csv", index=False)
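One caveat, sketched below under the assumption that the file has no header row (the data sample starts directly with values): by default read_csv treats the first line as column names, so passing header=None on read and header=False on write keeps that first row as data. The third sample row is made up for illustration, and io.StringIO stands in for own_test.csv.

```python
import io

import pandas as pd

# Two identical rows from the question's sample plus one invented row.
raw = (
    "39,43,197,311,112,88,47,36,Label_1\n"
    "39,43,197,311,112,88,47,36,Label_1\n"
    "12,40,180,300,100,80,45,30,Label_2\n"
)

# header=None: the first line is data, not column names.
df = pd.read_csv(io.StringIO(raw), header=None)
deduped = df.drop_duplicates()

# header=False: do not write the auto-generated 0..8 column labels.
out_lines = deduped.to_csv(index=False, header=False).splitlines()
print(deduped.shape)
print(out_lines)
```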

How to change the name of columns of a Pandas dataframe when it was saved with "pickle"?

I saved a Pandas DataFrame with "pickle". When I load it, it looks like Figure A (which is fine). But when I try to change the names of the columns, it looks like Figure B.
What am I doing wrong? What are the other ways to change the name of columns?
Figure A
Figure B
import pandas as pd
df = pd.read_pickle('/home/myfile')
df = pd.DataFrame(df, columns=('AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL'))
df
read_pickle already returns a DataFrame.
And you're trying to create a DataFrame from an existing DataFrame, just with renamed columns. That's not necessary...
As you want to rename all columns:
df.columns = ['AWA', 'REM','S1','S2','SWS','ALL']
Renaming specific columns in general could be achieved with:
df.rename(columns={'REM':'NewColumnName'},inplace=True)
Pandas docs
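The difference between the two approaches can be sketched with a small stand-in frame (the pickled data and the figures are not available here, so the values and the integer column labels are assumptions). Passing an existing DataFrame together with columns= selects columns by label, so labels that do not exist come back as all-NaN columns, whereas assigning to .columns only relabels the existing data.

```python
import pandas as pd

# Stand-in for the unpickled frame, assumed to have default integer labels.
df = pd.DataFrame([[1, 2], [3, 4]], columns=[0, 1])

# Selecting by labels that do not exist yields NaN-filled columns.
reindexed = pd.DataFrame(df, columns=["AWA", "REM"])

# Assigning new labels keeps the data and only changes the names.
renamed = df.copy()
renamed.columns = ["AWA", "REM"]

print(reindexed)
print(renamed)
```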
I have just solved it.
df = pd.read_pickle('/home/myfile')
df1 = pd.DataFrame(df.values*100)
df1.index='Feature' + (df1.index+1).astype(str)
df1.columns=('AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL')
df1
