How to create column names in pandas dataframe? - python-3.x

I have exported gold price data from my broker. The file has no column names, like this:
2014.02.13,00:00,1291.00,1302.90,1286.20,1302.30,41906
2014.02.14,00:00,1301.80,1321.20,1299.80,1318.70,46244
2014.02.17,00:00,1318.20,1329.80,1318.10,1328.60,26811
2014.02.18,00:00,1328.60,1332.10,1312.60,1321.40,46226
I read the CSV into a pandas DataFrame, and it takes the first row to be the column names. How can I set the column names myself and still keep all the data?
Thank you

If the CSV has no header row, you can tell pandas not to treat the first line as one using the header parameter:
df = pd.read_csv(file_path, header=None)
To assign column names to the DataFrame manually, you can then use:
df.columns = ["col1", "col2", ...]
I encourage you to go through the read_csv documentation to learn more about the options it provides.
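As a minimal sketch, the two steps can also be combined into a single read_csv call via the names parameter; the OHLCV column names below are my own guesses at the fields, not names from the broker's file:

```python
import io
import pandas as pd

# Sample rows in the same shape as the broker export (no header line).
raw = (
    "2014.02.13,00:00,1291.00,1302.90,1286.20,1302.30,41906\n"
    "2014.02.14,00:00,1301.80,1321.20,1299.80,1318.70,46244\n"
)

# header=None stops pandas from consuming the first row as column names;
# names= assigns them in the same call (the names here are illustrative).
cols = ["Date", "Time", "Open", "High", "Low", "Close", "Volume"]
df = pd.read_csv(io.StringIO(raw), header=None, names=cols)

print(df.columns.tolist())
print(len(df))  # both data rows are kept
```

Passing names= together with header=None keeps every data row, which is exactly the "set the column names and still have all the data" behaviour asked for.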

Related

Pandas combining rows as header info

This is how I am reading and creating the DataFrame with pandas:
def get_sheet_data(sheet_name='SomeName'):
    df = pd.read_excel(f'{full_q_name}',
                       sheet_name=sheet_name,
                       header=[0, 1],
                       index_col=0)  # .fillna(method='ffill')
    df = df.swapaxes(axis1="index", axis2="columns")
    return df.set_index('Product Code')
Printing this in tabular form gives me (this will potentially have hundreds of columns):
I can't seem to add those first two rows into the header. I've tried:
python: pandas - How to combine first two rows of pandas dataframe to dataframe header? https://stackoverflow.com/questions/59837241/combine-first-row-and-header-with-pandas
and I'm failing at each point. I think it's because of the MultiIndex, not necessarily the axis swap? But https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html is kind of going over my head right now. Please help me add those two rows into the header.
The output of df.columns is massive, so I've cut it down a lot:
Index(['Product Code','Product Narrative\nHigh-level service description','Product Name','Huawei Product ID','Type','Bill Cycle Alignment',nan,'Stackable',nan,
and ends with:
nan], dtype='object')
We create new column names and assign them to df.columns. The new names are built by joining the two MultiIndex header levels with the first row of the DataFrame:
df.columns = ['_'.join(i) for i in zip(df.columns.get_level_values(0).tolist(),
                                       df.columns.get_level_values(1).tolist(),
                                       df.iloc[0, :].replace(np.nan, '').tolist())]
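A self-contained sketch of the same idea on a toy frame; the product names and extra header text below are invented for illustration, and the first data row is dropped afterwards since it has been folded into the header:

```python
import numpy as np
import pandas as pd

# Toy frame with a two-level column MultiIndex and a first data row that
# carries extra header text, mimicking the question's layout.
cols = pd.MultiIndex.from_tuples([("Product", "Code"), ("Product", "Name")])
df = pd.DataFrame([["extra1", "extra2"], ["A100", "Widget"]], columns=cols)

# Join the two header levels with the first row, as in the snippet above.
df.columns = [
    "_".join(parts)
    for parts in zip(
        df.columns.get_level_values(0),
        df.columns.get_level_values(1),
        df.iloc[0].replace(np.nan, "").tolist(),
    )
]

# The first row is now part of the header, so remove it from the data.
df = df.iloc[1:].reset_index(drop=True)

print(df.columns.tolist())
```

The replace(np.nan, '') step matters because '_'.join fails on NaN entries, which the question's df.columns output shows are present.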

Pandas - concatenating multiple Excel files with different column names but the same data types

I have about 50 Excel files with the .xlsb extension. I'd like to concatenate a specific worksheet from each into a single pandas DataFrame (all the worksheet names are the same). The problem is that the column names are not exactly the same in each worksheet. The code I wrote concatenates values into the same DataFrame column based on the column name. So, for example, sometimes a column is called FgsNr and sometimes FgNr; the datatype and meaning of both are exactly the same and I would like them in the same DataFrame column, but pandas creates two separate columns and stacks together only the values listed under the same name.
files = glob(r'C:\Users\Folder\*xlsb')
for file in files:
    Datafile = pd.concat(pd.read_excel(file, engine='pyxlsb', sheet_name='Sheet1',
                                       usecols='A:F', header=0) for file in files)
How could I correct the code so that it copies and concatenates all the values by column position, ignoring the column names?
When concatenating multiple DataFrames with the same format, you can use the snippet below for speed and efficiency. The basic logic is to collect them in a list and concatenate once at the final stage.
files = glob(r'C:\Users\Folder\*xlsb')
dfs = []
for file in files:
    df = pd.read_excel(file, engine='pyxlsb', sheet_name='Sheet1', usecols='A:F', header=0)
    dfs.append(df)
large_df = pd.concat(dfs, ignore_index=True)
Also refer to:
Creating an empty Pandas DataFrame, then filling it?
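Since the asker also wants values lined up by position despite differing names (FgsNr vs FgNr), one option is to normalise the column names before concatenating. A sketch on in-memory frames (the .xlsb reading step is omitted, and using the first frame's names as canonical is my assumption about the desired mapping):

```python
import pandas as pd

# Two frames with the same layout but inconsistent column names
# (FgsNr vs FgNr), mimicking the worksheets in the question.
df1 = pd.DataFrame({"FgsNr": [1, 2], "Qty": [10, 20]})
df2 = pd.DataFrame({"FgNr": [3], "Qty": [30]})

# Rename by position before concatenating, so values line up by column
# order rather than by label; the first frame's names act as canonical.
canonical = df1.columns
frames = [df.set_axis(canonical, axis=1) for df in (df1, df2)]
large_df = pd.concat(frames, ignore_index=True)

print(large_df)
```

This only works because usecols='A:F' guarantees every worksheet contributes the same number of columns in the same order.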

Pandas - reading a comma-separated file with an inconsistent number of columns

Hello, I have a CSV file which has no header, and I am getting an error while reading it. The data looks like this:
School1,std1,std2
Schoo2,std3,std4,std5,std6,std7
School4,std1,std6
School6,std9,std10
Because of the incomplete columns I am not able to read it:
df = pd.read_csv("test.txt", sep=",", header=None)
Can anyone suggest how I can read this file?
If you know the number of columns in your widest row, you can just create a list to use as the header. In your example, the row at index 1 has 6 columns, so:
col_names = [1,2,3,4,5,6]
df = pd.read_csv("test.txt",sep=",", header=None, names=col_names)
If you're handling a huge number of rows and don't know the maximum number of columns, check this topic for a better solution:
import csv with different number of columns per row using Pandas
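A runnable sketch using the sample rows from the question; when names= lists more columns than a row actually has, read_csv pads that row with NaN instead of raising an error:

```python
import io
import pandas as pd

# The sample file from the question: rows have between 3 and 6 fields.
raw = (
    "School1,std1,std2\n"
    "Schoo2,std3,std4,std5,std6,std7\n"
    "School4,std1,std6\n"
    "School6,std9,std10\n"
)

# Supplying names for the widest row lets read_csv pad short rows with NaN.
col_names = [1, 2, 3, 4, 5, 6]
df = pd.read_csv(io.StringIO(raw), sep=",", header=None, names=col_names)

print(df.shape)
```

Without names=, pandas infers the column count from the first row (3 here), which is why the wider second row triggers an error.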

how to write sqlite3 output rows to a .csv file row-wise

import pandas as pd
f = open("xyz.csv", 'w')
df = pd.DataFrame(sq3row)
df.to_csv(f)
I am using the above code to write sqlite3 output rows to a .csv file based on conditions, but instead of row-wise, the output is written column-wise. For example, data that should be written from row[0] to row[11] is written to col[0] to col[11], and the second output row is written from col[12] to col[24], and so on. How can I write this row-wise, like col[1]: row[0] to row[11], then for the next col[2]: row[0] to row[11]?
I think you are trying to transpose your df:
df = df.T
You can use the transpose function to convert rows into columns and vice versa:
import pandas as pd
f = open("xyz.csv", 'w')
df = pd.DataFrame(sq3row)
df = df.transpose()  # transpose before writing; calling it after to_csv has no effect
df.to_csv(f)
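The key detail is that the transpose has to happen before to_csv, because to_csv writes the frame as it is at call time. A sketch with a stand-in for sq3row (the real rows come from sqlite3) that writes to an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# Hypothetical rows as they might come back from sqlite3 (sq3row stand-in).
sq3row = [("a", 1), ("b", 2), ("c", 3)]

# Transpose BEFORE writing; a transpose done after to_csv does not change
# the file that was already written.
df = pd.DataFrame(sq3row).T

buf = io.StringIO()
df.to_csv(buf)
print(buf.getvalue())
```

Also note that transpose() returns a new DataFrame rather than modifying in place, so the result must be assigned back (df = df.transpose()) or chained as above.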

pandas read_csv create new column and usecols at the same time

I'm trying to load multiple csv files into a single dataframe df while:
adding column names
adding and populating a new column (Station)
excluding one of the columns (QD)
All of this works fine until I attempt to exclude a column with usecols, which throws the error Too many columns specified: expected 5 and found 4.
Is it possible to create a new column and pass usecols at the same time?
The reason I'm creating and populating a new 'Station' column during read_csv is that my dataframe will contain data from multiple stations. I can work around the error by doing read_csv in one statement and dropping the QD column in the next with df.drop('QD', axis=1, inplace=True), but I want to make sure I understand how to do this in the most idiomatic pandas way possible.
Here's the code that throws the error:
df = pd.concat(pd.read_csv("http://lgdc.uml.edu/common/DIDBGetValues?ursiCode=" + row['StationCode'] + "&charName=MUFD&DMUF=3000",
                           skiprows=17,
                           delim_whitespace=True,
                           parse_dates=[0],
                           usecols=['Time', 'CS', 'MUFD', 'Station'],
                           names=['Time', 'CS', 'MUFD', 'QD', 'Station']
                           ).fillna(row['StationCode']
                           ).set_index(['Time', 'Station'])
               for index, row in stationdf.iterrows())
An example StationCode from stationdf is BC840.
A data sample: 2016-09-19T00:00:05.000Z 100 19.34 //
You can create the new column using method chaining with assign, leaving Station out of names and usecols since it is not present in the file:
df = pd.read_csv(...).assign(Station=row['StationCode'])
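A runnable sketch on the one-line data sample from the post; sep=r"\s+" stands in for delim_whitespace, and the literal "BC840" stands in for row['StationCode']:

```python
import io
import pandas as pd

# Whitespace-delimited sample row from the post: Time, CS, MUFD, QD.
raw = "2016-09-19T00:00:05.000Z 100 19.34 //\n"

# usecols drops QD at parse time; assign then adds the Station column in
# the same chain, so names= only describes columns present in the file.
df = (
    pd.read_csv(
        io.StringIO(raw),
        sep=r"\s+",
        names=["Time", "CS", "MUFD", "QD"],
        usecols=["Time", "CS", "MUFD"],
        parse_dates=["Time"],
    )
    .assign(Station="BC840")  # in the real loop: row['StationCode']
    .set_index(["Time", "Station"])
)

print(df)
```

The original error came from listing Station in usecols: usecols can only select columns that exist in the file, whereas assign adds the new column after parsing.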
