how to change the method into .csv file - python-3.x

import pandas as pd
f=open("xyz.csv",'w')
df=pd.DataFrame(sq3row)
df.to_csv(f)
I am using the code above to write sqlite3 output rows to a .csv file based on conditions, but the output is written in columns instead of rows. For example, the data that should go in row[0] to row[11] is written to col[0] to col[11], the second output row is written to col[12] to col[24], and so on.
How can I write this row-wise, like col[1]: row[0] to row[11], and then col[2]: row[0] to row[11] for the next one?

## I think you are trying to transpose your df
df = df.T

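You can use the transpose function to convert rows into columns and vice versa. Transpose before writing, and assign the result, since transpose returns a new dataframe.
import pandas as pd
df = pd.DataFrame(sq3row)
df = df.transpose()  # swap rows and columns before writing
with open("xyz.csv", "w", newline="") as f:
    df.to_csv(f)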
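A minimal sketch of the transpose-then-write idea, using made-up rows in place of the real sqlite3 results (the contents of sq3row here are an assumption, as is writing to an in-memory buffer instead of xyz.csv):

```python
import io
import pandas as pd

# Hypothetical stand-in for the rows fetched from sqlite3.
sq3row = [(1, "a", 10.0), (2, "b", 20.0)]

df = pd.DataFrame(sq3row)  # each tuple becomes one row

# Write the transposed frame so each original row becomes a column.
buf = io.StringIO()
df.T.to_csv(buf, header=False, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Dropping .T writes the tuples one per line instead, so it is easy to compare both layouts and pick the one that matches the expected file.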

Related

I'm using pandas to delete first 9 rows but it is still keeping the first row

Below is my code -
import pandas as pd
import datetime
df1 = pd.read_excel(str(sys_folder) + "Italy_SS304.xlsx")
df1.drop(df1.index[0:9], axis=0, inplace=True)
df1.drop(df1.columns[1:3], axis=1, inplace=True)
df1
The attached image shows my data in Excel.
When reading the Excel file, the first row is taken as the header unless you explicitly say otherwise. You need to pass header=None so that the dataframe does not use the first row as the header.
Assuming you are writing df1 back to Excel using something like df1.to_excel(), pass header=False there so the column labels are not written, and index=False if you don't want the index written either.
Change the read_excel call as shown below.
df1 = pd.read_excel(str(sys_folder) + "Italy_SS304.xlsx", header=None)
...and if you are writing back to Excel, use a line like this (after the drop commands):
df1.to_excel('NEWFILE.xlsx', header=False, index=False)

How to create column names in pandas dataframe?

I have exported gold price data from my broker. The file has no column names, like this:
2014.02.13,00:00,1291.00,1302.90,1286.20,1302.30,41906
2014.02.14,00:00,1301.80,1321.20,1299.80,1318.70,46244
2014.02.17,00:00,1318.20,1329.80,1318.10,1328.60,26811
2014.02.18,00:00,1328.60,1332.10,1312.60,1321.40,46226
I read the CSV into a pandas dataframe and it takes the first row as the column names. How can I set the column names and still keep all the data?
Thank you
If the CSV has no header row, you can tell pandas not to treat the first line as one by using the header parameter:
df = pd.read_csv(file_path, header=None)
To manually assign column names to the DataFrame, you can use
df.columns = ["col1", "col2", ...]
I encourage you to go through the read_csv documentation to learn more about the options it provides.
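As a rough sketch, both steps can also be done in one call with the names parameter of read_csv. The sample rows are the ones from the question, read from an in-memory string; the column names are only a guess at what the fields mean and are not given in the original file:

```python
import io
import pandas as pd

# Sample rows copied from the question; there is no header line.
raw = """2014.02.13,00:00,1291.00,1302.90,1286.20,1302.30,41906
2014.02.14,00:00,1301.80,1321.20,1299.80,1318.70,46244
2014.02.17,00:00,1318.20,1329.80,1318.10,1328.60,26811
2014.02.18,00:00,1328.60,1332.10,1312.60,1321.40,46226
"""

# Assumed column names (hypothetical, chosen for illustration).
names = ["date", "time", "open", "high", "low", "close", "volume"]

# header=None keeps the first line as data; names supplies the labels.
df = pd.read_csv(io.StringIO(raw), header=None, names=names)
print(df)
```

All four rows survive as data, and the first line is no longer consumed as the header.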

How to find the nlargest from a large csv file using pandas (chunked)?

Given a very large CSV file with many rows and 3 columns, the file is read as follows:
import pandas as pd
df = pd.read_csv("test.csv", sep=" ", chunksize=100000)
Now, how do I get the N largest rows based on the values in the 3rd column when chunksize is used?
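With chunksize set, read_csv returns an iterator of chunks rather than a single dataframe, so it has no nlargest method. Take the N largest rows of each chunk, then the N largest of those combined:
top = pd.concat(chunk.nlargest(N, chunk.columns[2]) for chunk in df)
print(top.nlargest(N, top.columns[2]))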
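Because read_csv with chunksize yields one dataframe per chunk, the N largest rows have to be accumulated across chunks. Below is a minimal sketch that keeps only the current best N rows in memory. The helper name nlargest_chunked and the tiny sample data are assumptions made for illustration:

```python
import io
import pandas as pd

def nlargest_chunked(reader, n, col_pos=2):
    """Return the n rows with the largest values in the column at col_pos."""
    best = None
    for chunk in reader:
        col = chunk.columns[col_pos]
        # Combine the running winners with the new chunk, then trim back to n.
        candidates = chunk if best is None else pd.concat([best, chunk])
        best = candidates.nlargest(n, col)
    return best

# Hypothetical space-separated data standing in for test.csv.
raw = "a b c\n1 1 5\n2 2 9\n3 3 1\n4 4 7\n5 5 3\n"
reader = pd.read_csv(io.StringIO(raw), sep=" ", chunksize=2)

top = nlargest_chunked(reader, n=2)
print(top)
```

Trimming after every chunk means at most one chunk plus N rows is held at a time, which matters when the file is too large to load whole.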

Change index column text in pandas

I have the following spreadsheet that I am bringing in to pandas:
Excel Spreadsheet
I import it with:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
Jupyter shows it like this:
Panda Dataframe 1
I then transpose the dataframe with
df = df.T
Which results in this
Transposed DataFrame
At this stage how can I now change the text in the leftmost index column? I want to change the word Day to the word Service, but I am not sure how to address that cell/header. I can't refer to column 0 and change the header for that.
Likewise, how could I then go on to change the A, B, C, D text, which is now the index column?
You could first assign to the columns attribute, and then apply the transposition.
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df.columns = ['Service','AA', 'BB', 'CC', 'DD']
df = df.T
Renaming the columns before transposing would work. To do exactly what you want, you can use the rename function. The documentation also has a helpful example of how to rename the index.
Your example in full:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service'}
df = df.rename(index=dict_rename)  # rename returns a new dataframe
To extend this to more index values, you merely need to adjust the dict_rename argument before renaming.
Full sample:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service','A':'AA','B':'BB','C':'CC','D':'DD'}
df = df.rename(index=dict_rename)  # rename returns a new dataframe
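A small self-contained sketch of the rename step. The dataframe below is a made-up stand-in for sessions.xlsx, since the actual spreadsheet contents are not shown:

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet.
df = pd.DataFrame(
    {"Day": ["Mon", "Tue"], "A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]}
)

df = df.T  # the old column labels become the index

# rename returns a new dataframe; labels missing from the mapping are kept.
df = df.rename(index={"Day": "Service", "A": "AA"})
print(df.index.tolist())  # ['Service', 'AA', 'B', 'C', 'D']
```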

How do I convert range of openpyxl cells to pandas dataframe without looping though all cells?

Openpyxl supports converting an entire worksheet of an excel 2010 workbook to a pandas dataframe. I want to select a subset of those cells, using Excel's native indices, and convert that block of cells to a dataframe. Openpyxl's documentation on working with pandas does not help: https://openpyxl.readthedocs.io/en/stable/pandas.html
I am trying to avoid 1) looping through all rows and columns in the data, since that's inefficient, 2) removing these cells from the dataframe after creation, and 3) pandas' read_excel function, since it does not seem to support specifying the range in Excel's native indices.
# This converts an entire worksheet to a pandas dataframe
import pandas as pd
import openpyxl as px
Work_Book = px.load_workbook(filename='MyBook.xlsx')
Work_Sheet = Work_Book['Sheet1']
df = pd.DataFrame(Work_Sheet.values)
#This produces a tuple of cells. Calling pd.DataFrame on it returns
#"ValueError: DataFrame constructor not properly called!"
Cell_Range = Work_Sheet['B2:D4']
#This is the only way I currently know to convert Cell_Range to a Pandas
# DataFrame. I'm trying to avoid these nested loops.
row_list = []
for row in Cell_Range:
    col_list = []
    for col in row:
        col_list.append(col.value)
    row_list.append(col_list)
df = pd.DataFrame(row_list)
I am trying to find the most efficient way to convert the Cell_Range object above into a pandas dataframe. Thanks!
Work_Sheet.values will give you a generator. Turn it into a list to get a list of tuples, with the first tuple holding the headers.
To convert it into a dataframe, do the following:
df = pd.DataFrame(list(Work_Sheet.values))
df.columns = df.iloc[0,:]
df = df.iloc[1:,].reset_index(drop=True)
