Create a DataFrame from an Excel file - python-3.x

I want to create a DataFrame from an Excel file using the pandas read_excel function. My requirement is to create a DataFrame containing all rows where a column matches some value.
For example, below is my Excel file, and I want to create the DataFrame with all rows that have Module equal to 'DC-Prod'.
[Excel file image]

Welcome, Saagar Sheth!
To make a DataFrame, first import pandas like so:
import pandas as pd
Then create a variable pointing to the file to access, like this:
file_var_pandas = 'customer_data.xlsx'
Then create its dataframe using read_excel:
customers = pd.read_excel(file_var_pandas,
                          sheet_name=0,
                          header=0,
                          index_col=False,
                          keep_default_na=True)
Finally, use the head() method to preview it:
customers.head()
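To cover the filtering part of the question, a boolean mask keeps only the rows whose Module column equals 'DC-Prod' (the column name is taken from the question; adjust it if yours differs):
# keep only the rows where the Module column matches 'DC-Prod'
dc_prod = customers[customers['Module'] == 'DC-Prod']
dc_prod.head()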
If you want to know more, just go to this website!
Packet Pandas Dataframe
and have fun!

Related

Read excel using pandas and join two pandas dataframes without losing formatting styles

Initially I have two Excel files. Input file 1 contains colors in some of its columns.
The other Excel file looks like this.
I have to join these two Excel files using openpyxl or xlsxwriter (Python libraries), or by any other method, and in the output file I don't want to lose the colors. The output file will look like the image below.
Please use the code below to create the pandas DataFrames for the two input files:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['rahul', 'raju', 'mohan', 'ram'],
                   'salary': [20000, 34000, 10000, 998765]})
print(df)

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'state': ['gujrat', 'bhopal', 'mumbai', 'kolkata']})
print(df1)
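From there, a plain merge on id combines the two frames. A sketch of that step; note that pandas alone won't preserve cell colors, so the styling would still need to be reapplied with openpyxl or xlsxwriter when writing the output ('output.xlsx' is a hypothetical filename):
# merge the two frames on the shared id column
merged = df.merge(df1, on='id')
# write the joined data out ('output.xlsx' is a hypothetical filename)
merged.to_excel('output.xlsx', index=False)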

Using a for loop in pandas

I have 2 different tabular files in Excel format. I want to know if an ID number from one of the columns in the first Excel file (the "ID" column) exists in a specific column of the proteome file (take "IHD" for example), and if so, display the value associated with it. Is there a way to do this, specifically in pandas, possibly using a for loop?
After loading the excel files with read_excel(), you should merge() the dataframes on ID and protein. This is the recommended approach with pandas rather than looping.
import pandas as pd
clusters = pd.read_excel('clusters.xlsx')
proteins = pd.read_excel('proteins.xlsx')
clusters.merge(proteins, left_on='ID', right_on='protein')
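To then display the value tied to a particular ID, filter the merged result; a small sketch using the "IHD" column named in the question ('P12345' is a hypothetical identifier):
merged = clusters.merge(proteins, left_on='ID', right_on='protein')
# look up the IHD value for one ID ('P12345' is a hypothetical identifier)
print(merged.loc[merged['ID'] == 'P12345', 'IHD'])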

Delete every pandas dataframe in final script

I'm using pandas dataframes in different scripts. For example:
script1.py:
import pandas as pd
df1 = pd.read_csv("textfile1.csv")
"do stuff with df1 including copying some columns to use"
script2.py:
import pandas as pd
df2 = pd.read_csv("textfile2.csv")
"do stuff with df2 including using .loc to grab some specific rows."
I then use these two dataframes (in reality, about 50 of them) in different Flask views and Python scripts. However, when I go to the homepage of my Flask application and follow the steps to create a new result based on a different input file, it keeps giving me the old (or first) results file, based on the dataframes it read in the first time.
I tried (mostly in combination with one another):
- logout_user()
- session.clear()
- CACHE_TYPE=null
- gc.collect()
- SECRET_KEY = str(uuid.uuid4())
- for var in dir():
      if isinstance(eval(var), pd.core.frame.DataFrame):
          del globals()[var]
I can't (read: shouldn't) delete pandas dataframes after they are created, as it is all interconnected. But what I would like is to have a button at the end of the last page, and if I were to click it, it would delete every pandas dataframe that exists in every script or in memory. Is that a possibility? That would hopefully solve my problem.
Try using a class:
import pandas as pd

class Dataframe1:
    def __init__(self, data):
        self.data = data

d1 = Dataframe1(pd.read_csv("textfile1.csv"))
To access the data:
d1.data
To delete it:
del d1
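Extending that idea into something the "delete everything" button could call: a small registry class that tracks each wrapped frame and drops them all at once. A sketch under that assumption, not part of the original answer; FrameRegistry is a made-up name:
import gc
import pandas as pd

class FrameRegistry:
    """Keeps a reference to every registered DataFrame so all can be dropped at once."""
    _frames = {}

    @classmethod
    def add(cls, name, df):
        cls._frames[name] = df
        return df

    @classmethod
    def clear_all(cls):
        # drop all stored references, then ask Python to reclaim the memory
        cls._frames.clear()
        gc.collect()

df1 = FrameRegistry.add("df1", pd.read_csv("textfile1.csv"))
# ... later, e.g. inside the Flask view behind the button:
FrameRegistry.clear_all()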

Feature extraction from data stored in PostgreSQL database

I have some data stored in PostgreSQL database, which contains fields like cost, start date, end date, country, etc. Please take a look at the data here.
Now what I want to do is extract some of the important features/fields from this data and store them in a separate CSV file or pandas data frame so I can use the extracted data for analysis.
Is there any python script to do this task? Please let me know. Thanks.
First, read your PostgreSQL table data into a dataframe, which can be done like this:
import psycopg2 as pg
import pandas as pd

# get connected to the database
connection = pg.connect("dbname=mydatabase user=postgres")
# pandas.io.sql.frame_query was removed from pandas long ago; read_sql_query is the current API
dataframe = pd.read_sql_query("SELECT * FROM <tablename>", connection)
The same pattern is explained here: https://gist.github.com/00krishna/9026574
After that, we can select specific feature columns from the pandas dataframe:
df1 = dataframe[['projectfinancialtype', 'regionname']]
# you can select any number of feature columns available in your dataframe; I took only 2 fields from your data
Now, to write these feature columns to a CSV, we can use code like this:
df1.to_csv("pathofoutput.csv", columns=['projectfinancialtype', 'regionname'])
# this creates a CSV with your feature columns
Hope this helps!

How to write summary of spark sql dataframe to excel file

I have a very large DataFrame with 8000 columns and 50000 rows.
I want to write its summary statistics to an Excel file.
I think we can use the describe() method, but how do I write the result to Excel in a good format? Thanks.
The return type of describe() is a PySpark dataframe. The easiest way to get the describe output into an Excel-readable format is to convert it to a pandas dataframe and then write that pandas dataframe out as a CSV file, as below:
import pandas
df.describe().toPandas().to_csv('fileOutput.csv')
If you want it in Excel format, you can try the below:
import pandas
df.describe().toPandas().to_excel('fileOutput.xls', sheet_name = 'Sheet1', index = False)
Note, the above requires the xlwt package to be installed (pip install xlwt on the command line). Newer pandas versions dropped xlwt support, so with recent pandas write to 'fileOutput.xlsx' instead and the default openpyxl engine will handle it.
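One optional tweak, given the 8000 columns: the describe output is extremely wide, so transposing the pandas frame first (one row of statistics per original column) is usually easier to read in Excel. A minimal sketch, assuming the standard 'summary' column that Spark's describe() produces:
# pivot so each original column becomes a row of count/mean/stddev/min/max
df.describe().toPandas().set_index('summary').T.to_excel('fileOutput.xlsx', sheet_name='Sheet1')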
