Delete every pandas dataframe in final script - python-3.x

I'm using pandas dataframes in different scripts. For example:
script1.py:
import pandas as pd
df1 = pd.read_csv("textfile1.csv")
"do stuff with df1 including copying some columns to use"
script2.py:
import pandas as pd
df2 = pd.read_csv("textfile2.csv")
"do stuff with df2 including using .loc to grab some specific rows.
I then use these two dataframes (in reality about 50 dataframes) in different Flask views and Python scripts. However, when I go to the homepage of my Flask application and follow the steps to create a new result based on a different input file, the result keeps giving me the old (or the first) results file, based on the dataframes that were read in the first time.
I tried (mostly in combination with one another):
- logout_user()
- session.clear()
- CACHE_TYPE=null
- gc.collect()
- SECRET_KEY = str(uuid.uuid4())
- for var in dir():
      if isinstance(eval(var), pd.core.frame.DataFrame):
          del globals()[var]
I can't (read: shouldn't) delete the pandas dataframes right after they are created, as it is all interconnected. But what I would like is a button at the end of the last page that, when clicked, deletes every pandas dataframe that exists in any script or in memory. Is that possible? That would hopefully solve my problem.

Try with a class
class Dataframe1():
    def __init__(self, data):
        self.data = data

d1 = Dataframe1(pd.read_csv("textfile1.csv"))
If you want to access the data
d1.data
To delete
del d1
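Building on that idea, one way to make a "delete everything" button feasible is to keep every loaded dataframe in a single registry instead of scattered module-level globals. A minimal sketch, assuming a plain dict registry and helper functions (the names are illustrative, not part of the original question):
import pandas as pd

# Hypothetical registry holding every dataframe the app loads.
DATAFRAMES = {}

def load_dataframes():
    # Load (or reload) all dataframes into the registry.
    DATAFRAMES["df1"] = pd.read_csv("textfile1.csv")
    DATAFRAMES["df2"] = pd.read_csv("textfile2.csv")

def clear_dataframes():
    # Call this from the button's Flask view: drops every dataframe at once.
    DATAFRAMES.clear()
Because every dataframe lives in one place, the reset view can clear them all without hunting through globals() or each module's namespace.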

Related

Colab crashing while running the Python script

I'm using the following code for merging two Excel files which have around 800k rows each. Is there any other way to merge the files in the same fashion, or any other solution?
import pandas as pd
df=pd.read_csv("master file.csv")
df1=pd.read_csv("onto_diseas.csv")
df4=pd.merge(df, df1, left_on = 'extId', right_on = 'extId', how = 'inner')
df4
Try using dask or drill for merging, or specify datatypes that need less memory (float16 instead of float64, and so forth) when creating the dataframes. I could show this in code if you provide links to your files.
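A hedged sketch of both suggestions; the column name and dtypes are assumptions, since the real files aren't available:
import pandas as pd
import dask.dataframe as dd

# Smaller dtypes cut memory use; "some_numeric_col" is only an example,
# adjust the mapping to whatever the real files contain.
df = pd.read_csv("master file.csv", dtype={"some_numeric_col": "float32"})
df1 = pd.read_csv("onto_diseas.csv", dtype={"some_numeric_col": "float32"})
merged_small = pd.merge(df, df1, on="extId", how="inner")

# Alternatively, let dask do the merge out of core and only
# materialize the final result.
ddf = dd.read_csv("master file.csv")
ddf1 = dd.read_csv("onto_diseas.csv")
merged = dd.merge(ddf, ddf1, on="extId", how="inner").compute()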

Using a for loop in pandas

I have 2 different tabular files, in Excel format. I want to know if an id number from one of the columns in the first Excel file (from the "ID" column) exists in the proteome file in a specific column (take "IHD" for example) and, if so, to display the value associated with it. Is there a way to do this, specifically in pandas, possibly using a for loop?
After loading the excel files with read_excel(), you should merge() the dataframes on ID and protein. This is the recommended approach with pandas rather than looping.
import pandas as pd
clusters = pd.read_excel('clusters.xlsx')
proteins = pd.read_excel('proteins.xlsx')
clusters.merge(proteins, left_on='ID', right_on='protein')
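If you only need the existence check before pulling the value, a boolean .isin() test does it without a loop. This reuses the clusters and proteins frames from the snippet above; the column names are assumptions about the real files:
# True/False per row: does this ID appear in the proteins file?
clusters["found"] = clusters["ID"].isin(proteins["protein"])

# Only the rows whose ID was found, with the associated protein values attached.
matched = clusters.merge(proteins, left_on="ID", right_on="protein", how="inner")
print(matched.head())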

Create a Dataframe from an excel file

I want to create a Dataframe from an Excel file. I am using the pandas read_excel function. My requirement is to create a Dataframe with all elements where a column matches some value.
For example: below is my Excel file, and I want to create the Dataframe with all elements that have Module equal to 'DC-Prod'.
[Excel file image]
Welcome, Saagar Sheth!
To make a Dataframe, just import pandas like so:
import pandas as pd
Then create a variable for the file to access, like this:
file_var_pandas = 'customer_data.xlsx'
And then create its dataframe using read_excel (note the keyword is sheet_name in current pandas, not sheetname):
customers = pd.read_excel(file_var_pandas,
                          sheet_name=0,
                          header=0,
                          index_col=False,
                          keep_default_na=True)
Finally, use the head() command like so:
customers.head()
if you want to know more just go to this website!
Packet Pandas Dataframe
and have fun!
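The question also asks for only the rows whose Module column equals 'DC-Prod'. A minimal sketch using boolean indexing; the file name reuses the answer's example and the sheet layout is an assumption:
import pandas as pd

customers = pd.read_excel('customer_data.xlsx', sheet_name=0)

# Keep only the rows whose Module column equals 'DC-Prod'.
dc_prod = customers[customers['Module'] == 'DC-Prod']
print(dc_prod.head())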

Dask Dataframe View Entire Row

I want to see the entire row for a dask dataframe without the fields being cut off; in pandas the command is pd.set_option('display.max_colwidth', -1). Is there an equivalent for dask? I was not able to find anything.
You can import pandas and use pd.set_option() and Dask will respect pandas' settings.
import pandas as pd
# Don't truncate text fields in the display
# (recent pandas versions expect None here; older ones accepted -1)
pd.set_option("display.max_colwidth", None)
dd.head()  # dd being your dask dataframe
And you should see the long columns. It 'just works.'
Dask does not normally display the data in a dataframe at all, because it represents lazily-evaluated values. You may want to get a specific row by index, using the .loc accessor (same as in Pandas, but only efficient if the index is known to be sorted).
If you meant to get the whole list of columns only, you can get this by the .columns attribute.
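For those two follow-ups, a small hedged sketch; the file name and index column are assumptions:
import dask.dataframe as dd
import pandas as pd

pd.set_option("display.max_colwidth", None)

# Lazily read the data; nothing is loaded yet.
ddf = dd.read_csv("data.csv")

# Set an index so .loc lookups work; they are only efficient once
# dask knows the index is sorted (set_index sorts/shuffles the data).
ddf = ddf.set_index("id")

row = ddf.loc[42].compute()   # materialize just that row
print(row)

print(ddf.columns)            # column names are available without computing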

Feature extraction from data stored in PostgreSQL database

I have some data stored in PostgreSQL database, which contains fields like cost, start date, end date, country, etc. Please take a look at the data here.
Now what I want to do is extract some of the important features/fields from this data and store them in a separate CSV file or pandas data frame so I can use the extracted data for analysis.
Is there any python script to do this task? Please let me know. Thanks.
First, you should import your PostgreSQL table data into a dataframe, which can be done like this:
import psycopg2 as pg
import pandas as pd

# get connected to the database
connection = pg.connect("dbname=mydatabase user=postgres")
# frame_query was removed from pandas; read_sql_query is the current API
dataframe = pd.read_sql_query("SELECT * FROM <tablename>", connection)
This is explained here: https://gist.github.com/00krishna/9026574
After that, we can select specific columns in the pandas dataframe. This can be done like so:
df1 = dataframe[['projectfinancialtype','regionname']]
# here you can select any number of feature columns available in your dataframe; I only took 2 fields from your data
Now, to write these feature columns to a CSV, we can use code like this:
df1.to_csv("pathofoutput.csv", columns=['projectfinancialtype','regionname'], index=False)
# it will create a csv with your feature columns
Hope this helps.
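As a side note, newer pandas versions prefer a SQLAlchemy engine over a raw DBAPI connection for read_sql_query. A minimal sketch, assuming a local PostgreSQL instance (the connection string and column names are assumptions):
import pandas as pd
from sqlalchemy import create_engine

# Adjust user, password, host, and database name to your setup.
engine = create_engine("postgresql+psycopg2://postgres@localhost/mydatabase")

# Pull only the feature columns straight from the database.
features = pd.read_sql_query(
    "SELECT projectfinancialtype, regionname FROM <tablename>",
    engine,
)
features.to_csv("features.csv", index=False)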
