Looping through a list of pandas dataframes and making each one empty - python-3.x

I have multiple pandas dataframes. I want to empty each dataframe like below:
df1 = pd.DataFrame()
df2 = pd.DataFrame()
Instead of doing it individually, is there any way to do it in one line of code?

If I understood correctly, this will work:
import pandas as pd

df_list = []
for i in range(0, 10):
    df = pd.DataFrame()
    df_list.append(df)
print(df_list[0].head())
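If the goal is ten independent empty frames, a list comprehension does it in one line (a minimal sketch):
import pandas as pd

# ten independent empty dataframes in one line
df_list = [pd.DataFrame() for _ in range(10)]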

Related

Extracting Data From Pandas DataFrame

I have two pandas dataframes named df1 and df2. I want to extract the files with matching names from both dataframes and put the extracted names into two columns of a new dataframe. I want to take the file names from df1 and match them against df2 (df2 has more files than df1). Each dataframe (df1 and df2) has only one column. The bold part starting with the letter s**** is the common matching alphanumeric identifier; we have to match the two dataframes on that.
df1["Text_File_Location"] =
0 /home/mzkhan/2.0.0/files/p15/p15546261/s537061.txt
1 /home/mzkhan/2.0.0/files/p15/p15098455/s586955.txt
df2["Image_File_Location"]=
0 /media/mzkhan/external_dr/radiology_image/2.0.0/files/p10/p10000032/s537061/02aa804e- bde0afdd-112c0b34-7bc16630-4e384014.jpg
1 /media/mzkhan/external_dr/radiology_image/2.0.0/files/p10/p10000032/s586955/174413ec-4ec4c1f7-34ea26b7-c5f994f8-79ef1962.jpg
In Python 3.4+, you can use pathlib to work with file paths conveniently. You can extract the filename without its extension (the "stem") from df1, and the parent folder name from df2. Then you can do an inner merge on those names.
import pandas as pd
from pathlib import Path
df1 = pd.DataFrame(
    {
        "Text_File_Location": [
            "/home/mzkhan/2.0.0/files/p15/p15546261/s537061.txt",
            "/home/mzkhan/2.0.0/files/p15/p15098455/s586955.txt",
        ]
    }
)
df2 = pd.DataFrame(
    {
        "Image_File_Location": [
            "/media/mzkhan/external_dr/radiology_image/2.0.0/files/p10/p10000032/s537061/02aa804e- bde0afdd-112c0b34-7bc16630-4e384014.jpg",
            "/media/mzkhan/external_dr/radiology_image/2.0.0/files/p10/p10000032/s586955/174413ec-4ec4c1f7-34ea26b7-c5f994f8-79ef1962.jpg",
            "/media/mzkhan/external_dr/radiology_image/2.0.0/files/p10/p10000032/foo/bar.jpg",
        ]
    }
)
# filename without extension, e.g. "s537061"
df1["name"] = df1["Text_File_Location"].apply(lambda x: Path(str(x)).stem)
# parent folder name, e.g. "s537061"
df2["name"] = df2["Image_File_Location"].apply(lambda x: Path(str(x)).parent.name)
df3 = pd.merge(df1, df2, on="name", how="inner")
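With the sample data above, the inner merge keeps the two rows whose names match and drops the unmatched foo/bar.jpg row:
print(df3["name"].tolist())  # ['s537061', 's586955']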

Better way to swap column values and then append them in a pandas dataframe?

Here is my dataframe:
import pandas as pd
data = {'from': ['Frida', 'Frida', 'Frida', 'Pablo', 'Pablo'],
        'to': ['Vincent', 'Pablo', 'Andy', 'Vincent', 'Andy'],
        'score': [2, 2, 1, 1, 1]}
df = pd.DataFrame(data)
df
I want to swap the values in the 'from' and 'to' columns and append the swapped rows, because these scores work both ways. Here is what I have tried:
df_copy = df.copy()
df_copy.rename(columns={"from":"to","to":"from"}, inplace=True)
df_final = df.append(df_copy)
which works but is there a shorter way to do the same?
One line could be:
df_final = df.append(df.rename(columns={"from":"to","to":"from"}))
You are on the right track. Note that df.copy() already defaults to deep=True, so the copy above is a true, independent copy and renaming it will not touch the original df; passing deep=True explicitly just makes that intent clear:
df_copy = df.copy(deep=True)
df_copy.rename(columns={"from":"to","to":"from"}, inplace=True)
df_final = df.append(df_copy)
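Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same swap-and-stack is written with pd.concat. A minimal sketch using the question's data:
import pandas as pd

data = {'from': ['Frida', 'Frida', 'Frida', 'Pablo', 'Pablo'],
        'to': ['Vincent', 'Pablo', 'Andy', 'Vincent', 'Andy'],
        'score': [2, 2, 1, 1, 1]}
df = pd.DataFrame(data)

# stack the original rows with the from/to-swapped rows
df_final = pd.concat([df, df.rename(columns={'from': 'to', 'to': 'from'})],
                     ignore_index=True)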

Iteratively append new data into pandas dataframe column and join with another dataframe

I have been extracting data from many APIs. I would like to add a common id column across all of them. Here is what I have tried:
import json
import pandas as pd
import requests

df = pd.DataFrame()
for i in range(1, 200):
    url = '{id}/values'.format(id=i)
    res = requests.get(url, headers=headers)
    if res.status_code == 200:
        data = json.loads(res.content.decode('utf-8'))
        if data['success']:
            df['id'] = i
            test = pd.json_normalize(data[parent][child])
            df = df.append(test, ignore_index=True)
But in the dataframe's id column I am getting only the last iterated id, and for APIs that return many rows I am getting invalid data.
For performance reasons it is better to first store the data in a dictionary and then create the dataframe from that dictionary:
import pandas as pd
from collections import defaultdict
d = defaultdict(list)
for i in range(1, 200):
    # simulate the dataframe retrieved from the pd.json_normalize() call
    row = pd.DataFrame({'id': [i], 'field1': [f'f1-{i}'], 'field2': [f'f2-{i}'], 'field3': [f'f3-{i}']})
    for k, v in row.to_dict().items():
        d[k].append(v[0])
df = pd.DataFrame(d)
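Another common idiom with similar performance is to collect the per-response frames in a list and concatenate once at the end; a sketch using the same simulated rows:
import pandas as pd

frames = []
for i in range(1, 200):
    # simulate the frame returned by pd.json_normalize() for response i
    frames.append(pd.DataFrame({'id': [i], 'field1': [f'f1-{i}']}))
df = pd.concat(frames, ignore_index=True)  # single concatenation at the end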

Python: Import multiple dataframes using for loop

I have the following code which works to import a dataframe.
#read tblA
tbl = 'a'
cols = 'imp_a'
usecols = dfDD[dfDD[cols].notnull()][cols].values.tolist()
dfa = getdf(tbl, dfRT, sfsession)
dfa = dfa[usecols]
#read tblB
tbl = 'b'
cols = 'imp_sb'
usecols = dfDD[dfDD[cols].notnull()][cols].values.tolist()
dfb = getdf(tbl, dfRT, sfsession)
dfb = dfb[usecols]
# ...importing a few more tables, following the same steps as the two above
Is there a way to shorten this code and avoid writing the same thing multiple times? The values that change are tbl, cols, and the dataframe name (df..).
I tried a few different things, including putting all the changing attributes into a dictionary, but wasn't able to make it work. I could create a function, but the function would require a few more parameters - dfDD, dfRT, sfsession - and I don't think that's a great solution. There has to be a better way to write this.
The loop can be fairly simple, like this:
import pandas as pd
# Create a dictionary that will store your dataframes
df_dict = {}
config = {'tblA': {'tbl': 'a', 'cols': 'imp_a'},
          'tblB': {'tbl': 'b', 'cols': 'imp_sb'}}

# Loop through the config
for key, val in config.items():
    tbl = val['tbl']
    cols = val['cols']
    usecols = dfDD[dfDD[cols].notnull()][cols].values.tolist()
    df = getdf(tbl, dfRT, sfsession)[usecols]
    df_dict[key] = df  # store your dataframe in the dictionary
    print(f"Created dataframe for table - {key} ({tbl} | {cols})")

pandas SettingWithCopyWarning only inside function

With a dataframe like
import pandas as pd
df = pd.DataFrame(
    ["2017-01-01 04:45:00", "2017-01-01 04:45:00removeMe"], columns=["col"]
)
why do I get a SettingWithCopyWarning here
def test_fun(df):
    df = df[~df["col"].str.endswith("removeMe")]
    df.loc[:, "col"] = pd.to_datetime(df["col"])
    return df
df = test_fun(df)
but not if I run it without the function?
df = df[~df["col"].str.endswith("removeMe")]
df.loc[:, "col"] = pd.to_datetime(df["col"])
And what should my function look like?
In the function, indexing df with your boolean array gives a view of the outside-scope df; you then additionally index that view, which is why the warning appears. Without the function, df is simply rebound to the resized result instead (it's not a view).
Either way, I would write it like this instead:
def test_fun(df):
    df["col"] = pd.to_datetime(df["col"], errors='coerce')
    return df[~pd.isna(df["col"])]
Found the trick:
def test_fun(df):
    df.loc[:] = df[~df["col"].str.endswith("removeMe")]  # <-- added the .loc[:]
    df.loc[:, "col"] = pd.to_datetime(df["col"])
    return df
Don't do df = ... in the function.
Instead do df.loc[:] = ... !
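As a side note: pandas 2.x lets you opt in to copy-on-write (the default behaviour from pandas 3.0), under which this whole class of warning goes away and plain assignment inside the function is safe. A minimal sketch, assuming pandas >= 2.0:
import pandas as pd

pd.set_option("mode.copy_on_write", True)  # opt-in on pandas 2.x

def test_fun(df):
    df = df[~df["col"].str.endswith("removeMe")]
    df["col"] = pd.to_datetime(df["col"])  # no SettingWithCopyWarning under CoW
    return df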
