Delete empty dataframe with loop in Python

I have a series of DataFrames in Python. I want to check whether each of them is empty and then delete the ones that are. I am trying with a loop, but none of the DataFrames get deleted, not even the ones that are actually empty. Here is the example, where df_A, df_B, df_C, and df_D are my DataFrames and the last one (df_D) is empty.
df_names = [df_A, df_B, df_C, df_D]
for df_ in df_names:
    if df_.empty:
        del df_
I am surely missing something quite simple; I hope you can help me with this (probably a bit silly) question.

You can use Python's locals() function to do this. I would first save the dataframe names in a list as strings (note that deleting entries via locals() is only reliable at module scope, where locals() returns the same dict as globals()):
Code
df_names = ['df_A', 'df_B', 'df_C', 'df_D']
for df_ in df_names:
    if locals()[df_].empty:
        del locals()[df_]
You can also check whether your dataframes have been deleted using the code below:
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]
for i in alldfs:
    if i[:1] != '_':
        print(i)
The above snippet will print all the existing dataframes (excluding the ones defined internally by Python).
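If you can avoid separate variables altogether, a simpler pattern (a sketch, not part of the original answer) is to keep the dataframes in a dict and filter out the empty ones:
import pandas as pd

# hypothetical stand-ins for df_A .. df_D
dfs = {'df_A': pd.DataFrame({'x': [1]}), 'df_D': pd.DataFrame()}

# keep only the non-empty frames
dfs = {name: frame for name, frame in dfs.items() if not frame.empty}
print(list(dfs))  # ['df_A']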

Related

split time series dataframe when value changes

I have a DataFrame that corresponds to the lat/long of a moving object.
This object goes from one place to another, and I created a column that records which place it is at every second.
I want to split that DataFrame so that when the object goes to one place and then leaves for another, I get two separate DataFrames.
'None' means it is between places.
My current code:
def cut_df2(df):
    df_copy = df.copy()
    # check if change of place
    df_copy['changed'] = df_copy['place'].ne(df_copy['place'].shift().bfill()).astype(int)
    last = 0
    dfs = []
    for num, line in df_copy.iterrows():
        if line.changed:
            dfs.append(df.iloc[last:num, :])
            last = num
    # Check if last line was in a place
    if line.place != 'None':
        dfs.append(df.iloc[last:, :])
    df_outs = []
    # Delete empty dataframes
    for num, dataframe in enumerate(dfs):
        if not dataframe.empty:
            if dataframe.reset_index().place.iloc[0] != 'None':
                df_outs.append(dataframe)
    return df_outs
It works on simple examples but fails on a big dataset, and I have no idea why. Can anyone help me?
Try using this instead:
https://www.geeksforgeeks.org/split-pandas-dataframe-by-rows/
iloc can be a good way to split a dataframe by rows:
df1 = datasX.iloc[:72]   # first 72 rows
df2 = datasX.iloc[72:]   # remaining rows
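Since the question is about splitting whenever 'place' changes rather than at a fixed row, a compare-and-cumsum sketch may be closer to the goal (this assumes a 'place' column as in the question):
# label each run of consecutive identical places, then group on that label
group_id = (df['place'] != df['place'].shift()).cumsum()
dfs = [g for _, g in df.groupby(group_id) if g['place'].iloc[0] != 'None']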

Iterate over 4 pandas data frame columns and store them into 4 lists with one for loop instead of 4 for loops

I am currently working with pandas data structures in Python. I wrote a function that extracts data from a pandas data frame and stores it in lists. The code works, but I feel there is a part that I could write with one for loop instead of four. I will give you an example below. The idea of this part of the code is to extract four columns from a pandas data frame into four lists. I did it with four separate for loops, but I want one loop that does the same thing.
col1, col2, col3, col4 = [], [], [], []
for j in abc['col1']:
    col1.append(j)
for k in abc['col2']:
    col2.append(k)
for l in abc['col3']:
    col3.append(l)
for n in abc['col4']:
    col4.append(n)
My idea is to write one for loop that does all of this. I tried something like the following, but it doesn't work:
col1, col2, col3, col4 = [], [], [], []
for j, k, l, n in abc[['col1', 'col2', 'col3', 'col4']]:
    col1.append(j)
    col2.append(k)
    col3.append(l)
    col4.append(n)
Can you help me with this idea to wrap four for loops into the one? I would appreciate your help!
You don't need to use loops at all; you can just convert each column into a list directly.
list_1 = df["col"].to_list()
Have a look at this previous question.
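Applied to the question's four columns, that collapses to a single line (assuming abc is the DataFrame from the question):
col1, col2, col3, col4 = (abc[c].to_list() for c in ['col1', 'col2', 'col3', 'col4'])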
Treating a pandas dataframe like a list usually works, but is very bad for performance. I'd consider using the iterrows() function instead.
This would work as in the following example:
col1, col2, col3, col4 = [], [], [], []
for index, row in df.iterrows():
    col1.append(row['col1'])
    col2.append(row['col2'])
    col3.append(row['col3'])
    col4.append(row['col4'])
It's probably easier to use pandas .values and then numpy.ndarray.tolist():
col = ['col1', 'col2', 'col3']
data = [None] * len(col)
for i in range(len(col)):
    data[i] = df[col[i]].values.tolist()
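The same idea reads more compactly as a list comprehension (a sketch using the same hypothetical column names):
data = [df[c].values.tolist() for c in col]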

How to extract several dataframes from dictionary

I am currently trying to extract several dataframes from a dictionary. The problem is that the number of dataframes varies; sometimes I'll have two dataframes in there and sometimes 30.
At the beginning I create a dictionary (dict_of_exceptions) from a dataframe (exceptions_df). This dictionary holds several dataframes, depending on how many different 'Source Well' values I have. With the current code I can extract the first dataframe from the dictionary, which is j:
dict_of_exceptions = {k: v for k, v in exceptions_df.groupby('Source Well')}
print(dict_of_exceptions)
for k in dict_of_exceptions.keys():
    j = dict_of_exceptions[k]
Could someone help me modify the last line to go through the dictionary and extract each dataframe (and name them like the corresponding key)?
I think I get your intention, though I could not really read it from your code. Currently, as #cyrilb38 stated in the comments, your loop keeps overriding j, so you only ever see the result of the last iteration. Also, I think (I may be wrong) that what you call a dataframe here is really a group of rows. Turning the groupby object into a dict is not what you want; it just prolongs the process for nothing, since you can work with the groupby object directly.
If you want to see the info for Well X only, for example, try this:
exceptions_df[exceptions_df['Source Well'] == 'Well X']
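If you really need one dataframe per well, you can iterate over the groupby object directly instead of building the dict first (a sketch assuming exceptions_df as in the question):
for well, well_df in exceptions_df.groupby('Source Well'):
    # well_df is the sub-dataframe for this 'Source Well' value
    print(well, len(well_df))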

How to save tuple output from a for loop to a DataFrame in Python

I have some data: 33k rows x 57 columns.
In some columns there is data that I want to translate with a dictionary.
I have done the translation, but now I want to write the translated data back to my data set.
I have a problem saving the tuple output from the for loop.
I am using tuples to get a correct translation; .join and .append are not working in my case. I have tried many things without any success.
Looking for any advice.
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
for index, row in data.iterrows():
    row["translated"] = tuple(slownik.get(znak) for znak in row["1st_service"])
I just want print(data["1st_service"]) to show the translated data, not the data from before the for loop.
First of all, if your csv doesn't already have a 'translated' column, you'll have to add it:
import numpy as np
data['translated'] = np.nan
The problem is that the row object you're trying to write to is only a view of the dataframe, not the dataframe itself, so the assignment never reaches the underlying data. Change your last line to:
data.loc[index, "translated"] = tuple([slownik.get(znak) for znak in row["1st_service"]])
and you'll get a tuple written into that one cell.
In future, posting the exact error message you're getting is very helpful!
I have managed it; below is the working code:
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
data.columns = []
slownik = dict([ ])
trans = ' '
for index, row in data.iterrows():
    trans += str(tuple([slownik.get(znak) for znak in row["1st_service"]]))
data['1st_service'] = trans.split(')(')
data.to_csv("out.csv", index=False)
Can you tell me if it is well done?
Maybe there is a faster way to do it?
I am doing this for 12 columns in one for loop, as shown above.
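One likely speed-up (a sketch, assuming slownik maps the individual characters of each value, as above) is to replace the per-row loop with Series.apply, once per column:
data['1st_service'] = data['1st_service'].apply(
    lambda s: tuple(slownik.get(znak) for znak in s))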

Update Pandas DF during while loop (Python3, Pandas)

Some background: My code takes user input and applies it to my DF to remove certain rows. This process repeats as many times as the user would like. Unfortunately, I am not sure how to update my DF within the while loop I have created so that it keeps the changes being made:
data = {'hello': ['the man', 'is a', 'good guy']}
df = pd.DataFrame(data)

def func():
    while True:
        n = input('Words: ')
        if n == "Done":
            break
        elif n != "Done":
            pattern = '^' + ''.join('(?=.*{})'.format(word) for word in n.split())
            df[df['hello'].str.contains(pattern) == False]
How do I update the DF at the end of each loop so the changes being made stay put?
Ok, I reevaluated your problem and my old answer was totally wrong of course.
What you want is the DataFrame.drop method. This can be done inplace.
# build a boolean mask of the rows that match, then drop those rows by index
mask = df['hello'].str.contains(pattern)
df.drop(df[mask].index, inplace=True)
This will update your DataFrame.
Looks to me like you've already done all the hard work, but there are two problems.
Your last line doesn't store the result anywhere. Most Pandas operations are not "in-place", which means you have to store the result somewhere to be able to use it later.
df is a global variable, and setting its value inside a function doesn't work, unless you explicitly have a line stating global df. See the good answers to this question for more detail.
So I think you just need to do:
df = df[df['hello'].str.contains(pattern)==False]
to fix problem one.
For problem two: at the end of func, do return df, and then when you call func, call it like:
df = func(df)
OR, start func with the line
global df
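Putting both fixes together, a minimal sketch of the corrected function (using the question's data and the pass-and-return option) might look like this:
import pandas as pd

df = pd.DataFrame({'hello': ['the man', 'is a', 'good guy']})

def func(df):
    while True:
        n = input('Words: ')
        if n == "Done":
            break
        pattern = '^' + ''.join('(?=.*{})'.format(word) for word in n.split())
        # store the filtered result so the change persists across iterations
        df = df[df['hello'].str.contains(pattern) == False]
    return df

df = func(df)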
