I have a dataframe with two columns: one called 'name' that is a string, and one called 'route' that is a Google polyline. I'm using the polyline library to decode the polyline into lat/long pairs. I want to loop over each row to decode, but it seems to decode only the first row and write that value to the rest of the created column. This is what I have so far.
df = pd.DataFrame(activities)

for row in df.itertuples(index=False):
    name = row[0]
    route = row[1]
    try:
        decoded = polyline.decode(route.replace('\\\\', '\\'), geojson=True)
        df['decode'] = df.apply(lambda route: [decoded], axis=1)
    except:
        print(name)
Use DataFrame.apply with a function:
import numpy as np

df = pd.DataFrame(activities)

def decoder(name, route):
    try:
        return polyline.decode(route.replace('\\\\', '\\'), geojson=True)
    except:
        print(name)
        return np.nan

df['decode'] = df.apply(lambda x: decoder(x[0], x[1]), axis=1)
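For clarity, the columns can also be passed by label instead of position. Here is a minimal, self-contained sketch of the same decoder; the sample data (names and encoded routes) is invented for illustration:

import numpy as np
import pandas as pd
import polyline

def decoder(name, route):
    try:
        # decode the encoded polyline into (lon, lat) pairs
        return polyline.decode(route.replace('\\\\', '\\'), geojson=True)
    except Exception:
        print(name)  # report which row failed to decode
        return np.nan

# hypothetical sample data for illustration
df = pd.DataFrame({
    'name': ['Morning Ride', 'Evening Run'],
    'route': ['_p~iF~ps|U_ulLnnqC_mqNvxq`@', '_p~iF~ps|U_ulLnnqC_mqNvxq`@'],
})

# pass the columns by label instead of by position
df['decode'] = df.apply(lambda x: decoder(x['name'], x['route']), axis=1)
print(df)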
I have a dataframe, say df_dt_proc, with 35 columns.
I want to add a column df_dt_proc['procedures'] that holds all the columns concatenated, separated by ', ', except the column at index 0.
I am able to achieve the result with the following script:
df_dt_proc['procedures'] = np.nan
_len = len(df_dt_proc.columns[1:-1])

for i in range(len(df_dt_proc)):
    res = ''
    for j in range(_len):
        try:
            res += df_dt_proc[j][i] + ', '
        except:
            break
    df_dt_proc['procedures'][i] = res
However, there must be a more pythonic way to achieve this.
Use a custom lambda function that drops NaN and None values and converts the rest to strings; to select all columns except the first and last, use DataFrame.iloc:
f = lambda x: ', '.join(x.dropna().astype(str))
df_dt_proc['procedures'] = df_dt_proc.iloc[:, 1:-1].agg(f, axis=1)
Try this with agg:
df_dt_proc['procedures'] = df_dt_proc[df_dt_proc.columns[1:-1]].astype(str).agg(', '.join, axis=1)
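A quick, self-contained check of both approaches on a small invented frame (the column names here are hypothetical); note that the astype(str) variant keeps literal 'nan' strings while the dropna variant removes them:

import numpy as np
import pandas as pd

# hypothetical data: first column is an id, last column is the target 'procedures'
df_dt_proc = pd.DataFrame({
    'id': [1, 2],
    'proc_a': ['angiogram', 'x-ray'],
    'proc_b': [np.nan, 'mri'],
    'procedures': [np.nan, np.nan],
})

# approach 1: drop missing values per row, then join
f = lambda x: ', '.join(x.dropna().astype(str))
print(df_dt_proc.iloc[:, 1:-1].agg(f, axis=1).tolist())
# ['angiogram', 'x-ray, mri']

# approach 2: cast everything to str first, then join
print(df_dt_proc[df_dt_proc.columns[1:-1]].astype(str).agg(', '.join, axis=1).tolist())
# ['angiogram, nan', 'x-ray, mri']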
I wrote a function which builds a df inside it, and I want to use that df afterwards outside the function or in another function. How can I do this without running into a problem where the dataframe isn't recognized?
Thanks a lot :)
The code:
def DisplayDataFrame():
    file_path = filedialog.askopenfilename()
    df1 = pd.read_excel(file_path)
    cols = list(df1.columns)

    tree = ttk.Treeview(root)
    tree.pack()
    tree["columns"] = cols
    for i in cols:
        tree.column(i, anchor="w")
        tree.heading(i, text=i, anchor='w')
    for index, row in df1.iterrows():
        tree.insert("", 0, text=index, values=list(row))

option = df1.index()
Do you mean using df1 from your DisplayDataFrame() in other functions? If so, you can return df1 from your function like this:
def DisplayDataFrame():
    '''
    your original code to define df1
    '''
    return df1

dataframe = DisplayDataFrame()
Then you can reuse the dataframe in other functions.
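A minimal sketch of the whole pattern, with the Treeview code elided and a hypothetical second function (summarize) that consumes the returned dataframe:

import pandas as pd
from tkinter import filedialog

def DisplayDataFrame():
    file_path = filedialog.askopenfilename()
    df1 = pd.read_excel(file_path)
    # ... Treeview display code from the question ...
    return df1          # hand the dataframe back to the caller

def summarize(df):
    # hypothetical consumer of the returned dataframe
    print(df.describe())

dataframe = DisplayDataFrame()
summarize(dataframe)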
I'm trying to get data from a common API in my industry and append all the data to a CSV. The code below works until I try to give it a feed CSV, which contains only a list of keywords starting at row 1 (no headers).
parameters = {"type": "phrase_organic",
              "phrase": "",
              "key": API_KEY,
              "database": "us",
              "export_column": "Dn,Ur,Fk",
              "display_limit": "10"}

keywords = pd.read_csv(r"C:\Users\stff\bulk_kw.csv", header=None)

res_df = None
for i in keywords:
    parameters["phrase"] = i  # i is 0 for some reason
    response = requests.get("https://api.semrush.com/", params=parameters)
    print(response.url)
    tmp_df = pd.read_csv(io.StringIO(response.text), sep=";")
    if res_df is None:
        res_df = tmp_df
    else:
        res_df = pd.concat([res_df, tmp_df])

res_df.to_csv(r"C:\Users\stff\SEMrush_call.csv")
I've tried a lot of things, e.g.:
iteritems
iterrows
df.to_list()
keywords.index
unpacking the loop, e.g. for i, j in keywords:
None of these gives me what I want, which is a simple list of the values in the CSV to loop through. What it's currently giving me is that i is 0.
I made the following change in my for loop and it works:
for i in keywords.values:
    parameters["phrase"] = i
    response = requests.get("https://api.semrush.com/", params=parameters)
    print(response.url)
    tmp_df = pd.read_csv(io.StringIO(response.text), sep=";")
    if res_df is None:
        res_df = tmp_df
    else:
        res_df = pd.concat([res_df, tmp_df])
I understand this happens because iterating over a DataFrame yields its column labels, and with header=None read_csv assigns the default integer label 0 to the single column. By iterating over keywords.values instead, the loop sees the values in the DataFrame rather than the column label.
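For reference, here is a small sketch of why i was 0 and one way to get a plain list of keywords out of the single-column CSV (same file path as above):

import pandas as pd

# header=None keeps the first row as data; the single column gets the label 0
keywords = pd.read_csv(r"C:\Users\stff\bulk_kw.csv", header=None)

# iterating over the DataFrame itself yields column labels (just 0 here)
print(list(keywords))        # [0]

# the column's values are what the loop actually needs
for kw in keywords[0].tolist():
    print(kw)                # each keyword string in turn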
I have a dataframe given below.
I want to extract all the non-zero values from each column and summarize them like this.
If a value repeats for a period of time, then the starting time of that value should go in the 'FROM' column and its end time in the 'TO' column, with the column name in 'BLK-ASB-INV' and the value itself in 'Scount'. For this I have started to write the code like this:
import pandas as pd

df = pd.read_excel("StringFault_Bagewadi_16-01-2020.xlsx")
df = df.set_index(['Date (+05:30)'])
cols = ['BLK-ASB-INV', 'Scount', 'FROM', 'TO']
res = pd.DataFrame(columns=cols)
for col in df.columns:
    ss = df[col].iloc[df[col].to_numpy().nonzero()[0]]
    .......
After that, I am unable to figure out how I should approach getting the desired output. Is there any way to do this in Python? Thanks in advance for any help.
Finally I have solved my problem; the code given below works perfectly for me.
import pandas as pd

df = pd.read_excel("StringFault.xlsx")
df = df.set_index(['Date (+05:30)'])
cols = ['BLK-ASB-INV', 'Scount', 'FROM', 'TO']
res = pd.DataFrame(columns=cols)

for col in df.columns:
    device = []
    for i in range(len(df[col])):
        if df[col][i] == 0:
            pass
        else:
            if i < len(df[col]) - 1 and df[col][i] == df[col][i + 1]:
                try:
                    if df[col].index[i] > device[2]:
                        continue
                except IndexError:
                    device.append(df[col].name)
                    device.append(df[col][i])
                    device.append(df[col].index[i])
                    continue
            else:
                if len(device) == 3:
                    device.append(df[col].index[i])
                    res = res.append({'BLK-ASB-INV': device[0], 'Scount': device[1], 'FROM': device[2], 'TO': device[3]}, ignore_index=True)
                    device = []
                else:
                    device.append(df[col].name)
                    device.append(df[col][i])
                    if i == 0:
                        device.append(df[col].index[i])
                    else:
                        device.append(df[col].index[i - 1])
                    device.append(df[col].index[i])
                    res = res.append({'BLK-ASB-INV': device[0], 'Scount': device[1], 'FROM': device[2], 'TO': device[3]}, ignore_index=True)
                    device = []
For reference, here is the output dataframe.
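For comparison, here is a more compact sketch of the same idea using run-grouping with shift/cumsum. It is not a drop-in replacement for the exact FROM/TO conventions above (for example, the loop above records the previous timestamp as FROM for values that occur only once); it just illustrates grouping consecutive identical non-zero readings:

import pandas as pd

def summarize_runs(df):
    """Group consecutive identical readings per column and keep the non-zero runs."""
    rows = []
    for col in df.columns:
        s = df[col]
        run_id = (s != s.shift()).cumsum()   # new label whenever the value changes
        for _, run in s.groupby(run_id):
            if run.iloc[0] != 0:             # skip runs of zeros
                rows.append({'BLK-ASB-INV': col,
                             'Scount': run.iloc[0],
                             'FROM': run.index[0],
                             'TO': run.index[-1]})
    return pd.DataFrame(rows, columns=['BLK-ASB-INV', 'Scount', 'FROM', 'TO'])

# usage on the same frame as above:
# res = summarize_runs(df)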
I have a requirement that the result value should be a string, but when I calculate the maximum value of the dataframe it gives the result as a list.
import pandas as pd

def answer_one():
    df_copy = [df['# Summer'].idxmax()]
    return (df_copy)

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

for col in df.columns:
    if col[:2] == '01':
        df.rename(columns={col: 'Gold' + col[4:]}, inplace=True)
    if col[:2] == '02':
        df.rename(columns={col: 'Silver' + col[4:]}, inplace=True)
    if col[:2] == '03':
        df.rename(columns={col: 'Bronze' + col[4:]}, inplace=True)
    if col[:1] == '№':
        df.rename(columns={col: '#' + col[1:]}, inplace=True)

names_ids = df.index.str.split('\s\(')

df.index = names_ids.str[0]  # the [0] element is the country name (new index)
df['ID'] = names_ids.str[1].str[:3]  # the [1] element is the abbreviation or ID (take first 3 characters from that)

df = df.drop('Totals')
df.head()

answer_one()
But here answer_one() gives me a list as output and not a string. Can someone help me understand how this can be converted to a string, or how I can get the answer directly from the dataframe as a string? I don't want to convert the list to a string using str(df_copy).
Your first solution would be, as @juanpa.arrivillaga put it, to not wrap it in a list. Your function becomes:
def answer_one():
    df_copy = df['# Summer'].idxmax()
    return (df_copy)
>>> 1
Another thing you might not be expecting: idxmax() returns the index of the max. Perhaps you want to do:
def answer_one():
    df_copy = df['# Summer'].max()
    return (df_copy)
>>> 30
Since you don't want to do str(df_copy), you can do df_copy.astype(str) instead.
Here is how I would write your function:
def get_max_as_string(data, column_name):
    """ Return Max Value from a column as a string."""
    return data[column_name].max().astype(str)
get_max_as_string(df, '# Summer')
>>> '30'
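A tiny self-contained illustration of the difference between idxmax() and max() (the data here is invented, not olympics.csv):

import pandas as pd

# hypothetical medal counts indexed by country name
df = pd.DataFrame({'# Summer': [10, 30, 5]},
                  index=['Aland', 'Borduria', 'Syldavia'])

print(df['# Summer'].idxmax())    # Borduria -> the index label of the max
print(df['# Summer'].max())       # 30       -> the max value itself
print(str(df['# Summer'].max()))  # '30'     -> plain-Python string conversion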