accounts = pd.read_csv('C:/*******/New_export.txt', sep=",", dtype={'number': object})
accounts.columns = ["Number", "F"]
for i, j in accounts["Number"].iterrows():  # i is the row index, j is the number
    if str(j) == "27*******5":
        print(accounts["F"][i], accounts["Number"][i])
I get the following error:
AttributeError: 'Series' object has no attribute 'iterrows'
I don't quite understand the error, since accounts is a pandas DataFrame. Please assist.
accounts["Number"] is a Series object, not a DataFrame. Either iterate over accounts.iterrows() and take the Number column from each row, or use the Series.iteritems() method.
Iterating over the dataframe:
for i, row in accounts.iterrows():
    if str(row['Number']) == "27*******5":
        print(row["F"], row["Number"])
or over Series.iteritems():
for i, number in accounts['Number'].iteritems():
    if str(number) == "27*******5":
        print(accounts["F"][i], number)
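Note that Series.iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; on recent versions use Series.items(), which behaves identically:

for i, number in accounts['Number'].items():
    if str(number) == "27*******5":
        print(accounts["F"][i], number)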
Can someone please explain why, when I write it like this, everything works fine:
def create_geo_hash(lat, lon):
    latitude = float(lat)
    longitude = float(lon)
    geo_hash = pygeohash.encode(latitude, longitude, precision=4)
    return geo_hash

def fill_with_valid_coords(df: DataFrame) -> DataFrame:
    validated_rdd = df.rdd.map(lambda row: check_for_invalid_coords(row))
    geo_hash_rdd = validated_rdd.map(lambda row: (create_geo_hash(row[5], row[6]), ))
    geo_hash_df = geo_hash_rdd.toDF(schema=['Geo_Hash'])
    return geo_hash_df
But when I pass the entire row to the mapping function like this:
geo_hash_rdd = validated_rdd.map(lambda row: create_geo_hash(row))
and change my create_geo_hash function accordingly:
def create_geo_hash(row):
    latitude = float(row.Latitude)
    longitude = float(row.Longitude)
    geo_hash = pygeohash.encode(latitude, longitude, precision=4)
    return geo_hash
I get AttributeError: 'tuple' object has no attribute 'Latitude'.
When I pass the entire row to validated_rdd = df.rdd.map(lambda row: check_for_invalid_coords(row)) and then use it as a row in another function, it works fine.
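A likely explanation, assuming check_for_invalid_coords returns a plain Python tuple rather than a pyspark.sql.Row: tuples support positional access (row[5]) but not attribute access (row.Latitude), which is exactly why the first version works and the second fails. A minimal sketch of a workaround, taking the coordinate positions 5 and 6 from the working version:

def create_geo_hash(row):
    # positional access works on both Row objects and plain tuples
    latitude = float(row[5])
    longitude = float(row[6])
    return pygeohash.encode(latitude, longitude, precision=4)

geo_hash_rdd = validated_rdd.map(lambda row: (create_geo_hash(row),))

Alternatively, have check_for_invalid_coords return a pyspark.sql.Row, so the field names survive the first map.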
I'm trying to add a number (type float) from one of two columns in a pandas DataFrame, using the following code:
"""
Create a dict of {symbol: [spot, aggregate]}.
"""
abn_dict = defaultdict(lambda: [0, 0])
for (col, row) in abn_df.iterrows():
    try:
        row.loc["Quantity Long"].isnull()
        abn_dict[row.loc["Symbol"]][1] += row.loc["Quantity Short"]
    except AttributeError:
        abn_dict[row.loc["Symbol"]][1] += row.loc["Quantity Long"]
If the Quantity Long column is NaN, it should add the Quantity Short value to the second element of the abn_dict value.
This is, however, not working with the above code, and I wanted to ask why.
As it is, you have no real condition in your code: row.loc["Quantity Long"].isnull() always raises AttributeError, because the value is a plain float and floats have no .isnull() method. Also, per the documentation, pandas.DataFrame.iterrows() returns (index, row), not (col, row). Finally, NaN is truthy in Python, so test for it explicitly with pd.isnull().
Try refactoring like this:
for _, row in abn_df.iterrows():
    if pd.isnull(row.loc["Quantity Long"]):
        abn_dict[row.loc["Symbol"]][1] += row.loc["Quantity Short"]
    else:
        abn_dict[row.loc["Symbol"]][1] += row.loc["Quantity Long"]
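For reference, a quick check showing why a bare truthiness test such as if row.loc["Quantity Long"]: would not catch the NaN case either:

import pandas as pd

print(bool(float("nan")))       # True  -- NaN does not test as False
print(pd.isnull(float("nan")))  # True  -- the explicit null check catches it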
I am iterating over a nested dictionary using the dict.keys() method. The code works well if the dictionary is nested; however, if the dictionary is not nested, it throws an error. For example:
{"a":{1:'i'}}
For the above dictionary the code works fine, but it fails for the following one:
{"a":1}
In my iteration, I want to avoid the error when the dictionary has no further keys; per the requirement, we may pass nested or non-nested dictionaries.
Following is the sample code:
global n
n = 0
df = pd.DataFrame(index=np.arange(10), columns=['column0'])

def iterate_dict(dict):
    global n
    for j in dict.keys():
        df[n] = j
        n = n + 1
    return dict

# function call
iterate_dict({"a":1})
Error Message:
AttributeError: 'str' object has no attribute 'keys'
Thanks for the help.
You must have called iterate_dict with a string argument, like iterate_dict("{a:1}"), which gives the error AttributeError: 'str' object has no attribute 'keys'.
Try using:
returned_dict = iterate_dict({"a": 1})
It should work.
Adding working code here:
import pandas as pd
import numpy as np

global n
n = 0
df = pd.DataFrame(index=np.arange(10), columns=['column0'])

def iterate_dict(dict):
    global n
    for j in dict.keys():
        df[n] = j
        n = n + 1
    return dict

# function call
iterate_dict({"a": 1})
print(df.head())
OUTPUT
  column0  0
0     NaN  a
1     NaN  a
2     NaN  a
3     NaN  a
4     NaN  a
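If the real code recurses into the values (the recursive part isn't shown here), a guard with isinstance keeps it from calling .keys() on non-dict values. A minimal sketch under that assumption:

def iterate_dict_safe(d):
    global n
    for key, value in d.items():
        df[n] = key
        n = n + 1
        if isinstance(value, dict):  # only recurse into actual dicts
            iterate_dict_safe(value)
    return d

iterate_dict_safe({"a": {1: 'i'}})  # nested: both levels are visited
iterate_dict_safe({"a": 1})         # flat: no AttributeError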
I have categorical variables based on states.
I want to create a dynamic DataFrame, with the same name, holding the data of the filtered state only.
For example, DataAL would hold the data of the AL state only.
Code 1:
l = []
for i in a:
    print(i)
    l[i] = df4[df4["Provider State"].str.contains(i)]
    l[i] = pd.DataFrame(l[i])
    l[i].head()
TypeError: list indices must be integers or slices, not str
Code 2:
l = []
for i in range(len(a)):
    print(i)
    l[i] = df4[df4["Provider State"].str.contains(a[i])]
    l[i] = pd.DataFrame(l[i])
    l[i].head()
IndexError: list assignment index out of range
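Both snippets fail because l is a plain list: a list cannot be indexed by a string (Code 1), and it cannot be assigned at positions that do not exist yet (Code 2). A dict keyed by state avoids both problems. A minimal sketch, assuming a is an iterable of state abbreviations such as ["AL", "AK", ...]:

state_frames = {
    state: df4[df4["Provider State"].str.contains(state)]
    for state in a
}

state_frames["AL"].head()  # only the AL rows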
I'm new to python, and I've found this community to be quite helpful so far. I've found a lot of answers to my other questions, but I can't seem to figure this one out.
I'm trying to write a function to loop through columns and replace '%', '$', and ','. When I import the .csv through pandas, about 80 of the 108 columns are dtype == object and need to be converted to float.
I've found I can write:
df['column_name'] = df['column_name'].str.replace('%', '')
and it successfully executes and strips the %.
Unfortunately I have a lot of columns (108) and want to write a function to take care of the problem. I have come up with the code below, which only executes on some of the columns and then raises an odd error:
# get column names
col_names = list(df.columns.values)

# start cleaning data
def clean_data(x):
    for i in range(11, 109, 1):
        if x[col_names[i]].dtype == object:
            x[col_names[i]] = x[col_names[i]].str.replace('%', '')
            x[col_names[i]] = x[col_names[i]].str.replace('$', '')
            x[col_names[i]] = x[col_names[i]].str.replace(',', '')
AttributeError: 'DataFrame' object has no attribute 'dtype'
Even though the error stops the process, some of the columns do get cleaned up. I can't figure out why it doesn't clean all the columns before raising the 'dtype' error.
I'm running python 3.6.
Welcome to Stack Overflow.
If you want to do this for every column, use the DataFrame's apply function; there is no need to loop:
df = pd.DataFrame([['1$', '2%'],] * 3, columns=['A', 'B'])

def myreplace(s):
    for ch in ['%', '$', ',']:
        s = s.map(lambda x: x.replace(ch, ''))
    return s

df = df.apply(myreplace)
print(df)
If you want to do it for only some columns, use the Series' map function; again, no need to loop:
df = pd.DataFrame([['1$', '2%'],] * 3, columns=['A', 'B'])

def myreplace(s):
    for ch in ['%', '$', ',']:
        s = s.replace(ch, '')
    return s

df['A'] = df['A'].map(myreplace)
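As an aside, the 'DataFrame' object has no attribute 'dtype' error in the question typically appears when a column label is duplicated, so x[col_names[i]] returns a DataFrame instead of a Series. A vectorized sketch (not the answer's method) that sidesteps the loop entirely, cleaning all object columns at once and converting them to numeric:

import pandas as pd

df = pd.DataFrame([['1$', '2%'],] * 3, columns=['A', 'B'])

obj_cols = df.select_dtypes(include='object').columns
# regex=True treats each key as a regular expression, so '$' must be escaped
df[obj_cols] = df[obj_cols].replace({'%': '', r'\$': '', ',': ''}, regex=True)
df[obj_cols] = df[obj_cols].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)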