nunique() not producing correct output in aggregate functions - python-3.x

I am using an aggregation on the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1':['team1','team1','team2','team3'],
'col2':[23, 4, 5 ,6],
'col3':['user1','user1','user2','user2']})
gb = df.groupby('col1')
gb.agg({'col2':np.sum,
'col3':nunique()})
But it seems nunique() is not compatible with groupby; I get the following error:
NameError: name 'nunique' is not defined
How can I use nunique() in this example? Any help is appreciated.
Using NumPy instead:
gb = df.groupby('col1')
gb.agg({'col2':np.sum,
'col3':np.nunique()})
gives a new error: AttributeError: module 'numpy' has no attribute 'nunique'

You need to use
gb.agg({'col2':np.sum, 'col3':lambda x: len(np.unique(x))})
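The NameError happens because nunique is a method of pandas Series and GroupBy objects, not a standalone function (and NumPy has no nunique at all). As a sketch, assuming a reasonably recent pandas, agg also accepts the method's name as a string, or the unbound method pd.Series.nunique, so the lambda is optional:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['team1', 'team1', 'team2', 'team3'],
                   'col2': [23, 4, 5, 6],
                   'col3': ['user1', 'user1', 'user2', 'user2']})
gb = df.groupby('col1')

# String names are resolved to the matching groupby methods,
# so nothing called "nunique" has to exist in the global namespace.
by_name = gb.agg({'col2': 'sum', 'col3': 'nunique'})

# Equivalent: pass the unbound Series method as the aggregation callable;
# pandas calls it once per group with that group's values.
by_method = gb.agg({'col2': 'sum', 'col3': pd.Series.nunique})

print(by_name)
```

Using 'sum' instead of np.sum also keeps every aggregation on the pandas side.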

Related

AttributeError: 'float' object has no attribute 'log' / TypeError: ufunc 'log' not supported for the input types

I have a series of fluorescence intensity data in a column ('2.4M'). I tried to create a new column 'ln_2.4M' by taking the natural log of column '2.4M':
df["ln_2.4M"] = np.log(df["2.4M"])
but I got an error:
AttributeError: 'float' object has no attribute 'log'
I also tried using a for loop to apply the log to each fluorescence value in column "2.4M":
ln2_4M = []
for x in df["2.4M"]:
    ln2_4M.append(np.log(x))  # append instead of overwriting the list
    print(ln2_4M[-1])
Although it printed the log of each value in column "2.4M" correctly, I am unable to use the result because it also raised a TypeError:
ufunc 'log' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I'm not sure why. Any help understanding what is happening and how to fix this problem is appreciated. Thanks
I then tried using the method below and it worked:
df["2.4M"] = pd.to_numeric(df["2.4M"],errors = 'coerce')
df["ln_24M"] = np.log(df["2.4M"])
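Why the first attempt fails: when a column has object dtype (for example, numbers mixed with strings after reading a file), np.log falls back to looking for a .log() method on each element, which Python floats and strings don't have. Depending on the NumPy version this surfaces as an AttributeError or a TypeError. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical column: numbers stored as text, plus one unparseable entry.
df = pd.DataFrame({"2.4M": ["1.5", 2.0, "n/a", 10]})
print(df["2.4M"].dtype)  # object, not float64

try:
    np.log(df["2.4M"])  # object dtype: NumPy looks for a .log() method per element
except (AttributeError, TypeError) as exc:
    print(type(exc).__name__, exc)

# Convert to float64 first; anything that can't be parsed becomes NaN.
df["2.4M"] = pd.to_numeric(df["2.4M"], errors="coerce")
df["ln_2.4M"] = np.log(df["2.4M"])
print(df)
```

Keep in mind that errors='coerce' silently turns bad entries into NaN, so it is worth checking df["2.4M"].isna().sum() afterwards.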

AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'

I am trying to split a DataFrame into equal-sized samples and apply a function to calculate a value for each sample. If any sample's value is greater than 0.3, I want to save the filename in a result DataFrame.
df=pd.DataFrame({'Value':[-0.016,-0.006,0.003,-0.011,-0.036,-0.031,-0.014,-0.006,-0.01 ,-0.009,0.004,0.001,-0.012,-0.021,-0.008,0.001,-0.011,-0.01,-0.006,0.002,0.004],'Nmae':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
x=pd.DataFrame([x.values.sqrt(np.mean(df2['Value']**2)) for x in np.array_split(df2, (len(df2)/10))])
I am getting this error:
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
If anyone has a more effective way to do this task, please share it.
This is a working version of your code. The error occurs because sqrt is a NumPy function (np.sqrt), not a method of an ndarray:
res = [np.sqrt(np.mean((x.Value**2))) for x in np.array_split(df, len(df) // 10)]
An alternative approach with pandas is to define a new column 'Split_variable' that labels each sample, and then group by it to apply your calculation:
df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean((x**2))))
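A sketch of the groupby approach end to end, assuming samples of 10 consecutive rows (the chunk size and the 0.3 threshold come from the question). Note that np.arange(len(df)) // 10 gives chunks of exactly 10 rows plus a shorter final chunk, whereas np.array_split(df, 2) would give two chunks of 11 and 10 rows, so pick whichever matches what "equal samples" should mean for your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': [-0.016, -0.006, 0.003, -0.011, -0.036, -0.031, -0.014,
                             -0.006, -0.01, -0.009, 0.004, 0.001, -0.012, -0.021,
                             -0.008, 0.001, -0.011, -0.01, -0.006, 0.002, 0.004],
                   'Nmae': [1] * 21})

chunk_size = 10  # assumed sample size
# Rows 0-9 -> chunk 0, rows 10-19 -> chunk 1, row 20 -> chunk 2.
df['Split_variable'] = np.arange(len(df)) // chunk_size

# Root mean square of each chunk: sqrt(mean(x**2)), computed with np.sqrt.
rms = df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean(x ** 2)))

threshold = 0.3
exceeds = bool((rms > threshold).any())  # True if any chunk is above the threshold
print(rms)
print("any chunk above threshold:", exceeds)
```

If exceeds is True, that is the point at which the filename would be appended to the result DataFrame.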

AttributeError: 'numpy.ndarray' object has no attribute 'rolling'

When I try to compute a moving average (rolling mean) on log-transformed data, I get this error. Where am I going wrong?
This version, with the original data, worked fine:
# Rolling statistics
rolmean = data.rolling(window=120).mean()
rolSTD = data.rolling(window=120).std()
With the log-transformed data:
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()
AttributeError: 'numpy.ndarray' object has no attribute 'rolling'
You have to convert the NumPy array to a pandas DataFrame (or Series) to use the pandas rolling method.
The change could look something like this:
dataframe = pd.DataFrame(X)
rolmean = dataframe.rolling(120).mean()
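Try this instead:
numpy.roll(your_array, shift, axis = None)
There is no rolling attribute in NumPy. Note, however, that numpy.roll only shifts the elements circularly along an axis; it does not compute a moving-window statistic, so on its own it cannot replace .rolling(window=120).mean().
Hope this helps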
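To compare the two routes, here is a sketch on a hypothetical log-transformed array X (the data is made up; window=120 comes from the question). Wrapping the array in a pandas Series restores .rolling(); in pure NumPy, sliding_window_view (available since NumPy 1.20) builds the windows explicitly. Unlike pandas, it returns only full windows, so there are no leading NaNs.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

X = np.log(np.arange(1, 301, dtype=float))  # hypothetical log-transformed data
window = 120

# Option 1: wrap the array in pandas, then use .rolling().
ma_pd = pd.Series(X).rolling(window=window).mean()
mstd_pd = pd.Series(X).rolling(window=window).std()

# Option 2: pure NumPy. The view has shape (len(X) - window + 1, window),
# one row per full window, without copying the data.
windows = sliding_window_view(X, window)
ma_np = windows.mean(axis=1)
mstd_np = windows.std(axis=1, ddof=1)  # ddof=1 matches pandas' sample std
```

The pandas result is aligned with the original index (first window - 1 values are NaN), which is usually what you want for plotting against the original series.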

Find Max Value in a field of a shapefile

I have a shapefile (mich_co.shp) in which I am trying to find the county with the maximum population. My idea was to use the max() function, but that doesn't seem to work. Here is my code so far:
from osgeo import ogr
import os
shapefile = "C:/Users/root/Python/mich_co.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
    print(feature.GetField("pop"))
layer.ResetReading()
However, the code above only prints all the values of the "pop" field, like this:
10635.0
9541.0
112039.0
29234.0
23406.0
15477.0
8683.0
58990.0
106935.0
17465.0
156067.0
43868.0
135099.0
I tried:
print(max(feature.GetField("pop")))
but it raises TypeError: 'float' object is not iterable. To work around this, I've also tried:
for feature in range(layer):
and it raises TypeError: 'Layer' object cannot be interpreted as an integer.
Any help or hints would be much appreciated.
Thank you!
max() needs an iterable, such as a list, but feature.GetField("pop") returns a single float. Try building a list first:
pops = [ feature.GetField("pop") for feature in layer ]
print(max(pops))

How to find compatible alternative to string.atol in python3

I am trying to make my Python 2.x code compatible with both 2.7 and 3.x. Currently I am stuck on some code in Pmw.py (from Python megawidgets). Have a look at the first three entries of this dictionary:
_standardValidators = {
'numeric' : (numericvalidator, string.atol),
'integer' : (integervalidator, string.atol),
'hexadecimal' : (hexadecimalvalidator, lambda s: string.atol(s, 16)),
'real' : (realvalidator, Pmw.stringtoreal),
'alphabetic' : (alphabeticvalidator, len),
'alphanumeric' : (alphanumericvalidator, len),
'time' : (timevalidator, Pmw.timestringtoseconds),
'date' : (datevalidator, Pmw.datestringtojdn),
}
The first three entries use string.atol. My questions are:
1. In the Python docs, atol is introduced as a function (string.atol(s[, base])), so there should be parentheses, which are missing here. How is this syntax to be understood?
2. In Python 3.2 this code raises the error:
'numeric' : (numericvalidator, string.atol),
AttributeError: 'module' object has no attribute 'atol'
I already tried replacing the three occurrences of "atol" with long, as suggested in the Python docs, but that just raised the error:
'numeric' : (numericvalidator, string.long),
AttributeError: 'module' object has no attribute 'long'
As I don't even understand the syntax, I'm at a loss about what to try next. How should this code be fixed so that it works in both Python 2.7 and 3.x?
I hope you can help me with this one.
1: string.atol is the function itself; functions are first-class objects in Python. The parentheses are only used to call it.
>>> import string
>>> string.atol
<function atol at 0x00B29AB0>
>>> string.atol("aab2", 16)
43698L
2: I think you must have misread. long doesn't live in string, but there isn't a long in Python 3 anyway. That's a relic of when Python distinguished between small integers and long integers in ways that could be seen from userspace. (That's what the "L" on the end of 43698L above means.)
Simply use int, e.g.:
'numeric': (numericvalidator, int),
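Applied to all three atol entries, a sketch of the replacement (only the converter half of each tuple is shown; the Pmw validator functions are left out so the snippet runs on its own). int accepts an optional base just like string.atol(s[, base]), and in Python 2.7 int() returns a long automatically when the value doesn't fit, so this should work on both versions. The dictionary stores the functions themselves; they are only called later, when a string is converted.

```python
# Converter half of Pmw's _standardValidators entries, without string.atol.
# The values are function objects (no parentheses): they are called later.
converters = {
    'numeric': int,
    'integer': int,
    'hexadecimal': lambda s: int(s, 16),  # was: lambda s: string.atol(s, 16)
}

convert = converters['hexadecimal']  # just a reference to the function
print(convert("aab2"))               # the call happens here: 43698
```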
When the parentheses are missing, you're assigning the function itself rather than the result of a function call.
Try replacing string.atol with int.
