I am trying to split a DataFrame into equal-sized samples and apply a function that calculates a value for each sample. If any sample's value is greater than 0.3, I want to save the file name in a result DataFrame.
df=pd.DataFrame({'Value':[-0.016,-0.006,0.003,-0.011,-0.036,-0.031,-0.014,-0.006,-0.01 ,-0.009,0.004,0.001,-0.012,-0.021,-0.008,0.001,-0.011,-0.01,-0.006,0.002,0.004],'Nmae':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
x=pd.DataFrame([x.values.sqrt(np.mean(df2['Value']**2)) for x in np.array_split(df2, (len(df2)/10))])
I am getting this error:
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
Is there a more effective way to do this task?
This is a working version of your code:
res = [np.sqrt(np.mean(x.Value**2)) for x in np.array_split(df, len(df) // 10)]
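The error comes from calling sqrt as a method: x.values is a NumPy ndarray, and ndarray has no sqrt method, so the square root has to be taken with the np.sqrt function. The sketch below assumes the 21-value sample from the question and roughly 10 values per chunk. It splits the 'Value' column as a plain NumPy array (rather than the whole DataFrame) and computes the root mean square of each chunk.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the unused 'Nmae' column is omitted)
df = pd.DataFrame({"Value": [-0.016, -0.006, 0.003, -0.011, -0.036, -0.031, -0.014,
                             -0.006, -0.01, -0.009, 0.004, 0.001, -0.012, -0.021,
                             -0.008, 0.001, -0.011, -0.01, -0.006, 0.002, 0.004]})

values = df["Value"].to_numpy()            # plain ndarray: it has no .sqrt() method
n_chunks = max(1, len(values) // 10)       # assumption: about 10 values per chunk
chunks = np.array_split(values, n_chunks)  # 21 values -> chunks of 11 and 10

# Root mean square of each chunk, using the np.sqrt *function*
rms = [float(np.sqrt(np.mean(chunk ** 2))) for chunk in chunks]
print(rms)
```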
An alternative approach with pandas is to define a new column 'Split_variable' that labels each sample, and then group by it to apply your calculation:
df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean((x**2))))
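The answer does not show how 'Split_variable' is built. One way, sketched here under the assumption of fixed 10-row chunks, is to integer-divide each row's position by the chunk size. The per-chunk values are then compared with the question's 0.3 threshold; the file name is a hypothetical placeholder for whichever file the data was read from. Note that this labelling gives chunks of 10, 10 and 1 rows for the 21-row sample, whereas np.array_split spreads the remainder over the chunks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": [-0.016, -0.006, 0.003, -0.011, -0.036, -0.031, -0.014,
                             -0.006, -0.01, -0.009, 0.004, 0.001, -0.012, -0.021,
                             -0.008, 0.001, -0.011, -0.01, -0.006, 0.002, 0.004]})

chunk_size = 10  # assumption: 10 rows per chunk
# Row positions 0-9 -> chunk 0, 10-19 -> chunk 1, 20 -> chunk 2
df["Split_variable"] = np.arange(len(df)) // chunk_size

# Root mean square of 'Value' within each chunk
rms = df.groupby("Split_variable")["Value"].apply(lambda x: np.sqrt(np.mean(x ** 2)))

threshold = 0.3
filename = "sample_01.csv"  # hypothetical name of the file this data came from
flagged = [filename] if (rms > threshold).any() else []
result = pd.DataFrame({"filename": flagged})
print(rms)
print(result)
```

With several files, you would run this once per file and collect every flagged name in one list before building result.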
My data file (CSV) contains categorical and non-categorical variables. To fit a Cox proportional hazards (CPH) model, I applied OneHotEncoder to two categorical variables (study_category and patient_category). I get the following error on the line where I try to fit the CPH model. I pass three arguments to the cph.fit() method: the DataFrame, the duration column ('Diff_time') and the event column ('Events'). I searched for the error but could not find anything useful. This is my first time using CPH, so any help fixing the issue would be appreciated.
Error:
AttributeError: 'ColumnTransformer' object has no attribute 'shape'
My Python code:
def meth():
    dataset = pd.read_csv("C:/Users/XYZ/CTR_Project/CPH.csv")
    dataset = dataset.loc[:,
        ['study_Category','patient_Category','Diff_time','Events']]
    X = dataset.loc[:,['study_Category','patient_Category','Diff_time','Events']]
    colm_transf = make_column_transformer((OneHotEncoder(),
        ['study_Category','patient_Category']), remainder='passthrough')
    colm_transf.fit_transform(X)
    cph = CoxPHFitter()
    cph.fit(colm_transf, duration_col='Diff_time', event_col='Events')
    cph.print_summary()
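The code passes colm_transf, the ColumnTransformer object itself, to cph.fit(), and discards the result of fit_transform(). Even that result would be a NumPy array or SciPy sparse matrix without column names, so cph.fit() could not locate 'Diff_time' or 'Events'. One way around this, swapping in pandas.get_dummies for the scikit-learn encoder, is to keep everything in a DataFrame. The sketch below uses a small hypothetical dataset with the question's column names; drop_first=True drops one level per variable, since a full set of dummies sums to a constant that a Cox model cannot identify.

```python
import pandas as pd

# Hypothetical stand-in for CPH.csv with the four columns used in the question
dataset = pd.DataFrame({
    "study_Category": ["A", "B", "A", "C", "B", "C"],
    "patient_Category": ["X", "Y", "Y", "X", "X", "Y"],
    "Diff_time": [5, 8, 3, 10, 7, 2],
    "Events": [1, 0, 1, 1, 0, 1],
})

# One-hot encode the categorical columns; the result is still a DataFrame,
# so 'Diff_time' and 'Events' keep their names
encoded = pd.get_dummies(
    dataset,
    columns=["study_Category", "patient_Category"],
    drop_first=True,  # drop one level per variable to avoid collinear dummies
    dtype=int,
)
print(encoded.columns.tolist())
```

The encoded frame can then be passed to lifelines in place of the transformer: cph.fit(encoded, duration_col='Diff_time', event_col='Events'). If you keep the ColumnTransformer instead, wrap its output in a DataFrame whose column names include 'Diff_time' and 'Events' before fitting.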
I have a series of fluorescence intensity data in a column ('2.4M'). I tried to create a new column 'ln_2.4M' by taking the natural log of column '2.4M', but got this error:
AttributeError: 'float' object has no attribute 'log'
df["ln_2.4M"] = np.log(df["2.4M"])
I tried using a for loop to iterate the log over each fluorescence data in the column "2.4M":
ln2_4M = []
for x in df["2.4M"]:
    ln2_4M = np.log(x)
    print(ln2_4M)
Although it printed the log of the values in column "2.4M" correctly, I cannot use the result because the loop also raised a TypeError:
ufunc 'log' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I'm not sure why. Any help understanding what is happening and how to fix this problem would be appreciated. Thanks
I then tried using the method below and it worked:
df["2.4M"] = pd.to_numeric(df["2.4M"],errors = 'coerce')
df["ln_24M"] = np.log(df["2.4M"])
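Both errors point to the column having dtype object, for example because some entries are text. For object arrays, np.log falls back to calling a .log() method on each element, which Python floats do not have (newer NumPy versions report this as a TypeError instead of an AttributeError). The loop fails with the casting TypeError once it reaches an entry that is a string. The sketch below uses hypothetical data to show the dtype check and the pd.to_numeric conversion from the working method; entries that cannot be parsed become NaN, and so does their log.

```python
import numpy as np
import pandas as pd

# Hypothetical intensity column: one text entry forces dtype 'object'
df = pd.DataFrame({"2.4M": [1200.5, 980.0, "n/a", 1500.25]})
print(df["2.4M"].dtype)  # object

# Convert to float64; unparseable entries such as "n/a" become NaN
df["2.4M"] = pd.to_numeric(df["2.4M"], errors="coerce")
df["ln_2.4M"] = np.log(df["2.4M"])  # element-wise natural log; NaN stays NaN
print(df)
```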
My variable 'Datelist3' holds a list of pandas Timestamps in the following format:
[Timestamp('2019-12-04 09:00:00+0100', tz='Europe/Rome'), Timestamp('2019-12-04 09:30:00+0100', tz='Europe/Rome'), ....]
I'm having difficulty converting this list to a datetime string list, in this format:
['2019-12-04 09:00:00', '2019-12-04 09:30:00', .....]
I tried the following:
Datelist3.to_datetime # -> error: 'list' object has no attribute 'to_datetime'
Datelist3.dt.to_datetime # -> error: 'list' object has no attribute 'dt'
Datelist3.to_pydatetime() # -> error: 'list' object has no attribute 'to_pydatetime'
Datelist3.dt.to_pydatetime() # -> error: 'list' object has no attribute 'dt'
I created 'Datelist3' with the following statement:
Datelist3 = quoteIntraPlot.index.tolist()
If I change this statement to:
Datelist3 = quoteIntraPlot.index.strftime("%Y-%m-%d %H:%M:%S").tolist()
I get exactly what I want.
The problem is that out of 10 runs, 6-7 work and 3-4 fail with the error "'Index' object has no attribute 'strftime'". It's very strange. How can I solve this problem?
If your data is well formed, this works:
from pandas import Timestamp
time_list = [Timestamp('2019-12-04 09:00:00+0100', tz='Europe/Rome'), Timestamp('2019-12-04 09:30:00+0100', tz='Europe/Rome'), ....]
str_list = [t.strftime("%Y-%m-%d %H:%M:%S") for t in time_list]
However, if you still get the same error, not all of your index values are timestamps (the index is then a plain Index, not a DatetimeIndex). In this case, you need to clean your data first.
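The intermittent error means quoteIntraPlot.index is sometimes a plain Index rather than a DatetimeIndex, and only DatetimeIndex provides .strftime. The sketch below builds a hypothetical mixed index to reproduce that situation and formats only the entries that really are Timestamps. Another option is to check isinstance(quoteIntraPlot.index, pd.DatetimeIndex) before calling .strftime and inspect the data on the runs where it is False.

```python
import pandas as pd

rome = "Europe/Rome"
# Hypothetical index: two valid timestamps and one stray non-timestamp entry.
# Because the entries are mixed, pandas keeps them in a plain object Index.
idx = pd.Index([pd.Timestamp("2019-12-04 09:00:00", tz=rome), "bad row",
                pd.Timestamp("2019-12-04 09:30:00", tz=rome)])
print(isinstance(idx, pd.DatetimeIndex))  # False: idx.strftime does not exist

# Keep only real Timestamps, then format each one in its own (local) time zone
str_list = [t.strftime("%Y-%m-%d %H:%M:%S") for t in idx if isinstance(t, pd.Timestamp)]
print(str_list)  # ['2019-12-04 09:00:00', '2019-12-04 09:30:00']
```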
When I try to compute a moving average (MA), i.e. a rolling mean, on log-transformed data, I get the error below. Where am I going wrong?
This worked fine with the original data:
# Rolling statistics
rolmean = data.rolling(window=120).mean()
rolSTD = data.rolling(window=120).std()
With the log-transformed data:
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()
AttributeError: 'numpy.ndarray' object has no attribute 'rolling'
You have to convert the NumPy array to a pandas DataFrame to use the pandas rolling method.
The change could be something like this, applied to the log-transformed array X:
import pandas as pd
dataframe = pd.DataFrame(X)
rolmean = dataframe.rolling(120).mean()
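np.log applied to a pandas Series returns a Series, so one way to keep .rolling available is to do the log transform in pandas. If X is already an ndarray (an assumption about how the question built it), wrapping it back into pandas, as above, works too. The sketch uses a hypothetical positive series in place of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical positive series standing in for the question's 'data'
data = pd.Series(np.arange(1, 301, dtype=float))

# Option 1: log-transform in pandas; the result is still a Series
X = np.log(data)
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()

# Option 2: X is already a NumPy array; wrap it, keeping the original index
X_arr = np.log(data.to_numpy())
MA_arr = pd.Series(X_arr, index=data.index).rolling(window=120).mean()
```

The first 119 values of each rolling result are NaN because a full 120-value window is not yet available.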
Try this instead:
numpy.roll(your_array, shift, axis = None)
NumPy arrays have no rolling attribute, so you should use the above syntax.
Hope this helps
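numpy.roll rotates the elements of an array by shift positions; it does not compute a moving average or standard deviation, so on its own it cannot replace rolling(window=120).mean(). To stay in NumPy, one option (a swapped-in technique that needs NumPy 1.20 or later) is sliding_window_view. The sketch uses a small hypothetical array and a window of 3 instead of 120. Unlike pandas, it returns only full windows, so there are no leading NaN values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.log(np.arange(1, 11, dtype=float))  # hypothetical log-transformed data
window = 3

# np.roll rotates elements: same values, same length, shifted by one position
rolled = np.roll(x, 1)

# Rolling statistics over each full window of 3 consecutive values
windows = sliding_window_view(x, window)  # shape (8, 3)
ma = windows.mean(axis=1)
mstd = windows.std(axis=1, ddof=1)        # ddof=1 matches pandas' default .std()
```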
I have a shapefile (mich_co.shp) in which I am trying to find the county with the largest population. My idea was to use the max() function, but I can't get it to work. Here is my code so far:
from osgeo import ogr
import os
shapefile = "C:/Users/root/Python/mich_co.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
    print(feature.GetField("pop"))
layer.ResetReading()
However, the code above only prints every value of the "pop" field, like this:
10635.0
9541.0
112039.0
29234.0
23406.0
15477.0
8683.0
58990.0
106935.0
17465.0
156067.0
43868.0
135099.0
I tried:
print(max(feature.GetField("pop")))
but it raises TypeError: 'float' object is not iterable. I also tried:
for feature in range(layer):
which raises TypeError: 'Layer' object cannot be interpreted as an integer.
Any help or hints would be much appreciated.
Thank you!
max() needs an iterable, such as a list. Try to build a list:
pops = [ feature.GetField("pop") for feature in layer ]
print(max(pops))
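To get the county itself rather than only the largest number, max() can be given a key function so it returns the whole feature. The sketch below is written against anything that yields objects with a GetField method, so it runs without GDAL; FakeFeature, the 'NAME' field and the county names are hypothetical stand-ins (use whichever attribute holds the county name in mich_co.shp). With OGR you would pass the real layer, calling layer.ResetReading() first if it was already iterated, as in your code.

```python
class FakeFeature:
    """Hypothetical stand-in for an ogr.Feature, exposing only GetField."""

    def __init__(self, fields):
        self._fields = fields

    def GetField(self, name):
        return self._fields[name]


def most_populous(layer, pop_field="pop", name_field="NAME"):
    """Return (name, population) of the feature with the largest pop_field value.

    name_field is an assumption: use whichever attribute holds the county name.
    """
    best = max(layer, key=lambda feature: feature.GetField(pop_field))
    return best.GetField(name_field), best.GetField(pop_field)


# Demo with hypothetical county names and a few of the question's populations
layer = [FakeFeature({"NAME": "County A", "pop": 10635.0}),
         FakeFeature({"NAME": "County B", "pop": 156067.0}),
         FakeFeature({"NAME": "County C", "pop": 135099.0})]
print(most_populous(layer))  # ('County B', 156067.0)
```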