Getting an error when calculating Z score - python-3.x

I am trying to find the outliers in my dataset and remove them. So I did the following:
z_scores = stats.zscore(dataset_sex)
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_df = dataset_sex[filtered_entries]
new_df.head()
but I got this error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
The error seems to generate from the first line of code (z_scores = stats.zscore(dataset_sex)). I don't understand why. How can I fix this?

This happens because some of the columns in your data contain strings (in Python terms, 'str').
To work out a z-score, the function has to subtract the column mean from each value and divide by the standard deviation. If one of the columns holds strings like 'M' or 'F' for sex, or numbers stored as text like '1,232.23' that were never converted to floats, that arithmetic fails and z-scoring does not work.
My first suggestion is to check that every column is numeric.
df.dtypes
will show you the type of each column (here df is your dataframe, dataset_sex). Then convert the columns that should be numeric.
Post a little of the data (a couple of rows) and we can help you.
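A minimal sketch of that check, assuming a small made-up frame in place of dataset_sex (the column names and values are hypothetical). It keeps only the numeric columns and computes the z-scores with plain pandas arithmetic, using ddof=0 to match the default of scipy.stats.zscore:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset_sex: one string column, two numeric ones.
dataset_sex = pd.DataFrame({
    "sex": ["M", "F", "F", "M", "F"],
    "age": [23, 31, 27, 45, 38],
    "income": [41000.0, 52000.0, 48000.0, 39000.0, 250000.0],
})

# Keep only numeric columns; the string column is what breaks the z-score.
numeric = dataset_sex.select_dtypes(include="number")

# z = (x - mean) / std, with ddof=0 (population standard deviation).
z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)

# Keep rows where every numeric column lies within 3 standard deviations.
filtered_entries = (np.abs(z_scores) < 3).all(axis=1)
new_df = dataset_sex[filtered_entries]
print(new_df.head())
```

With only five rows nothing exceeds the threshold, so no row is dropped; the point is that the string column no longer takes part in the arithmetic.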

Related

AttributeError: 'float' object has no attribute 'log' / TypeError: ufunc 'log' not supported for the input types

I have a series of fluorescence intensity data in a column ('2.4M'). I tried to create a new column 'ln_2.4M' by taking the natural log (ln) of column '2.4M':
df["ln_2.4M"] = np.log(df["2.4M"])
but I got an error:
AttributeError: 'float' object has no attribute 'log'
I tried using a for loop to take the log of each fluorescence value in column "2.4M":
ln2_4M = []
for x in df["2.4M"]:
    ln2_4M = np.log(x)
    print(ln2_4M)
Although it printed the log of each value in column "2.4M" correctly, I am unable to use the data because it also raised a TypeError:
ufunc 'log' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I'm not sure why. Any help in understanding what is happening and how to fix this problem is appreciated. Thanks.
I then tried using the method below and it worked:
df["2.4M"] = pd.to_numeric(df["2.4M"],errors = 'coerce')
df["ln_24M"] = np.log(df["2.4M"])
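Both error messages are consistent with the column not having a numeric dtype, for example because some values were read in as text. A small sketch of why the conversion helps, using made-up values (the data below is hypothetical, not the asker's):

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as text, plus one unparseable entry.
df = pd.DataFrame({"2.4M": ["120.5", "98.1", "n/a", "143.0"]})

# The column is not numeric yet, which is why np.log cannot handle it.
was_numeric = pd.api.types.is_numeric_dtype(df["2.4M"])

# Convert to floats; anything that cannot be parsed becomes NaN.
df["2.4M"] = pd.to_numeric(df["2.4M"], errors="coerce")

# With a float64 column, np.log works element-wise (NaN stays NaN).
df["ln_2.4M"] = np.log(df["2.4M"])
print(df)
```

Checking df.dtypes before and after the conversion shows whether a column needs this treatment.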

Python: how can I get the mode from a month column that I extracted from a datetime column?

I'm new at this! Doing my first Python project. :)
My tasks are:
convert df['Start Time'] from string to datetime
create a month column from df['Start Time']
get the mode of that month.
I used a few different ways to do all 3 of the steps, but trying to get the mode always returns TypeError: tuple indices must be integers or slices, not str. This happens even if I try converting the "tuple" into a list or NumPy array.
Ways I tried to extract month from Start Time:
df['extracted_month'] = pd.DatetimeIndex(df['Start Time']).month
df['extracted_month'] = np.asarray(df['extracted_month'])
df['extracted_month'] = df['Start Time'].dt.month
Ways I've tried to get the mode:
print(df['extracted_month'].mode())
print(df['extracted_month'].mode()[0])
print(stat.mode(df['extracted_month']))
Trying to get the index with df.columns.get_loc("extracted_month") then replacing it in the mode code gives me the SAME error (TypeError: tuple indices must be integers or slices, not str).
I think I should convert df['extracted_month'] into a different... something. What is it?
Note: My extracted_month column is a STRING, but you should still be able to get the mode from a string variable! I'm not changing it, that would be giving up.
Edit: using the following code still results in the same error
extracted_month = pd.Index(df['extracted_month'])
print(extracted_month.value_counts())
The error is likely caused by the way you are creating your dataframe.
If the dataframe is created in another function that returns other things along with it, and you assign the whole return value to the variable df, then df is a tuple that contains the actual dataframe, not the dataframe itself. Indexing a tuple with a string, as in df['extracted_month'], raises exactly this TypeError.
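A sketch of that situation, assuming a hypothetical loader named load_data that returns the dataframe together with a second value (the function, its return values, and the sample dates are all made up for illustration):

```python
import pandas as pd

def load_data():
    """Hypothetical loader that returns a DataFrame plus extra info."""
    frame = pd.DataFrame({"Start Time": ["2017-01-05 09:00:00",
                                         "2017-06-11 14:30:00",
                                         "2017-06-23 08:15:00"]})
    return frame, "chicago"

result = load_data()       # a tuple: (DataFrame, str)
# result["Start Time"] would raise:
# TypeError: tuple indices must be integers or slices, not str

df, city = load_data()     # unpack to get the actual DataFrame
df["Start Time"] = pd.to_datetime(df["Start Time"])
df["extracted_month"] = df["Start Time"].dt.month
most_common_month = df["extracted_month"].mode()[0]
print(most_common_month)
```

Printing type(df) right after the dataframe is created is a quick way to confirm whether it is a tuple.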

Convert All Items in a Dataframe to Float

I am trying to convert all items in my dataframe to float. The types vary at the moment. The following error persists: ValueError: could not convert string to float: '116,584.54'
The file can be found at https://www.imf.org/external/pubs/ft/weo/2019/01/weodata/WEOApr2019all.xls
I checked the value in Excel, and it is a Number. I tried .replace, .astype, and pd.to_numeric.
for i in weo['1980']:
    if i == float:
        print(i)
        i.replace(",",'')
        i.replace("--",np.nan)
    else:
        continue
Also, I have tried:
weo['1980'] = weo['1980'].apply(pd.to_numeric)
You can try using astype for the conversion, which is usually the recommended approach. As you already attempted in your question, you may first have to remove all the commas from the strings in column 1980, since they cause the error quoted in your question. Note that Series.replace only matches whole cell values by default, so use the .str accessor to replace inside each string:
weo['1980'] = weo['1980'].str.replace(',', '')
weo['1980'] = weo['1980'].astype(float)
If you're reading your DataFrame from Excel using pandas.read_excel, you can also specify the thousands argument to do this conversion for you, which will likely be faster:
pandas.read_excel(file, thousands=',')
I used to hit type errors all the time while working with dataframes. I now use this to convert every column that can be converted to float:
# Convert every column that can be parsed as numeric; leave the others unchanged.
# The errors were raised because the columns had dtype object.
# Note: errors='ignore' is deprecated as of pandas 2.2.
df = df.apply(pd.to_numeric, errors='ignore')
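A minimal sketch that combines the ideas above on one column, assuming made-up values in place of the real file: strip the thousands separators, then let pd.to_numeric turn anything unparseable (such as the "--" placeholder from the question) into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the '1980' column of the spreadsheet.
weo = pd.DataFrame({"1980": ["116,584.54", "--", "3.5", "1,000"]})

# Remove thousands separators inside each string (literal match, not regex).
cleaned = weo["1980"].astype(str).str.replace(",", "", regex=False)

# Parse to float; entries that are not numbers (like "--") become NaN.
weo["1980"] = pd.to_numeric(cleaned, errors="coerce")
print(weo)
print(weo.dtypes)
```

Using errors="coerce" hides bad values as NaN, so it is worth counting them afterwards with weo["1980"].isna().sum().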

Convert MATLAB datenum into Python datetime

I have a DF that looks like this (it is MATLAB data):
   datesAvail    date
0      737272  737272
1      737273  737273
2      737274  737274
3      737275  737275
4      737278  737278
5      737279  737279
6      737280  737280
7      737281  737281
Reading online, I wanted to convert the MATLAB datenum into a Python datetime using the following solution found here:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
where matlab_datenum is in my case equal to DF['date'] or DF['datesAvail'].
I get the error TypeError: cannot convert the series to <class 'int'>
Note that the data type is int:
Out[102]:
datesAvail int64
date int64
dtype: object
I am not sure where I am going wrong. Any help is much appreciated.
I am not sure what you are expecting as output from this, but I assume it is a list?
The error is telling you exactly what is wrong: you are trying to convert a whole series with int(). The only arguments int() can accept are a string, a bytes-like object, or a number.
When you call DF['date'] it gives you a series, so each element needs to be passed to int() on its own, which means you need a loop over the whole series. I would change it to a list first by doing DF['date'].tolist().
If you are looking to have the output as a list, you can use a list comprehension as shown here (sorry, this is long):
python_datetime_list = [datetime.fromordinal(int(i)) + timedelta(days=i%1) - timedelta(days = 366) for i in DF['date'].tolist()]
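The same conversion can be wrapped in a small helper to keep the comprehension readable. This is a sketch using only the standard library; the function name is made up here, and the sample values are the first few datenums from the question:

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(matlab_datenum):
    """Convert one MATLAB datenum (int or float) to a Python datetime."""
    value = float(matlab_datenum)   # also accepts NumPy scalars
    whole_days = int(value)
    day_fraction = value % 1
    # Same formula as in the question: shift the ordinal back by 366 days.
    return (datetime.fromordinal(whole_days)
            + timedelta(days=day_fraction)
            - timedelta(days=366))

datenums = [737272, 737273, 737278]
converted = [matlab_datenum_to_datetime(d) for d in datenums]
for d, dt in zip(datenums, converted):
    print(d, dt.date())
```

With the dataframe from the question, the input list would be DF['date'].tolist() instead of the hard-coded datenums.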

Python Math - Floating with financial output comes out incorrect

In a small portion of code for a retail auditing calculator, I'm attempting to allow the input of a retail value and multiply it by up to 2 entered quantities. The expected (intended) result is $X*Y=$Z.
I've attempted to modify the code a couple of ways and seem to be stuck on why this math isn't working correctly.
I've attempted a number of different configurations in the code and the most I've achieved is the following:
#Retail value of item, whole number (i.e. $49.99 entered as 4999)
rtlVAL = input("Retail Value: ")
#Quantity of Items - can be multiplied for full stack items, default if no entry is '1'
qt1 = float(input("Quantity 1: ")) #ex. 4
qt2 = float(input("Quantity 2: ") or "1") #ex " "
#Convert the Retail Value to financial format (i.e. 4999 to $49.99)
rtl = float("{:.2}".format (rtlVAL))
# Screen Output
qtyVAL = int(qt1)*int(qt2)
print("$" + str(qtyVAL*rtl))
The entered values are:
Retail Value: 4999
Quantity 1: 4
Quantity 2: (blank)
The expected calculation is 4999 * 4 * 1 (because no entry defaults to a value of 1), and the expected result is $199.96.
The result of this code is $196.0, so not only is the value wrong, it's also missing the two decimal places.
I'm not entirely certain why the math comes out differently from what I expect.
What am I missing here?
On line 9, I've tried the following:
rtl = float("{:.2f}".format (rtlVAL))
rtl = int("{:.2f}".format (rtlVAL))
The return was
ValueError: Unknown format code 'f' for object of type 'str'
If I change line 13 to:
print("$" + float(qtyVAL*rtl))
I get
TypeError: must be str, not float
Using either of the prior alterations in conjunction with the latter will return the ValueError.
This happens on Python 3.4 and 3.6.
I did search a few other SO questions regarding Python, math, floating point, and formatting, but they were asking about and presenting something far more advanced and entangled than this, so I wasn't able to glean an answer I could apply in this context. Others applied mainly to Python 2.7, where some of the code differs: raw_input() is simply input() in Python 3.x, and wrapping it as int(input()) converts the str value (as far as I understand it).
I did not see this as a duplicate, but if I missed something in that I do apologize - it isn't intentional.
No need to mess around with number formats:
rtl = float(rtlVAL)/100
Just divide the retail value by 100 to get the dollar value.
EDIT:
Incidentally, the reason it was coming up with 196 was that your format spec "{:.2}", applied to a string, keeps only the first two characters of rtlVAL (49 in your case), and that is what then got multiplied by the quantity.
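A short sketch of both points, assuming the inputs arrive as strings the way input() returns them. The helper name format_total is made up for illustration, and keeping the total in integer cents until the final formatting step is a design choice of this sketch, not something the original code did:

```python
def format_total(retail_cents_text, qty1_text, qty2_text=""):
    """Return the total as a dollar string with two decimal places."""
    retail_cents = int(retail_cents_text)      # "4999" -> 4999 cents
    qty1 = int(qty1_text)
    qty2 = int(qty2_text.strip() or "1")       # blank entry defaults to 1
    total_cents = retail_cents * qty1 * qty2   # integer math on cents
    return "${:.2f}".format(total_cents / 100)

# What the original format spec did: on a string, '.2' keeps 2 characters.
truncated = "{:.2}".format("4999")

print(truncated)
print(format_total("4999", "4", ""))
```

The ".2f" in the final format call is what keeps the two decimal places in the printed amount.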
