Convert MATLAB datenum into Python datetime

I have a DF that looks like this (it is matlab data):
   datesAvail    date
0      737272  737272
1      737273  737273
2      737274  737274
3      737275  737275
4      737278  737278
5      737279  737279
6      737280  737280
7      737281  737281
Reading online, I wanted to convert the MATLAB datenum into a Python datetime using a solution I found:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
where matlab_datenum is in my case equal to DF['date'] or DF['datesAvail']
I get an error: TypeError: cannot convert the series to <class 'int'>
Note that the data type of both columns is int64:
Out[102]:
datesAvail int64
date int64
dtype: object
I am not sure where I am going wrong. Any help is much appreciated.

I am not sure what you are expecting as an output from this, but I assume it is a list?
The error is telling you exactly what is wrong: you are trying to convert a whole series with int(). The only arguments int() can accept are strings, bytes-like objects, or numbers.
When you call DF['date'] it gives you a series, so each element needs to be converted individually. That means iterating over the whole series, for example with a for loop. I would change it to a list first with DF['date'].tolist().
If you want the output as a list, you can use a list comprehension like this (sorry, it is long):
python_datetime_list = [datetime.fromordinal(int(i)) + timedelta(days=i%1) - timedelta(days = 366) for i in DF['date'].tolist()]
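As a vectorized alternative to the loop, the 366-day offset in the formula implies that the MATLAB datenum of 1970-01-01 is that date's Python ordinal plus 366. Subtracting that constant leaves "days since the Unix epoch", which pd.to_datetime can read directly. This is only a sketch: the sample values are copied from the question, and the column name py_date and the constant name MATLAB_EPOCH_OFFSET are made up for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# Sample values copied from the question's DataFrame.
DF = pd.DataFrame({"date": [737272, 737273, 737274]})

# MATLAB datenum of 1970-01-01, derived from the same 366-day offset
# used in the element-wise formula (hypothetical constant name).
MATLAB_EPOCH_OFFSET = datetime(1970, 1, 1).toordinal() + 366

# Subtract the offset, then interpret the result as days since the epoch.
DF["py_date"] = pd.to_datetime(DF["date"] - MATLAB_EPOCH_OFFSET, unit="D")

# Cross-check against the list comprehension from the answer.
looped = [
    datetime.fromordinal(int(i)) + timedelta(days=i % 1) - timedelta(days=366)
    for i in DF["date"].tolist()
]

print(DF)
```

The result stays a datetime64 column, so .dt accessors keep working. If the datenums are floats with a fractional day, the same subtraction should still apply, though that case is not exercised here.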

Related

Getting an error when calculating Z score

I am trying to find the outliers in my dataset and remove them. So I did the following:
z_scores = stats.zscore(dataset_sex)
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_df = dataset_sex[filtered_entries]
new_df.head()
but I got this error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
The error seems to come from the first line of code (z_scores = stats.zscore(dataset_sex)). I don't understand why. How can I fix this?
This comes from some of the data in your columns being strings (in Python terms, 'str').
To work out the z-score, it has to subtract the mean from each value and divide by the standard deviation. One of the columns probably holds strings like 'M' or 'F' for sex, or numbers stored as strings like '1,232.23' that were never converted to floats, and z-scoring does not work on those.
My first suggestion is to check that they are all numbers.
df.dtypes
will show you the type of each column; then convert any non-numeric columns to numeric.
Post a little of the data (a couple of rows) and we can help you.
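Without seeing the data, here is a rough sketch of the idea: score only the numeric columns and use the result to filter the original frame. The column names and values are invented for illustration. The z-score is computed by hand with ddof=0, which is the default stats.zscore uses, so the sketch does not need SciPy.

```python
import pandas as pd

# Invented data: one string column and two numeric columns.
df = pd.DataFrame({
    "sex": ["M", "F"] * 10,
    "age": list(range(20, 40)),
    "income": [100.0] * 19 + [10000.0],  # last row is an obvious outlier
})

# Inspect the column types first; 'sex' shows up as object.
print(df.dtypes)

# Keep only numeric columns for the z-score calculation.
numeric = df.select_dtypes(include="number")

# Population z-score (ddof=0): subtract the mean, divide by the std.
z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)

# Keep rows where every numeric column is within 3 standard deviations.
keep = (z_scores.abs() < 3).all(axis=1)
filtered = df[keep]

print(len(df), "->", len(filtered))
```

Filtering the original df with the boolean mask keeps the string column in the output, even though it was left out of the calculation.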

Python: how can I get the mode from a month column that I extracted from a datetime column?

I'm new at this! Doing my first Python project. :)
My tasks are:
convert df['Start Time'] from string to datetime
create a month column from df['Start Time']
get the mode of that month.
I used a few different ways to do all 3 of the steps, but trying to get the mode always returns TypeError: tuple indices must be integers or slices, not str. This happens even if I try converting the "tuple" into a list or NumPy array.
Ways I tried to extract month from Start Time:
df['extracted_month'] = pd.DatetimeIndex(df['Start Time']).month
df['extracted_month'] = np.asarray(df['extracted_month'])
df['extracted_month'] = df['Start Time'].dt.month
Ways I've tried to get the mode:
print(df['extracted_month'].mode())
print(df['extracted_month'].mode()[0])
print(stat.mode(df['extracted_month']))
Trying to get the index with df.columns.get_loc("extracted_month") then replacing it in the mode code gives me the SAME error (TypeError: tuple indices must be integers or slices, not str).
I think I should convert df['extracted_month'] into a different... something. What is it?
Note: My extracted_month column is a STRING, but you should still be able to get the mode from a string variable! I'm not changing it, that would be giving up.
Edit: using the following code still results in the same error
extracted_month = pd.Index(df['extracted_month'])
print(extracted_month.value_counts())
The error is likely caused by the way you are creating your dataframe.
If the dataframe is created in another function that returns other things along with it, and you assign the whole return value to the variable df, then df is a tuple that contains the actual dataframe, not the dataframe itself.
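A minimal sketch of that situation, assuming a hypothetical load_data function that returns a dataframe together with a second value (the function name, the extra value, and the sample timestamps are all invented):

```python
import pandas as pd


def load_data():
    """Hypothetical loader that returns two things, not just the dataframe."""
    frame = pd.DataFrame({
        "Start Time": [
            "2017-01-05 09:00:00",
            "2017-06-10 12:30:00",
            "2017-06-21 08:15:00",
        ]
    })
    return frame, "some other value"


# Assigning the whole return value gives a tuple, not a dataframe.
result = load_data()
try:
    result["Start Time"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Unpacking the tuple gives back the real dataframe.
df, extra = load_data()
df["Start Time"] = pd.to_datetime(df["Start Time"])
df["extracted_month"] = df["Start Time"].dt.month

# mode() returns a Series (there can be ties), so take the first entry.
most_common_month = df["extracted_month"].mode()[0]
print(most_common_month)
```

Checking type(df) right after the assignment is a quick way to confirm whether this is what is happening.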

Converting duration (hh:mm:ss) to integer returns "invalid literal for int() with base 10" error

I'm writing a python script to send some data to BigQuery. I have a duration column in my dataframe with the following format hh:mm:ss
As far as I can see, the column's type is non-null object, and I need to convert it to integer to send it to BigQuery.
I wrote the following line:
as_run_df['duration'] = as_run_df['duration'].astype(int)
which returns me the following error:
invalid literal for int() with base 10: '00:58:29'
What should I do?
You can convert the values to timedeltas with to_timedelta and then get the seconds with Series.dt.total_seconds:
as_run_df = pd.DataFrame({'duration':['00:58:29','00:58:30']})
# column is filled with strings
as_run_df['duration'] = pd.to_timedelta(as_run_df['duration']).dt.total_seconds().astype(int)
print (as_run_df)
duration
0 3509
1 3510
# column is filled with time objects: cast to str first
as_run_df['duration'] = (pd.to_timedelta(as_run_df['duration'].astype(str))
                         .dt.total_seconds().astype(int))
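For the time-object case, a small self-contained sketch may help. The sample values are invented, and it assumes the column holds datetime.time objects, which print as hh:mm:ss once cast to str:

```python
import datetime

import pandas as pd

# Invented sample: the column holds datetime.time objects, not strings.
as_run_df = pd.DataFrame({
    "duration": [datetime.time(0, 58, 29), datetime.time(1, 0, 0)]
})

# Cast to str ('00:58:29'), parse as timedelta, then take whole seconds.
seconds = (
    pd.to_timedelta(as_run_df["duration"].astype(str))
    .dt.total_seconds()
    .astype(int)
)

print(seconds.tolist())
```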

Convert All Items in a Dataframe to Float

I am trying to convert all items in my dataframe to floats. The types vary at the moment. The following error persists: ValueError: could not convert string to float: '116,584.54'
The file can be found at https://www.imf.org/external/pubs/ft/weo/2019/01/weodata/WEOApr2019all.xls
I checked the value in Excel, and it is a Number. I tried .replace, .astype, and pd.to_numeric.
for i in weo['1980']:
    if i == float:
        print(i)
        i.replace(",",'')
        i.replace("--",np.nan)
    else:
        continue
Also, I have tried:
weo['1980'] = weo['1980'].apply(pd.to_numeric)
You can use DataFrame.astype for the conversion, which is usually the recommended approach. As you already attempted in your question, you first have to remove the commas from the strings in column 1980, otherwise you get the same error you quoted. Note that Series.replace matches whole values, so use the .str accessor to replace substrings:
weo['1980'] = weo['1980'].str.replace(',', '', regex=False)
weo['1980'] = weo['1980'].astype(float)
If you're reading your DataFrame from Excel using pandas.read_excel, you can also specify the thousands argument to do this conversion for you, which will likely perform better:
pandas.read_excel(file, thousands=',')
I ran into type errors all the time while playing with dataframes. I now always use this to convert every value that can be converted into a float.
# Convert all columns that can be converted to float.
# Errors were raised because their type was object.
df = df.apply(pd.to_numeric, errors='ignore')
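Putting the two answers together, here is a hedged sketch for a single column that mixes thousands separators with the "--" placeholder mentioned in the question. The sample values are invented. It uses errors='coerce' rather than errors='ignore', partly because recent pandas releases deprecate 'ignore', and partly because 'coerce' turns unparseable entries into NaN instead of leaving the whole column untouched.

```python
import numpy as np
import pandas as pd

# Invented sample mimicking the question: commas, a '--' placeholder,
# a plain numeric string, and a real number.
weo = pd.DataFrame({"1980": ["116,584.54", "--", "12.5", 3]})

# Make everything a string, then strip the thousands separators.
cleaned = weo["1980"].astype(str).str.replace(",", "", regex=False)

# Anything that still is not a number (here '--') becomes NaN.
weo["1980"] = pd.to_numeric(cleaned, errors="coerce")

print(weo)
print(weo.dtypes)
```

To apply the same cleaning to every year column, the two steps could be wrapped in a small function and passed to DataFrame.apply; whether that is appropriate depends on which columns should remain text.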

How to avoid scientific notation in the output of my code?

I've created a function to check the range of values in a pandas dataframe. But the output is producing all values with scientific notation.
When I select_dtypes to include only int, I don't get this problem. It only happens when I include float. How can I get non-scientific values?
# Function to check range of value in a col.
def value_range(col):
    max = data[col].max()
    min = data[col].min()
    return max-min
# value_range('total_revenue')
numerical_data = data.select_dtypes(include=[float, int]).columns
print(value_range(numerical_data))
Out:
Unnamed: 0 3.081290e+05
number_of_sessions 1.340000e+02
total_bounce 1.080000e+02
total_hits 3.706000e+03
days_difference_from_last_first_visit 1.800000e+02
transactions 2.500000e+01
total_revenue 2.312950e+10
organic_search 1.110000e+02
direct 8.300000e+01
referral 1.070000e+02
social 1.120000e+02
paid_search 4.400000e+01
affiliates 3.700000e+01
display 8.500000e+01
target_if_purchase 1.000000e+00
target_total_revenue 2.039400e+09
dtype: float64
value_range(numerical_data).apply(lambda x: format(x, 'f')) solves the problem. Thanks Sparrow1029.
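A small sketch of that fix on invented data, with the range computed inline instead of through value_range. Note that format(x, 'f') returns strings, so the result is no longer numeric. If the numbers are needed later, changing only the display with pd.option_context is an alternative worth considering.

```python
import pandas as pd

# Invented data with one small-range and one very large-range column.
data = pd.DataFrame({
    "total_hits": [1.0, 3707.0],
    "total_revenue": [0.0, 2.31295e10],
})

# Range per column, as in value_range: max minus min.
ranges = data.max() - data.min()

# Option 1: format each value as fixed-point text (result holds strings).
as_text = ranges.apply(lambda x: format(x, "f"))
print(as_text)

# Option 2: keep the floats and only change how they are displayed.
with pd.option_context("display.float_format", "{:.2f}".format):
    print(ranges)
```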
