How do I convert numpy array to days, hours, mins? - python-3.x

Running with this series
X = number_of_logons_all.values
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
I get:
mean1=60785.792548, mean2=61291.266868
variance1=7483553053.651829, variance2=7603208729.348722
But I wanted something like this in my PyCharm console (pulled from another result):
>>> -103 days +04:37:13.802435724...
I tried placing the np.array in a pd.DataFrame() to get the expected value by adding
.apply(pd.to_timedelta, unit='s')
...this didn't work, so I tried
new = pd.DataFrame([mean1]).to_numpy(dtype='timedelta64[ns]')
...and (still) got something like this:
>>>> [[63394]]
Can anyone help me convert the means calculated above into an easily readable days/hours/minutes result?
Thanks in advance for your kind support.

You can wrap the values in pd.Timedelta inside f-strings:
import pandas as pd
mean1, mean2 = 60785.792548, 61291.266868
variance1, variance2 = 7483553053.651829, 7603208729.348722
print(f'mean1={pd.Timedelta(mean1, unit="s")}, mean2={pd.Timedelta(mean2, unit="s")}')
print(f'variance1={pd.Timedelta(variance1, unit="s")}, variance2={pd.Timedelta(variance2, unit="s")}')
This prints:
mean1=0 days 16:53:05.792548, mean2=0 days 17:01:31.266868
variance1=86615 days 04:44:13.651828766, variance2=88000 days 02:25:29.348722458
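If pandas is not needed, the same breakdown can be sketched with the standard library only. This is a minimal sketch that assumes the values are non-negative durations in seconds, which the question does not state. The helper name format_duration is hypothetical. Note that a variance of seconds is in squared seconds, so the sketch takes the square root (the standard deviation) before formatting it as a duration.

```python
import math
from datetime import timedelta

def format_duration(seconds):
    """Format non-negative seconds as 'D days HH:MM:SS' (fraction truncated)."""
    total = int(seconds)
    days, rem = divmod(total, 86400)      # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days {hours:02d}:{minutes:02d}:{secs:02d}"

# Values copied from the question; assumed to be seconds.
mean1 = 60785.792548
var1 = 7483553053.651829
std1 = math.sqrt(var1)  # standard deviation, back in plain seconds

print("mean1 =", format_duration(mean1))
print("std1  =", format_duration(std1))
# datetime.timedelta keeps the fractional part as microseconds.
print("mean1 as timedelta:", timedelta(seconds=mean1))
```

The divmod chain is only there to control the output format; timedelta alone is enough when its default string form is acceptable.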

Related

Why is time conversion between epoch seconds and string changing time by 1 calendar year?

I am using the time module of Python 3 to convert time between seconds and a formatted string. To generate the string I use localtime and strftime. To generate the time in seconds I use string slicing followed by mktime. When I call these repeatedly on each result, only the year changes: the seconds are always incremented by a full year.
Code used is as below:
import sys
import time
def time_string(t):
    #t is second obtained by time.mktime((yr, mn, dy, hr, mn, sec, 0, 0, 0))
    time_struct = time.localtime(t)
    time_string = time.strftime("%Y-%m-%d %H:%M:%S", time_struct)
    return time_string
def string_time(t_string):
    #t_string has format '2020-01-31 08:23:35'
    yr = int(t_string[:4])
    mn = int(t_string[5:7])
    dy = int(t_string[8:10])
    hr = int(t_string[11:13])
    mn = int(t_string[14:16])
    se = int(t_string[17:])
    t=int(time.mktime((yr, mn, dy, hr, mn, se, 0, 0, 0)))
    return t
t = int(time.mktime((2020, 3, 19, 18, 15, 20, 0, 0, 0)))
print (t)
for x in range(5):
    t_st = time_string(t)
    print(t_st)
    t = string_time(t_st)
    print(t)
sys.exit("stopping..")
The output of the code above is:
1584621920
2020-03-19 18:15:20
1616157920
2021-03-19 18:15:20
1647693920
2022-03-19 18:15:20
1679229920
2023-03-19 18:15:20
1710852320
2024-03-19 18:15:20
1742388320
SystemExit: stopping..
What am I doing wrong? Why does this happen?
What is a better way of converting time-string to seconds?
I am not sure what you are actually trying to do. However, if you have a time string and want the corresponding seconds, try using datetime.timestamp() instead of slicing the string by hand.
Your code advances the year by one because string_time(t_string) assigns the variable mn twice: first with mn = int(t_string[5:7]) (the month) and then with mn = int(t_string[14:16]) (the minute). The second assignment overwrites the first, so mktime receives a month of 15. It normalizes that to month 3 of the following year, which is why every round trip adds exactly one year.
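The normalization can be reproduced in isolation. The following is a small sketch that passes month 15 directly to mktime. It assumes the platform's C library normalizes out-of-range fields (the behaviour seen in the question's output), and it uses isdst=-1 rather than the 0 from the question so that the library decides about daylight saving time.

```python
import time

# Month 15 is out of range. mktime is assumed to normalize it
# (as the underlying C function does) instead of raising an error.
seconds = time.mktime((2020, 15, 19, 18, 15, 20, 0, 0, -1))

# Convert back to a broken-down local time to see the normalized fields.
normalized = time.localtime(seconds)
print(time.strftime("%Y-%m-%d %H:%M:%S", normalized))
```

On a platform that behaves like the one in the question, the printed date falls in March 2021, one year after the intended March 2020.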
I found time.strptime, which converts the string back using the matching format specifiers. The following code eliminates the need for string slicing:
def string_time(t_string):
    #t_string has format '2020-01-31 08:23:35'
    t_struct = time.strptime(t_string,"%Y-%m-%d %H:%M:%S")
    t = int(time.mktime(t_struct))
    return t
Korbinian had already found the error in my code. Is there a reason why I should use the datetime module instead of the time module?
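For comparison, the datetime-based approach suggested in the first answer could look roughly like this. It is a sketch, not the answerer's code: the function names string_to_seconds and seconds_to_string are hypothetical, and naive datetimes are interpreted as local time, as with mktime.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def string_to_seconds(t_string):
    """Parse a local-time string and return epoch seconds as an int."""
    return int(datetime.strptime(t_string, FMT).timestamp())

def seconds_to_string(t):
    """Format epoch seconds as a local-time string."""
    return datetime.fromtimestamp(t).strftime(FMT)

t = string_to_seconds("2020-03-19 18:15:20")
for _ in range(5):
    # Each round trip should give back the same value.
    t_st = seconds_to_string(t)
    t = string_to_seconds(t_st)
    print(t_st, t)
```

Because one format string drives both directions, there are no hand-written slice indices or variable names to mix up.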

How to use custom mean, median, mode functions with array of 2500 in python?

I am trying to solve the mean, median and mode challenge on HackerRank. I defined three functions to calculate the mean, median and mode of a given array with a length between 10 and 2500, inclusive.
I get an error with an array of 2500 integers and I am not sure why. I looked into the Python documentation and found no mention of a maximum length for lists. I know I could use the statistics module, but I am trying the hard way and being stubborn, I guess. Any help and criticism regarding my code is appreciated. Please be honest and brutal if need be. Thanks.
N = int(input())
var_list = [int(x) for x in input().split()]
def mean(sample_list):
    mean = sum(sample_list)/N
    print(mean)
    return
def median(sample_list):
    sorted_list = sorted(sample_list)
    if N%2 != 0:
        median = sorted_list[(N//2)]
    else:
        median = (sorted_list[N//2] + sorted_list[(N//2)-1])/2
    print(median)
    return
def mode(sample_list):
    sorted_list = sorted(sample_list)
    mode = min(sorted_list)
    max_count = sorted_list.count(mode)
    for i in sorted_list:
        if (i <= mode) and (sorted_list.count(i) >= max_count):
            mode = i
    print(mode)
    return
mean(var_list)
median(var_list)
mode(var_list)
Compiler Message
Wrong Answer
Input (stdin)
2500
19325 74348 68955 98497 26622 32516 97390 64601 64410 10205 5173 25044 23966 60492 71098 13852 27371 40577 74997 42548 95799 26783 51505 25284 49987 99134 33865 25198 24497 19837 53534 44961 93979 76075 57999 93564 71865 90141 5736 54600 58914 72031 78758 30015 21729 57992 35083 33079 6932 96145 73623 55226 18447 15526 41033 46267 52486 64081 3705 51675 97470 64777 31060 90341 55108 77695 16588 64492 21642 56200 48312 5279 15252 20428 57224 38086 19494 57178 49084 37239 32317 68884 98127 79085 77820 2664 37698 84039 63449 63987 20771 3946 862 1311 77463 19216 57974 73012 78016 9412 90919 40744 24322 68755 59072 57407 4026 15452 82125 91125 99024 49150 90465 62477 30556 39943 44421 68568 31056 66870 63203 43521 78523 58464 38319 30682 77207 86684 44876 81896 58623 24624 14808 73395 92533 4398 8767 72743 1999 6507 49353 81676 71188 78019 88429 68320 59395 95307 95770 32034 57015 26439 2878 40394 33748 41552 64939 49762 71841 40393 38293 48853 81628 52111 49934 74061 98537 83075 83920 42792 96943 3357 83393{-truncated-}
Expected Output
49921.5
49253.5
2184
Your issue seems to be that you are using standard list operations rather than calculating things on the fly while looping through the data once (for the average). sum(sample_list) may give you a number that exceeds the limit of a double; in other words, it becomes really big.
Further reading
Calculating the mean, variance, skewness, and kurtosis on the fly
How do I determine the standard deviation (stddev) of a set of values?
Rolling variance algorithm
What is a good solution for calculating an average where the sum of all values exceeds a double's limits?
How to efficiently compute average on the fly (moving average)?
You forgot to update the max_count variable inside the if block, which is probably what causes the wrong result. I tested the debugged version on my computer, and it seems to work well when I compare its results with SciPy's built-in functions. The corrected mode function is:
def mode(sample_list):
    sorted_list = sorted(sample_list)
    mode = min(sorted_list)
    max_count = sorted_list.count(mode)
    for i in sorted_list:
        # strict > keeps the smallest value when several share the highest count
        if sorted_list.count(i) > max_count:
            mode = i
            max_count = sorted_list.count(i)
    print(mode)
I was busy with some stuff and now came back to completing this. I am happy to say that I have matured enough as a coder and solved this issue.
Here is the solution:
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Input an array of numbers, convert it to integer array
n = int(input())
my_array = list(map(int, input().split()))
my_array.sort()
# Find mean
array_mean = sum(my_array) / n
print(array_mean)
# Find median
if (n%2) != 0:
    array_median = my_array[n//2]
else:
    array_median = (my_array[n//2 - 1] + my_array[n//2]) / 2
print(array_median)
# Find mode(I could do this using multimode method of statistics module for python 3.8)
def sort_second(array):
    return array[1]
modes = [[i, my_array.count(i)] for i in my_array]
modes.sort(key = sort_second, reverse=True)
array_mode = modes[0][0]
print(array_mode)
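Calling count once per element makes the mode step quadratic in the array length. A possible alternative, sketched here with collections.Counter, counts every value in a single pass. The function name summarize and the sample values are hypothetical, and the tie rule (smallest value wins) is taken from the expected output in the question.

```python
from collections import Counter

def summarize(values):
    """Return (mean, median, mode); ties for the mode go to the smallest value."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    counts = Counter(ordered)          # value -> number of occurrences
    top = max(counts.values())         # highest occurrence count
    mode = min(v for v, c in counts.items() if c == top)
    return mean, median, mode

# Illustrative input: 2 and 4 both occur twice, so the mode is 2.
mean, median, mode = summarize([4, 1, 2, 2, 3, 4])
print(mean, median, mode)
```

The functions take the length from the list itself instead of a global N, so they stay correct if they are called on a different list.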

Why doesn't this code print a single element?

We have the eng_stress and eng_strain arrays, taken from an Excel file:
eng_stress = np.array(eng_stress)
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])
When I try to access a certain index, I get something like the following instead of a single value:
[466.12834181 466.2044319 466.27916323 466.35480041 466.43043758
466.50562183 466.58125901 466.65689618 466.73208043 466.80771761
466.8838077 466.95853903 467.03508204 467.10981338 467.18545055
467.26108772 467.33627198 467.41145623 467.48709341 467.56273058
467.63882067 467.71355201 467.78918918 467.86482635 467.94001061
468.0161007 468.09083203 468.16692212 468.2425593 468.31774355
468.39292781 468.4690179 468.54374923 468.61983932 468.69502358
468.77020783 468.84629792 468.92148218 468.99666643 469.07275652
469.14794078 469.22357795 469.29966804 469.37439938 469.45048947
469.52522081 469.6013109 469.67649515 469.75167941 469.82731658
469.90295375 469.97813801 470.05377518 470.12895943 470.20459661
470.2806867 470.35541803 470.43196104 470.50669238 470.58232955
470.65796672 470.7336039 470.80833523 470.88442532 470.95960958
471.03524675 471.11088392 471.18606818 471.26170535 471.33688961
471.41252678 471.48771103 471.56380112 471.63898538 471.71507547
471.78980681 471.8658969 471.94108115 472.01626541 472.09190258
472.16753975 472.24272401 472.3188141 472.39354543 472.46918261
472.5452727 472.62000403 472.69654704 472.77127838 472.84736847
472.92209981 472.99773698 473.07337415 473.14901132 473.22419558
473.30028567 473.37501701 473.45065418 473.52629135 473.60102269
473.6775657 473.75229703 473.82884004 473.9040243 473.97920855
474.05439281 474.1304829 474.20521423 474.28130432 474.35648858
474.43167283 474.50776292 474.58294718 474.65813143 474.73376861
474.80940578 474.88459003 474.96113304 475.03586438 475.11195447
475.18668581 475.2627759 475.33796015 475.41314441 475.48878158
475.56441875 475.63960301 475.7156931 475.79087735 475.86606161
475.9421517 476.01688303 476.09342604 476.16815738 476.24379455
476.31943172 476.3950689 476.46980023 476.54589032 476.62107458
476.69716467 476.77234892 476.84753318 476.92317035 476.99835461
477.0744447 477.14962895 477.22526612 477.30045038 477.37654047
477.45127181 477.5273619 477.60209323 477.67773041 477.75336758
477.82855183 477.90464192 477.9802791 478.05501043 478.13064761
478.20628478 478.28146903 478.35801204 478.43274338 478.50883347
478.58356481 478.65920198 478.73483915 478.81002341 478.88566058
478.96175067 479.03648201 479.1125721 479.18775635 479.26294061
479.3390307 479.41376203 479.48985212 479.5650363......... 532]
Maybe eng_stress is a 2D array?
Try:
print(eng_stress.shape)
to find out the shape of the arrays you are working with :)
If your array has the shape (X, 1), it is a column rather than a flat 1-D array, and a quick fix is to change your code to:
eng_stress = np.array(eng_stress).T[0]
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])
Your numpy arrays may be 2-dimensional. That is why indexing prints an array rather than a single value. To access a single value in column x, try print(true_stress[10][x]).
Alternatively, make sure both arrays are 1-D before multiplying them. The element-wise product of two 1-D arrays is itself 1-D, so indexing it returns a single value.
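One way this situation can arise is broadcasting. The sketch below assumes that eng_stress has shape (N, 1) and eng_strain has shape (N,), which the question does not confirm. The values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the Excel data.
eng_stress = np.arange(5, dtype=float).reshape(-1, 1)  # shape (5, 1), a column
eng_strain = np.linspace(1.0, 1.4, 5)                  # shape (5,), flat

# (5, 1) times (5,) broadcasts to (5, 5): every stress with every strain.
product_2d = np.multiply(eng_stress, eng_strain)
print(product_2d.shape)      # indexing product_2d[3] yields a whole row

# Flattening the column first keeps the product 1-D.
product_1d = np.multiply(eng_stress.ravel(), eng_strain)
print(product_1d.shape)
print(product_1d[3])         # a single number
```

Printing .shape for each input before multiplying is the quickest check for whether this is what happened.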

pd.to_datetime to parse '2010/1/1' rather than '2010/01/01'

I have a dataframe which contains a column 'trade_dt' like this:
2009/12/1
2009/12/2
2009/12/3
2009/12/4
I get this error:
benchmark['trade_dt'] = pd.to_datetime(benchmark['trade_dt'], format='%Y-&m-%d')
ValueError: time data '2009/12/1' does not match format '%Y-&m-%d' (match)
How can I solve it? Thanks!
The format needs to match the data: replace & with % and - with /:
benchmark['trade_dt'] = pd.to_datetime(benchmark['trade_dt'], format='%Y/%m/%d')
With the sample data it also works without the format argument (but I am not sure about the real data):
benchmark['trade_dt'] = pd.to_datetime(benchmark['trade_dt'])
print (benchmark)
    trade_dt
0 2009-12-01
1 2009-12-02
2 2009-12-03
3 2009-12-04
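As a quick check that %m and %d accept values without leading zeros, here is a small self-contained sketch. The frame is rebuilt from a few sample strings, and the '2010/1/1' row is taken from the question title rather than from the listed data.

```python
import pandas as pd

# Sample strings without zero padding (assumed representative of the column).
benchmark = pd.DataFrame({"trade_dt": ["2009/12/1", "2009/12/2", "2010/1/1"]})

# An explicit format that matches the '/' separators.
benchmark["trade_dt"] = pd.to_datetime(benchmark["trade_dt"], format="%Y/%m/%d")

# Show the parsed dates in zero-padded form.
print(benchmark["trade_dt"].dt.strftime("%Y/%m/%d").tolist())
```

Passing the format explicitly avoids relying on format inference, which may matter if the real data contains ambiguous day and month orderings.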

python - cannot make corr work

I'm struggling with getting a simple correlation done. I've tried all that was suggested under similar questions.
Here are the relevant parts of the code, the various attempts I've made and their results.
import numpy as np
import pandas as pd
try01 = data[['ESA Index_close_px', 'CCMP Index_close_px' ]].corr(method='pearson')
print (try01)
Out:
Empty DataFrame
Columns: []
Index: []
try04 = data['ESA Index_close_px'][5:50].corr(data['CCMP Index_close_px'][5:50])
print (try04)
Out:
AttributeError: 'float' object has no attribute 'sqrt'
using numpy
try05 = np.corrcoef(data['ESA Index_close_px'],data['CCMP Index_close_px'])
print (try05)
Out:
AttributeError: 'float' object has no attribute 'sqrt'
converting the columns to lists
ESA_Index_close_px_list = list()
start_value = 1
end_value = len (data['ESA Index_close_px']) +1
for items in data['ESA Index_close_px']:
    ESA_Index_close_px_list.append(items)
    start_value = start_value+1
    if start_value == end_value:
        break
    else:
        continue
CCMP_Index_close_px_list = list()
start_value = 1
end_value = len (data['CCMP Index_close_px']) +1
for items in data['CCMP Index_close_px']:
    CCMP_Index_close_px_list.append(items)
    start_value = start_value+1
    if start_value == end_value:
        break
    else:
        continue
try06 = np.corrcoef(['ESA_Index_close_px_list','CCMP_Index_close_px_list'])
print (try06)
Out:
TypeError: cannot perform reduce with flexible type
I also tried .astype, but it made no difference.
data['ESA Index_close_px'].astype(float)
data['CCMP Index_close_px'].astype(float)
Using Python 3.5, pandas 0.18.1 and numpy 1.11.1
Would really appreciate any suggestion.
Edit 1:
The data comes from an Excel spreadsheet:
data = pd.read_excel('C:\\Users\\Ako\\Desktop\\ako_files\\for_corr_tool.xlsx')
Prior to the correlation attempts, there are only column renames and
data = data.drop(data.index[0])
to get rid of a line.
Regarding the types:
print (type (data['ESA Index_close_px']))
print (type (data['ESA Index_close_px'][1]))
Out:
Edit 2:
Part of the data:
print (data['ESA Index_close_px'][1:10])
print (data['CCMP Index_close_px'][1:10])
Out:
2 2137
3 2138
4 2132
5 2123
6 2127
7 2126.25
8 2131.5
9 2134.5
10 2159
Name: ESA Index_close_px, dtype: object
2 5241.83
3 5246.41
4 5243.84
5 5199.82
6 5214.16
7 5213.33
8 5239.02
9 5246.79
10 5328.67
Name: CCMP Index_close_px, dtype: object
I encountered the same problem today.
Try using .astype('float64') to convert the columns to a numeric type:
data['ESA Index_close_px'][5:50].astype('float64').corr(data['CCMP Index_close_px'][5:50].astype('float64'))
This works well for me. Hope it can help you as well.
You can try the following:
Top15['Citable docs per capita']=(Top15['Citable docs per capita']*100000)
Top15['Citable docs per capita'].astype('int').corr(Top15['Energy Supply per Capita'].astype('int'))
It worked for me.
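Both answers convert the object-dtype columns to a numeric dtype before correlating. A related sketch uses pd.to_numeric and assigns the result, since conversions such as .astype return a new object and leave the original column unchanged. The column names come from the question, while the values are made up so that the expected correlation is obvious.

```python
import pandas as pd

cols = ["ESA Index_close_px", "CCMP Index_close_px"]

# Illustrative object-dtype frame, similar to what read_excel produced
# in the question (the values here are invented).
data = pd.DataFrame(
    {cols[0]: [1, 2, 3, 4], cols[1]: [10.0, 20.0, 30.0, 40.0]},
    dtype=object,
)

# Convert each column; anything unparseable becomes NaN (errors="coerce").
numeric = data[cols].apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)

r = numeric[cols[0]].corr(numeric[cols[1]])
print(r)
```

With errors="coerce", stray text cells turn into NaN, which corr then leaves out of the calculation; counting the NaN values afterwards shows how many rows were affected.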
