Why doesn't this code print a single element? - python-3.x

We have the eng_stress and eng_strain arrays, taken from an Excel file:
eng_stress = np.array(eng_stress)
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])
When I try to access a certain index, something like the following is printed instead of a single value:
[466.12834181 466.2044319 466.27916323 466.35480041 466.43043758
466.50562183 466.58125901 466.65689618 466.73208043 466.80771761
466.8838077 466.95853903 467.03508204 467.10981338 467.18545055
467.26108772 467.33627198 467.41145623 467.48709341 467.56273058
467.63882067 467.71355201 467.78918918 467.86482635 467.94001061
468.0161007 468.09083203 468.16692212 468.2425593 468.31774355
468.39292781 468.4690179 468.54374923 468.61983932 468.69502358
468.77020783 468.84629792 468.92148218 468.99666643 469.07275652
469.14794078 469.22357795 469.29966804 469.37439938 469.45048947
469.52522081 469.6013109 469.67649515 469.75167941 469.82731658
469.90295375 469.97813801 470.05377518 470.12895943 470.20459661
470.2806867 470.35541803 470.43196104 470.50669238 470.58232955
470.65796672 470.7336039 470.80833523 470.88442532 470.95960958
471.03524675 471.11088392 471.18606818 471.26170535 471.33688961
471.41252678 471.48771103 471.56380112 471.63898538 471.71507547
471.78980681 471.8658969 471.94108115 472.01626541 472.09190258
472.16753975 472.24272401 472.3188141 472.39354543 472.46918261
472.5452727 472.62000403 472.69654704 472.77127838 472.84736847
472.92209981 472.99773698 473.07337415 473.14901132 473.22419558
473.30028567 473.37501701 473.45065418 473.52629135 473.60102269
473.6775657 473.75229703 473.82884004 473.9040243 473.97920855
474.05439281 474.1304829 474.20521423 474.28130432 474.35648858
474.43167283 474.50776292 474.58294718 474.65813143 474.73376861
474.80940578 474.88459003 474.96113304 475.03586438 475.11195447
475.18668581 475.2627759 475.33796015 475.41314441 475.48878158
475.56441875 475.63960301 475.7156931 475.79087735 475.86606161
475.9421517 476.01688303 476.09342604 476.16815738 476.24379455
476.31943172 476.3950689 476.46980023 476.54589032 476.62107458
476.69716467 476.77234892 476.84753318 476.92317035 476.99835461
477.0744447 477.14962895 477.22526612 477.30045038 477.37654047
477.45127181 477.5273619 477.60209323 477.67773041 477.75336758
477.82855183 477.90464192 477.9802791 478.05501043 478.13064761
478.20628478 478.28146903 478.35801204 478.43274338 478.50883347
478.58356481 478.65920198 478.73483915 478.81002341 478.88566058
478.96175067 479.03648201 479.1125721 479.18775635 479.26294061
479.3390307 479.41376203 479.48985212 479.5650363......... 532]

Maybe eng_stress is a 2D array?
Try:
print(eng_stress.shape)
to find out the shape of the arrays you are working with :)
If your array has the shape (X, 1), it is a column vector, and multiplying it with a 1D array of length X broadcasts to an (X, X) result. A quick fix is to change your code to:
eng_stress = np.array(eng_stress).T[0]
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])

Your numpy arrays may be 2-dimensional. That's why it's printing an array rather than a value. To access a single value of column x, try print(true_stress[10][x]).
The other thing you can do is make sure both operands are 1D numpy arrays before multiplying them. In that case the product is also 1D, and true_stress[10] is a single value.
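A minimal sketch of what is probably happening, assuming eng_stress was read in as a column of shape (N, 1) while eng_strain is 1D. The numbers are made up for illustration and are not the asker's data:

```python
import numpy as np

# Hypothetical data: stress as a column (N, 1), strain as a flat array (N,)
eng_stress = np.array([[400.0], [410.0], [420.0]])
strain_percent = np.array([1.0, 2.0, 3.0])
eng_strain = strain_percent / 100 + 1

# (N, 1) times (N,) broadcasts to (N, N), so indexing returns a whole row
broadcast = np.multiply(eng_stress, eng_strain)
print(broadcast.shape)

# Flattening the column first keeps everything 1D
true_stress = np.multiply(eng_stress.ravel(), eng_strain)
print(true_stress.shape)
print(true_stress[1])  # a single number now
```

Checking .shape on both inputs before the multiplication is usually the quickest way to confirm this.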

Related

How do I convert numpy array to days, hours, mins?

Running with this series
X = number_of_logons_all.values
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
I get:
mean1=60785.792548, mean2=61291.266868
variance1=7483553053.651829, variance2=7603208729.348722
But I wanted something like this in my PyCharm console (pulled from another result):
>>> -103 days +04:37:13.802435724...
I tried to place the np.array in a pd.DataFrame() to get the expected value by adding
.apply(pd.to_timedelta, unit='s')
...this didn't work, so I tried
new = pd.DataFrame([mean1]).to_numpy(dtype='timedelta64[ns]')
...and (still) got something like this:
>>>> [[63394]]
Could anyone help me convert the means calculated above into an easily readable time result?
Thanks in advance for your kind support.
You can use f-strings:
mean1, mean2 = 60785.792548, 61291.266868
variance1, variance2 = 7483553053.651829, 7603208729.348722
print(f'mean1={pd.Timedelta(mean1, unit="s")}, mean2={pd.Timedelta(mean2, unit="s")}')
print(f'variance1={pd.Timedelta(variance1, unit="s")}, variance2={pd.Timedelta(variance2, unit="s")}')
mean1=0 days 16:53:05.792548, mean2=0 days 17:01:31.266868
variance1=86615 days 04:44:13.651828766, variance2=88000 days 02:25:29.348722458
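If pandas is not needed for anything else, the same conversion can be sketched with the standard library. This assumes the values are in seconds, as in the answer above. Note that a variance is in squared seconds, so this sketch converts the standard deviation (its square root) instead, which is my own substitution and not part of the original answer:

```python
import math
from datetime import timedelta

mean1 = 60785.792548          # seconds, taken from the question
var1 = 7483553053.651829      # seconds squared, taken from the question

# The mean is a duration in seconds, so it converts directly
mean_td = timedelta(seconds=mean1)

# The standard deviation is back in plain seconds
std_td = timedelta(seconds=math.sqrt(var1))

print(f"mean1={mean_td}, std1={std_td}")
```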

Eliminate one list according to another list in Python

I have a two-dimensional list like this:
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I want to remove from this two-dimensional list every element that appears in another, one-dimensional list like this:
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I wrote code like this, but it did not work:
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I shortened the lists here. Normally they are much bigger than that.
The list comprehension you have isn't working because zip pairs the elements up by position, which isn't what this operation needs (the two lists are not parallel arrays). What you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2d list you may need to nest this like:
# writing normal loops you'd write:
# for row in x_irp_group:
#     for i in row:
#         if (...):
# so I typically try to indent the loops similarly since nested array comprehension
# gets complicated, honestly I'd likely prefer using generator functions for this anyway
x_final = [[i for i in row
            if (i not in x_irp_eliminated_list)
           ] for row in x_irp_group
          ]
Be aware, though, that i not in x_irp_eliminated_list is slow for a list, because each lookup scans the whole list. Converting it to a set improves performance:
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or if the lists are trivially sorted, then you could convert them both to sets, do a subtraction then sort it again:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
although if you have super giant lists this would probably be less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
          for i in row
          if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for faster operation. The set approach made it faster. Thanks for that information, I learned something new. But it creates a one-dimensional list. I would like to create a two-dimensional list like the original x_irp_group.
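One way to keep the two-dimensional shape together with the fast set lookup is to nest the comprehension, one inner list per row. This is a sketch on a shortened, made-up subset of the lists above. It uses not in to drop the eliminated elements; switch to in if you want to keep only those instead:

```python
# Shortened, illustrative versions of the lists from the question
x_irp_group = [
    ['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4'],
    ['x1_1_8', 'x1_2_8', 'x1_9_9', 'x1_10_9'],
]
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_2_8', 'x1_3_4']

# Build the set once so each membership test is O(1) on average
x_irp_eliminated_set = set(x_irp_eliminated_list)

# The outer comprehension walks the rows, the inner one filters each row,
# so the result has one sub-list per original row
x_final = [[i for i in row if i not in x_irp_eliminated_set]
           for row in x_irp_group]

print(x_final)  # [['x1_2_4', 'x1_4_4'], ['x1_9_9', 'x1_10_9']]
```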

Append numpy array with inequal row in a loop

I want to append several arrays of different sizes. However, I don't want to merge them together, just store them in one big list. Here is a simplified version of my code that reproduces the problem:
import numpy as np
total_wavel = 5
tot_values = []
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.array(np.random.uniform(low=1, high=6, size=(size,)))
    tot_values = np.append(tot_values,values)
Example output:
array([4.88776545, 4.86006097, 1.80835575, 3.52393214, 2.88971373,
1.62978552, 4.06880898, 4.10556672, 1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801, 1.7403673 ,
4.90459377, 3.44198867, 5.03055533, 3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Expected Output :
np.array([np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214]), np.array([2.88971373,
1.62978552, 4.06880898, 4.10556672]), np.array([1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801]), np.array([1.7403673 ,
4.90459377, 3.44198867, 5.03055533]), np.array([3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])])
Or
np.array([[4.88776545, 4.86006097, 1.80835575, 3.52393214], [2.88971373,
1.62978552, 4.06880898, 4.10556672],[1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801], [1.7403673 ,
4.90459377, 3.44198867, 5.03055533], [3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412]])
Thank you in advance
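Inside the for loop, use tot_values.append(list(values)), and after the loop build the array with tot_np = np.array(tot_values, dtype=object). Recent NumPy versions need dtype=object here because the rows have different lengths.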
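A sketch of that idea, assuming a plain Python list of arrays is acceptable as the container. The seeded generator and the explicit object-array fill are my own choices so the result is reproducible and stays one-dimensional even if every row happens to have the same length:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
total_wavel = 5

tot_values = []                 # plain Python list, one entry per array
for i in range(total_wavel):
    size = int(rng.integers(2, 7))               # length between 2 and 6
    values = rng.uniform(low=1, high=6, size=size)
    tot_values.append(values)                    # list.append keeps rows separate

# Optional: wrap the rows in a 1D object array, filled row by row
tot_np = np.empty(len(tot_values), dtype=object)
for k, row in enumerate(tot_values):
    tot_np[k] = row

print([len(row) for row in tot_np])
```

For most purposes the list tot_values is already enough; the object array only helps if later code expects an ndarray.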

Calculate the average of Spearman correlation

I have 2 columns A and B which contain the Spearman's correlation values as follows:
0.127272727 -0.260606061
-0.090909091 -0.224242424
0.345454545 0.745454545
0.478787879 0.660606061
-0.345454545 -0.333333333
0.151515152 -0.127272727
0.478787879 0.660606061
-0.321212121 -0.284848485
0.284848485 0.515151515
0.36969697 -0.139393939
-0.284848485 0.272727273
How can I calculate the average of those correlation values in these 2 columns in Excel or Matlab ? I found a close answer in this link : https://stats.stackexchange.com/questions/8019/averaging-correlation-values
The main point is that we cannot use a plain mean or average in this case, as explained in the link. They proposed a nice way to do it, but I don't know how to implement it in Excel or Matlab.
Following the second answer of the link you provided, which is the most general case, you can calculate the average Spearman's rho in Matlab as follows:
M = [0.127272727 -0.260606061;
-0.090909091 -0.224242424;
0.345454545 0.745454545;
0.478787879 0.660606061;
-0.345454545 -0.333333333;
0.151515152 -0.127272727;
0.478787879 0.660606061;
-0.321212121 -0.284848485;
0.284848485 0.515151515;
0.36969697 -0.139393939;
-0.284848485 0.272727273];
z = atanh(M);
meanRho = tanh(mean(z));
As you can see it gives mean values of
meanRho =
0.1165 0.1796
whereas the simple mean is quite close:
mean(M)
ans =
0.1085 0.1350
Edit: more information on Fisher's transformation here.
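For readers working in Python, the same Fisher z averaging can be sketched with NumPy. This is a direct translation of the Matlab lines above, not a separately validated method: transform each correlation with arctanh, average per column, and transform back with tanh.

```python
import numpy as np

# Columns A and B from the question
M = np.array([
    [ 0.127272727, -0.260606061],
    [-0.090909091, -0.224242424],
    [ 0.345454545,  0.745454545],
    [ 0.478787879,  0.660606061],
    [-0.345454545, -0.333333333],
    [ 0.151515152, -0.127272727],
    [ 0.478787879,  0.660606061],
    [-0.321212121, -0.284848485],
    [ 0.284848485,  0.515151515],
    [ 0.36969697,  -0.139393939],
    [-0.284848485,  0.272727273],
])

z = np.arctanh(M)                    # Fisher z-transform of each correlation
mean_rho = np.tanh(z.mean(axis=0))   # average per column, then back-transform

print(mean_rho)          # per-column Fisher-averaged rho
print(M.mean(axis=0))    # plain column means, for comparison
```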
In MATLAB, define a matrix with these values and use mean function as follows:
%define a matrix M
M = [0.127272727 -0.260606061;
-0.090909091 -0.224242424;
0.345454545 0.745454545;
0.478787879 0.660606061;
-0.345454545 -0.333333333;
0.151515152 -0.127272727;
0.478787879 0.660606061;
-0.321212121 -0.284848485;
0.284848485 0.515151515;
0.36969697 -0.139393939;
-0.284848485 0.272727273];
%calculates the mean of each column
meanVals = mean(M);
Result
meanVals =
0.1085 0.1350
It is also possible to calculate the total mean and the mean of each row as follows:
meanVals = mean(M(:)); %total mean
meanVals = mean(M,2); %mean of each row

data.frame slicing

I hope this question is not too simple for this board.
I have created a data.frame df:
CAS Name CID
89 13010-47-4 Lomustine 3950
90 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
91 130636-43-0 Nifekalant 268083
92 130929-57-6 Entacapone 5281081
and a vector vec
[1] 5282380 18471829 45923789 44308022 44266812 24883465 24867475 24867460
I would like to extract the rows of df which contains any number of vec. I tried to solve this problem by this code:
df$GC[(df$CID %in% vec)] = 1
df[df$GC==1,]
But the problem with this solution is that I only get the rows that contain a single number in the CID column. Rows that contain several values in CID, like row 90, do not appear.
Is there an elegant solution for this problem?
Thanks in advance
Given your comment on EDi's answer (which I like) I thought I'd make a suggestion.
Squeezing comma separated values into a single column of a data frame is awkward and (in my experience) just leads to frustration. I often find it simpler to keep it in a separate data structure, a list:
dat <- read.table(text = " CAS Name CID
13010-47-4 Lomustine 3950
130209-82-4 Latanoprost 5311221,5282380,46705340,3890
130636-43-0 Nifekalant 268083
130929-57-6 Entacapone 5281081",sep = "",header = TRUE)
cid <- sapply(dat$CID,strsplit,",",USE.NAMES = FALSE)
In this form, things are often easier to work with:
ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
dat[sapply(cid,function(x) {any(x %in% as.character(ID))}),]
CAS Name CID
1 13010-47-4 Lomustine 3950
2 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
You can always use rownames in dat and the names of the list to keep each item straight, if you're worried about orderings changing.
(Also note that my anonymous function is assuming that ID will be found eventually by R's scoping rules; you can alter the function to pass in ID explicitly if you like.)
One way is to use grep():
> txt <- " CAS Name CID
+ 13010-47-4 Lomustine 3950
+ 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
+ 130636-43-0 Nifekalant 268083
+ 130929-57-6 Entacapone 5281081
+ "
> con <- textConnection(txt)
> df <- read.table(con, header = TRUE)
> close(con)
> ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
> grep(paste("\\b", ID, "\\b", sep="", collapse = "|"), df$CID)
[1] 1 2
