Running with this series
X = number_of_logons_all.values
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
I get:
mean1=60785.792548, mean2=61291.266868
variance1=7483553053.651829, variance2=7603208729.348722
But I wanted something like this in my PyCharm console (pulled from another result):
>>> -103 days +04:37:13.802435724...
Tried to place the np.array in a pd.Dataframe() to get the expected value by adding
.apply(pd.to_timedelta, unit='s')
...this didn't work, so I tried
new = pd.DataFrame([mean1]).to_numpy(dtype='timedelta64[ns]')
...and (still) got something like this:
>>>> [[63394]]
Anyone out there who could assist me converting to an easily comprehended datetime result from my means calculation above?
Thx, in advance for your kind support.
You can use f-strings:
mean1, mean2 = 60785.792548, 61291.266868
variance1, variance2=7603208729.348722,7483553053.651829
print(f'mean1={pd.Timedelta(mean1, unit="s")}, mean2={pd.Timedelta(mean2, unit="s")}')
print(f'variance1={pd.Timedelta(variance1, unit="s")}, variance2={pd.Timedelta(variance2, unit="s")}')
mean1=0 days 16:53:05.792548, mean2=0 days 17:01:31.266868
variance1=88000 days 02:25:29.348722458, variance2=86615 days 04:44:13.651828766
I have two dimensional list like that
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I wanna eliminate this two dimensional list if the elements in another one dimensional list like that
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I write a code like that but it did not work well.
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I shorten the lists. Normally their sizes are much bigger than that
the list comprehension you have isn't working because you are zipping the elements together, which isn't what the operation represents (they are not parallel arrays) what you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2d list you may need to nest this like:
# writing normal loops you'd write:
# for row in x_irp_group:
# for i in row:
# if (...):
# so I typically try to indent the loops similarly since nested array comprehension
# gets complicated, honestly I'd likely prefer using generator functions for this anyway
x_final = [[i for i in row
if (i not in x_irp_eliminated_list)
]for row in x_irp_group
]
although know that i not in x_irp_eliminated_list will be very slow for a list, changing it to a set would improve performance:
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or if the lists are trivially sorted, then you could convert them both to sets, do a subtraction then sort it again:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
although if you have super giant lists this would probably be less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
for i in row
if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for faster operation. Set approach made it faster. Thanks for that information. I learn one new thing. But it creates one dimensional list. I would like to create two dimensional list like original x_irp_group
I want to append several arrays but with different size. However I don't want to merge them together, just stock them in a mega-list. Here a simplified code of mine which try to reproduce my problem:
import numpy as np
total_wavel = 5
tot_values = []
for i in range(total_wavel):
size = int(np.random.uniform(low=2, high=7))
values = np.array(np.random.uniform(low=1, high=6, size=(size,)))
tot_values = np.append(tot_values,values)
Exemple Output :
array([4.88776545, 4.86006097, 1.80835575, 3.52393214, 2.88971373,
1.62978552, 4.06880898, 4.10556672, 1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801, 1.7403673 ,
4.90459377, 3.44198867, 5.03055533, 3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Expected Output :
np.array([np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214)], np.array([2.88971373,
1.62978552, 4.06880898, 4.10556672]), np.array([1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801]), np.array([1.7403673 ,
4.90459377, 3.44198867, 5.03055533], np.array([3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])])
Or
np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214], [2.88971373,
1.62978552, 4.06880898, 4.10556672],[1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801], [1.7403673 ,
4.90459377, 3.44198867, 5.03055533], [3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Thank you in advance
In for loop tot_values.append(list(values)), and after loop tot_np=np.array(tot_values)
I hope this question is not too simple for this board.
I have created a data.frame df:
CAS Name CID
89 13010-47-4 Lomustine 3950
90 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
91 130636-43-0 Nifekalant 268083
92 130929-57-6 Entacapone 5281081
and a vector vec
[1] 5282380 18471829 45923789 44308022 44266812 24883465 24867475 24867460
I would like to extract the rows of df which contains any number of vec. I tried to solve this problem by this code:
df$GC[(df$CID %in% vec)] = 1
df[df$GC==1,]
But the problem with this solution is, that I only get the rows, which contain only one number in the CID column. Rows which contain several values in CID like line 90 do not appear.
Is there an elegant solution for this problem?
Thanks in advance
Given your comment on EDi's answer (which I like) I thought I'd make a suggestion.
Squeezing comma separated values into a single column of a data frame is awkward and (in my experience) just leads to frustration. I often find it simpler to keep it in a separate data structure, a list:
dat <- read.table(text = " CAS Name CID
13010-47-4 Lomustine 3950
130209-82-4 Latanoprost 5311221,5282380,46705340,3890
130636-43-0 Nifekalant 268083
130929-57-6 Entacapone 5281081",sep = "",header = TRUE)
cid <- sapply(dat$CID,strsplit,",",USE.NAMES = FALSE)
In this form, things are often easier to work with:
ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
dat[sapply(cid,function(x) {any(x %in% as.character(ID))}),]
CAS Name CID
1 13010-47-4 Lomustine 3950
2 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
You can always use rownames in dat and the names of the list to keep each item straight, if you're worried about orderings changing.
(Also note that my anonymous function is assuming that ID will be found eventually by R's scoping rules; you can alter the function to pass in ID explicitly if you like.)
One way is to use grep():
> txt <- " CAS Name CID
+ 13010-47-4 Lomustine 3950
+ 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
+ 130636-43-0 Nifekalant 268083
+ 130929-57-6 Entacapone 5281081
+ "
> con <- textConnection(txt)
> df <- read.table(con, header = TRUE)
> close(con)
> ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
> grep(paste("\\b", ID, "\\b", sep="", collapse = "|"), dat$CID)
[1] 1 2