Numerical integration of a numpy array in incremental time steps - python-3.x

I have two arrays. The first one is time in terms of Age (yrs) and the second one is a parameter that needs to be integrated with respect to time.
age = [5.00000e+08, 5.60322e+08, 6.27922e+08, 7.03678e+08, 7.88572e+08,
8.83709e+08, 9.90324e+08, 1.10980e+09, 1.24369e+09, 1.39374e+09,
1.56188e+09, 1.75032e+09, 1.96148e+09, 2.19813e+09, 2.46332e+09,
2.76050e+09, 3.09354e+09, 3.46676e+09, 3.88501e+09, 4.35371e+09,
4.87897e+09, 5.46759e+09, 6.12722e+09, 6.86644e+09, 7.69484e+09,
8.62318e+09, 9.66352e+09, 1.08294e+10, 1.21359e+10, 1.36000e+10]
sfr = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02, 8.56960274e-03,
6.44041855e-03, 4.93194263e-03, 3.69203448e-05, 2.69813985e-04,
6.17644783e-04, 1.00780427e-02, 1.20645391e-02, 3.05009362e-02,
3.91535011e-02, 5.35479858e-02, 7.36489068e-02, 9.63931263e-02,
1.11108326e-01, 1.47781221e-01, 1.63057763e-01, 2.27429626e-01,
2.20941333e-01, 2.74413180e-01, 2.72010867e-01, 4.32215233e-01,
5.79654549e-01, 7.39362218e-01, 9.41168727e-01, 1.18868347e+00,
1.42839043e+00, 1.91326333e+00]
I want to perform integration of sfr array with respect to age array, but in steps.
For example, the first integration should use only the first element of both arrays, the second integration the first 2 elements of both arrays, the third the first 3 elements, and so on. The result of each step should be saved in a single output array.

The exact form of your desired result is not entirely clear, so here are 2 possibilities:
age = [5.00000e+08, 5.60322e+08, 6.27922e+08, 7.03678e+08, 7.88572e+08,
8.83709e+08, 9.90324e+08, 1.10980e+09, 1.24369e+09, 1.39374e+09,
1.56188e+09, 1.75032e+09, 1.96148e+09, 2.19813e+09, 2.46332e+09,
2.76050e+09, 3.09354e+09, 3.46676e+09, 3.88501e+09, 4.35371e+09,
4.87897e+09, 5.46759e+09, 6.12722e+09, 6.86644e+09, 7.69484e+09,
8.62318e+09, 9.66352e+09, 1.08294e+10, 1.21359e+10, 1.36000e+10]
sfr = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02, 8.56960274e-03,
6.44041855e-03, 4.93194263e-03, 3.69203448e-05, 2.69813985e-04,
6.17644783e-04, 1.00780427e-02, 1.20645391e-02, 3.05009362e-02,
3.91535011e-02, 5.35479858e-02, 7.36489068e-02, 9.63931263e-02,
1.11108326e-01, 1.47781221e-01, 1.63057763e-01, 2.27429626e-01,
2.20941333e-01, 2.74413180e-01, 2.72010867e-01, 4.32215233e-01,
5.79654549e-01, 7.39362218e-01, 9.41168727e-01, 1.18868347e+00,
1.42839043e+00, 1.91326333e+00]
# range(1, len(age) + 1) so that the last prefix covers the whole array
integr_pairs = [list(zip(age[:i], sfr[:i])) for i in range(1, len(age) + 1)]
print(integr_pairs)
# [[(500000000.0, 0.0186120543)], [(500000000.0, 0.0186120543), (560322000.0, 0.0146680445)], ....
integr_list = [[item for t in zip(age[:i], sfr[:i]) for item in t] for i in range(1, len(age) + 1)]
print(integr_list)
# [[500000000.0, 0.0186120543], [500000000.0, 0.0186120543, 560322000.0, 0.0146680445],
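If what is wanted is the running value of the integral itself rather than the prefixes, one possible reading is a cumulative trapezoidal rule. The sketch below is an assumption about the intended result, not something stated in the question: the function name cumulative_trapezoid is made up here, it uses plain Python so it runs without extra packages, and only the first four samples are used for brevity. For NumPy arrays, scipy.integrate.cumulative_trapezoid computes the same kind of running integral.

```python
from itertools import accumulate


def cumulative_trapezoid(x, y):
    """Running trapezoidal integral of y with respect to x.

    Entry k is the integral over the first k + 1 samples, so entry 0
    (a single point) is 0.0.
    """
    if not x:
        return []
    # Area of each trapezoid between consecutive samples.
    panels = [
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
    ]
    # Running sum of the panel areas, starting from 0.0 for the first sample.
    return list(accumulate(panels, initial=0.0))


# First four samples from the question, for brevity.
age = [5.00000e+08, 5.60322e+08, 6.27922e+08, 7.03678e+08]
sfr = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02, 8.56960274e-03]

mass = cumulative_trapezoid(age, sfr)
print(mass)
```

The output has one entry per input sample, which matches the "one result per step" layout described in the question.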

Eliminate one list according to another list in Python

I have a two-dimensional list like this:
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I want to remove from this two-dimensional list every element that appears in another, one-dimensional list like this:
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I wrote the following code, but it did not work:
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I shortened the lists here. Normally they are much bigger than that.
The list comprehension you have isn't working because zip pairs the elements up position by position, which isn't what this operation needs (the two lists are not parallel arrays). What you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2D list you may need to nest the comprehension like this:
# With normal loops you'd write:
#   for row in x_irp_group:
#       for i in row:
#           if (...):
# so I typically try to indent the comprehension similarly, since nested
# comprehensions get complicated; honestly I'd likely prefer generator functions for this anyway
x_final = [[i for i in row
            if (i not in x_irp_eliminated_list)
           ] for row in x_irp_group
          ]
Be aware, though, that i not in x_irp_eliminated_list is slow for a long list, because every membership test scans the list. Converting it to a set improves performance:
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or, if the elements are simply kept in sorted order, you could convert both lists to sets, subtract one from the other, and sort the result again:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
For very large lists, though, this is probably less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
          for i in row
          if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for faster operation. The set approach made it faster; thanks for that information, I learned something new. But it creates a one-dimensional list. I would like to create a two-dimensional list like the original x_irp_group.
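To keep the two-dimensional shape, the comprehension can be nested so that each row produces its own inner list. A minimal sketch with shortened, made-up sample rows; the names eliminated, x_kept and x_matched are introduced here for illustration only:

```python
# Shortened sample data in the same format as the question.
x_irp_group = [
    ['x1_1_4', 'x1_2_4', 'x1_3_4'],
    ['x1_1_8', 'x1_2_8', 'x1_1_9'],
]
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_2_8', 'x1_3_4']

# Build the set once so each membership test is a hash lookup.
eliminated = set(x_irp_eliminated_list)

# One inner list per row: elements that are NOT in the eliminated set.
x_kept = [[i for i in row if i not in eliminated] for row in x_irp_group]

# Same shape, but keeping only the elements that ARE in the set.
x_matched = [[i for i in row if i in eliminated] for row in x_irp_group]

print(x_kept)
print(x_matched)
```

The outer comprehension iterates over rows and the inner one filters within a row, so the result always has as many rows as x_irp_group, even if some rows end up empty.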

Normalising units/Replace substrings based on lists using Python

I am trying to normalize weight units in a string.
E.g. (original string - desired result):
1. SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre - SUCO MARACUJA COM GENGIBRE PCS 300 ML
2. OVOS CAIPIRAS ANA MARIA BRAGA 10UN - OVOS CAIPIRAS ANA MARIA BRAGA 10U
3. SUCO MARACUJA MAMAO PCS 300 Gram - SUCO MARACUJA MAMAO PCS 300 G
4. SUCO ABACAXI COM MACA PCS 300Milli litre - SUCO ABACAXI COM MACA PCS 300ML
The keyword table is :
unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli litre','Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
I tried to treat these lists as a table, but I am having difficulty comparing two dataframes or tables in Python.
I tried the below code.
unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli litre','Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
z='SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre'
#for row in mongo_docs:
#z = row['clean_hntproductname']
for x in unit:
    for y in norm_unit:
        if (re.search(r'\s'+x+r'$',z,re.I)):
            # clean_hntproductname = t.lower().replace(x.lower(),y.lower())
            # myquery3 = { "_id" : row['_id']}
            # newvalues3 = { "$set": {"clean_hntproductname" : 'clean_hntproductname'} }
            # ds_hnt_prod_data.update_one(myquery3, newvalues3)
I'm using Python(Jupyter) with MongoDb(Compass). Fetching data from Mongo and writing back to it.
From my understanding you want to:
Update all the rows in a table which contain the words in the unit array, to the ones in norm_unit.
(Disclaimer: I'm not familiar with MongoDB or Python.)
What you want is to create a mapping (using a hash) of the words you want to change.
Here's a trivial solution (i.e. not the best solution, but it should point you in the right direction):
unit_conversions = {
    'Kilo': 'KG',
    'Kilogram': 'KG',
    'Gram': 'G',
}
# pseudo-code
for each row that you want to update
    item_description = get the value of the string in the column
    for each key in unit_conversions (e.g. 'Kilo')
        see if the item_description contains the key
        if it does, replace it with unit_conversions[key] (e.g. 'KG')
    update the row
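The mapping idea can be sketched in runnable Python as follows. This is only one possible approach under stated assumptions: the unit is taken to sit at the very end of the string, directly after a number (with or without spaces in between), and the names unit_map, UNIT_AT_END and normalize_unit are made up for this example. The MongoDB read and update steps are left out.

```python
import re

unit = ['Kilo', 'Kilogram', 'Gram', 'Milligram', 'Millilitre', 'Milli litre',
        'Dozen', 'Litre', 'Un', 'Und', 'Unid', 'Unidad', 'Unidade', 'Unidades']
norm_unit = ['KG', 'KG', 'G', 'MG', 'ML', 'ML', 'DZ', 'L', 'U', 'U', 'U', 'U', 'U', 'U']

# Pair the two lists position by position; keys are lowercased for lookup.
unit_map = {u.lower(): n for u, n in zip(unit, norm_unit)}

# Longest units first, so 'Kilogram' is tried before 'Kilo' and 'Unid' before 'Un'.
alternatives = '|'.join(re.escape(u) for u in sorted(unit, key=len, reverse=True))

# Assumption: a digit, optional whitespace, then the unit at the end of the string.
UNIT_AT_END = re.compile(r'(\d\s*)(' + alternatives + r')$', re.IGNORECASE)


def normalize_unit(name):
    """Replace a trailing unit word with its normalized abbreviation."""
    return UNIT_AT_END.sub(
        lambda m: m.group(1) + unit_map[m.group(2).lower()],
        name,
    )


print(normalize_unit('SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre'))
print(normalize_unit('OVOS CAIPIRAS ANA MARIA BRAGA 10UN'))
```

Strings that do not end in a number followed by a known unit are returned unchanged, so the function can be applied to every row before writing the value back.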

How to choose certain elements of a matrix to create a new one with np.array?

I have a matrix called "times" of shape (1, 517) that holds the times of one whole day (24 hours, in seconds of Epoch time). I want to create a new matrix with the times at each half hour: starting from the first time, then the one that corresponds to half an hour later, and so on until all the half hours in a day are covered, that is, 48 values.
I created a time delta with
from datetime import timedelta
dt = timedelta(hours=0.5)
dts = dt.total_seconds()
but I do not know how to tell my new matrix to take those elements.
print(times.shape)
Out[4]: (1, 517)
print(times)
array([[1.55079361e+09, 1.55079377e+09, 1.55079394e+09, 1.55079410e+09,
1.55079430e+09, 1.55079446e+09, 1.55079462e+09, 1.55079479e+09,
1.55079495e+09, 1.55079512e+09, 1.55079528e+09, 1.55079544e+09,
1.55079561e+09, 1.55079577e+09, 1.55079594e+09, 1.55079614e+09,
1.55079630e+09, 1.55079646e+09, 1.55079663e+09, 1.55079679e+09,
1.55079695e+09, 1.55079712e+09, 1.55079728e+09, 1.55079744e+09,
1.55079761e+09, 1.55079781e+09, 1.55079797e+09, 1.55079814e+09,
1.55079830e+09, 1.55079846e+09, 1.55079863e+09, 1.55079879e+09,
1.55079895e+09, 1.55079912e+09, 1.55079928e+09, 1.55079945e+09,
1.55079964e+09, 1.55079981e+09, 1.55079997e+09, 1.55080014e+09,
1.55080030e+09, 1.55080046e+09, 1.55080063e+09, 1.55080079e+09,
1.55080096e+09, 1.55080112e+09, 1.55080128e+09, 1.55080148e+09,
1.55080164e+09, 1.55080181e+09, 1.55080197e+09, 1.55080214e+09,
1.55080230e+09, 1.55080246e+09, 1.55080263e+09, 1.55080279e+09,
1.55080296e+09, 1.55080312e+09, 1.55080332e+09, 1.55080348e+09,
1.55080364e+09, 1.55080381e+09, 1.55080397e+09, 1.55080414e+09,
1.55080430e+09, 1.55080446e+09, 1.55080463e+09, 1.55080479e+09,
1.55080496e+09, 1.55080516e+09, 1.55080532e+09, 1.55080548e+09,
1.55080565e+09, 1.55080581e+09, 1.55080597e+09, 1.55080614e+09,
1.55080630e+09, 1.55080646e+09, 1.55080663e+09, 1.55080683e+09,
1.55080699e+09, 1.55080716e+09, 1.55080732e+09, 1.55080748e+09,
1.55080765e+09, 1.55080781e+09, 1.55080797e+09, 1.55080814e+09,
1.55080830e+09, 1.55080847e+09, 1.55080866e+09, 1.55080883e+09,
1.55080899e+09, 1.55080916e+09, 1.55080932e+09, 1.55080948e+09,
1.55080965e+09, 1.55080981e+09, 1.55080998e+09, 1.55081014e+09,
1.55081030e+09, 1.55081050e+09, 1.55081066e+09, 1.55081083e+09,
1.55081099e+09, 1.55081116e+09, 1.55081132e+09, 1.55081148e+09,
1.55081165e+09, 1.55081181e+09, 1.55081198e+09, 1.55081214e+09,
1.55081234e+09, 1.55081250e+09, 1.55081266e+09, 1.55081283e+09,
1.55081299e+09, 1.55081316e+09, 1.55081332e+09, 1.55081348e+09,
1.55081365e+09, 1.55081381e+09, 1.55081398e+09, 1.55081418e+09,
1.55081434e+09, 1.55081450e+09, 1.55081467e+09, 1.55081483e+09,
1.55081499e+09, 1.55081516e+09, 1.55081532e+09, 1.55081548e+09,
1.55081565e+09, 1.55081585e+09, 1.55081601e+09, 1.55081618e+09,
1.55081634e+09, 1.55081650e+09, 1.55081667e+09, 1.55081683e+09,
1.55081699e+09, 1.55081716e+09, 1.55081732e+09, 1.55081749e+09,
1.55081768e+09, 1.55081785e+09, 1.55081801e+09, 1.55081818e+09,
1.55081834e+09, 1.55081850e+09, 1.55081867e+09, 1.55081883e+09,
1.55081900e+09, 1.55081916e+09, 1.55081932e+09, 1.55081952e+09,
1.55081968e+09, 1.55081985e+09, 1.55082001e+09, 1.55082018e+09,
1.55082034e+09, 1.55082050e+09, 1.55082067e+09, 1.55082083e+09,
1.55082100e+09, 1.55082116e+09, 1.55082136e+09, 1.55082152e+09,
1.55082168e+09, 1.55082185e+09, 1.55082201e+09, 1.55082218e+09,
1.55082234e+09, 1.55082250e+09, 1.55082267e+09, 1.55082283e+09,
1.55082300e+09, 1.55082320e+09, 1.55082336e+09, 1.55082352e+09,
1.55082369e+09, 1.55082385e+09, 1.55082401e+09, 1.55082418e+09,
1.55082434e+09, 1.55082450e+09, 1.55082467e+09, 1.55082487e+09,
1.55082503e+09, 1.55082520e+09, 1.55082536e+09, 1.55082552e+09,
1.55082569e+09, 1.55082585e+09, 1.55082601e+09, 1.55082618e+09,
1.55082634e+09, 1.55082651e+09, 1.55082670e+09, 1.55082687e+09,
1.55082703e+09, 1.55082720e+09, 1.55082736e+09, 1.55082752e+09,
1.55082769e+09, 1.55082785e+09, 1.55082802e+09, 1.55082818e+09,
1.55082834e+09, 1.55082854e+09, 1.55082870e+09, 1.55082887e+09,
1.55082903e+09, 1.55082920e+09, 1.55082936e+09, 1.55082952e+09,
1.55082969e+09, 1.55082985e+09, 1.55083002e+09, 1.55083018e+09,
1.55083038e+09, 1.55083054e+09, 1.55083070e+09, 1.55083087e+09,
1.55083103e+09, 1.55083120e+09, 1.55083136e+09, 1.55083152e+09,
1.55083169e+09, 1.55083185e+09, 1.55083202e+09, 1.55083222e+09,
1.55083238e+09, 1.55083254e+09, 1.55083271e+09, 1.55083287e+09,
1.55083303e+09, 1.55083320e+09, 1.55083336e+09, 1.55083352e+09,
1.55083369e+09, 1.55083389e+09, 1.55083405e+09, 1.55083422e+09,
1.55083438e+09, 1.55083454e+09, 1.55083471e+09, 1.55083487e+09,
1.55083503e+09, 1.55083520e+09, 1.55083536e+09, 1.55083553e+09,
1.55083572e+09, 1.55083589e+09, 1.55083605e+09, 1.55083622e+09,
1.55083638e+09, 1.55083654e+09, 1.55083671e+09, 1.55083687e+09,
1.55083704e+09, 1.55083720e+09, 1.55083736e+09, 1.55083756e+09,
1.55083772e+09, 1.55083789e+09, 1.55083805e+09, 1.55083822e+09,
1.55083838e+09, 1.55083854e+09, 1.55083871e+09, 1.55083887e+09,
1.55083904e+09, 1.55083920e+09, 1.55083940e+09, 1.55083956e+09,
1.55083972e+09, 1.55083989e+09, 1.55084005e+09, 1.55084022e+09,
1.55084038e+09, 1.55084054e+09, 1.55084071e+09, 1.55084087e+09,
1.55084104e+09, 1.55084124e+09, 1.55084140e+09, 1.55084156e+09,
1.55084173e+09, 1.55084189e+09, 1.55084205e+09, 1.55084222e+09,
1.55084238e+09, 1.55084254e+09, 1.55084271e+09, 1.55084291e+09,
1.55084307e+09, 1.55084324e+09, 1.55084340e+09, 1.55084356e+09,
1.55084373e+09, 1.55084389e+09, 1.55084405e+09, 1.55084422e+09,
1.55084438e+09, 1.55084455e+09, 1.55084474e+09, 1.55084491e+09,
1.55084507e+09, 1.55084524e+09, 1.55084540e+09, 1.55084556e+09,
1.55084573e+09, 1.55084589e+09, 1.55084606e+09, 1.55084622e+09,
1.55084638e+09, 1.55084658e+09, 1.55084674e+09, 1.55084691e+09,
1.55084707e+09, 1.55084724e+09, 1.55084740e+09, 1.55084756e+09,
1.55084773e+09, 1.55084789e+09, 1.55084806e+09, 1.55084822e+09,
1.55084842e+09, 1.55084858e+09, 1.55084874e+09, 1.55084891e+09,
1.55084907e+09, 1.55084924e+09, 1.55084940e+09, 1.55084956e+09,
1.55084973e+09, 1.55084989e+09, 1.55085006e+09, 1.55085026e+09,
1.55085042e+09, 1.55085058e+09, 1.55085075e+09, 1.55085091e+09,
1.55085107e+09, 1.55085124e+09, 1.55085140e+09, 1.55085156e+09,
1.55085173e+09, 1.55085193e+09, 1.55085209e+09, 1.55085226e+09,
1.55085242e+09, 1.55085258e+09, 1.55085275e+09, 1.55085291e+09,
1.55085307e+09, 1.55085324e+09, 1.55085340e+09, 1.55085357e+09,
1.55085376e+09, 1.55085393e+09, 1.55085409e+09, 1.55085426e+09,
1.55085442e+09, 1.55085458e+09, 1.55085475e+09, 1.55085491e+09,
1.55085508e+09, 1.55085524e+09, 1.55085540e+09, 1.55085560e+09,
1.55085576e+09, 1.55085593e+09, 1.55085609e+09, 1.55085626e+09,
1.55085642e+09, 1.55085658e+09, 1.55085675e+09, 1.55085691e+09,
1.55085708e+09, 1.55085724e+09, 1.55085744e+09, 1.55085760e+09,
1.55085776e+09, 1.55085793e+09, 1.55085809e+09, 1.55085826e+09,
1.55085842e+09, 1.55085858e+09, 1.55085875e+09, 1.55085891e+09,
1.55085908e+09, 1.55085928e+09, 1.55085944e+09, 1.55085960e+09,
1.55085977e+09, 1.55085993e+09, 1.55086009e+09, 1.55086026e+09,
1.55086042e+09, 1.55086058e+09, 1.55086075e+09, 1.55086095e+09,
1.55086111e+09, 1.55086128e+09, 1.55086144e+09, 1.55086160e+09,
1.55086177e+09, 1.55086193e+09, 1.55086209e+09, 1.55086226e+09,
1.55086242e+09, 1.55086259e+09, 1.55086278e+09, 1.55086295e+09,
1.55086311e+09, 1.55086328e+09, 1.55086344e+09, 1.55086360e+09,
1.55086377e+09, 1.55086393e+09, 1.55086410e+09, 1.55086426e+09,
1.55086442e+09, 1.55086462e+09, 1.55086478e+09, 1.55086495e+09,
1.55086511e+09, 1.55086528e+09, 1.55086544e+09, 1.55086560e+09,
1.55086577e+09, 1.55086593e+09, 1.55086610e+09, 1.55086626e+09,
1.55086646e+09, 1.55086662e+09, 1.55086678e+09, 1.55086695e+09,
1.55086711e+09, 1.55086728e+09, 1.55086744e+09, 1.55086760e+09,
1.55086777e+09, 1.55086793e+09, 1.55086810e+09, 1.55086830e+09,
1.55086846e+09, 1.55086862e+09, 1.55086879e+09, 1.55086895e+09,
1.55086911e+09, 1.55086928e+09, 1.55086944e+09, 1.55086960e+09,
1.55086977e+09, 1.55086997e+09, 1.55087013e+09, 1.55087030e+09,
1.55087046e+09, 1.55087062e+09, 1.55087079e+09, 1.55087095e+09,
1.55087111e+09, 1.55087128e+09, 1.55087144e+09, 1.55087161e+09,
1.55087180e+09, 1.55087197e+09, 1.55087213e+09, 1.55087230e+09,
1.55087246e+09, 1.55087262e+09, 1.55087279e+09, 1.55087295e+09,
1.55087312e+09, 1.55087328e+09, 1.55087344e+09, 1.55087364e+09,
1.55087380e+09, 1.55087397e+09, 1.55087413e+09, 1.55087430e+09,
1.55087446e+09, 1.55087462e+09, 1.55087479e+09, 1.55087495e+09,
1.55087512e+09, 1.55087528e+09, 1.55087548e+09, 1.55087564e+09,
1.55087580e+09, 1.55087597e+09, 1.55087613e+09, 1.55087630e+09,
1.55087646e+09, 1.55087662e+09, 1.55087679e+09, 1.55087695e+09,
1.55087712e+09, 1.55087732e+09, 1.55087748e+09, 1.55087764e+09,
1.55087781e+09, 1.55087797e+09, 1.55087813e+09, 1.55087830e+09,
1.55087846e+09, 1.55087862e+09, 1.55087879e+09, 1.55087899e+09,
1.55087915e+09, 1.55087932e+09, 1.55087948e+09, 1.55087964e+09,
1.55087981e+09]])
First we create an array with a date range between the first and last entry of times, in steps of 30 minutes:
import datetime
import numpy as np
t = np.arange(np.datetime64(datetime.datetime.fromtimestamp(times[0,0])), np.datetime64(datetime.datetime.fromtimestamp(times[0,-1])), np.timedelta64(30, 'm'))
Output for t
array(['2019-02-22T01:00:10.000000', '2019-02-22T01:30:10.000000',
'2019-02-22T02:00:10.000000', '2019-02-22T02:30:10.000000',
'2019-02-22T03:00:10.000000', '2019-02-22T03:30:10.000000',
'2019-02-22T04:00:10.000000', '2019-02-22T04:30:10.000000',
'2019-02-22T05:00:10.000000', '2019-02-22T05:30:10.000000',
'2019-02-22T06:00:10.000000', '2019-02-22T06:30:10.000000',
'2019-02-22T07:00:10.000000', '2019-02-22T07:30:10.000000',
'2019-02-22T08:00:10.000000', '2019-02-22T08:30:10.000000',
'2019-02-22T09:00:10.000000', '2019-02-22T09:30:10.000000',
'2019-02-22T10:00:10.000000', '2019-02-22T10:30:10.000000',
'2019-02-22T11:00:10.000000', '2019-02-22T11:30:10.000000',
'2019-02-22T12:00:10.000000', '2019-02-22T12:30:10.000000',
'2019-02-22T13:00:10.000000', '2019-02-22T13:30:10.000000',
'2019-02-22T14:00:10.000000', '2019-02-22T14:30:10.000000',
'2019-02-22T15:00:10.000000', '2019-02-22T15:30:10.000000',
'2019-02-22T16:00:10.000000', '2019-02-22T16:30:10.000000',
'2019-02-22T17:00:10.000000', '2019-02-22T17:30:10.000000',
'2019-02-22T18:00:10.000000', '2019-02-22T18:30:10.000000',
'2019-02-22T19:00:10.000000', '2019-02-22T19:30:10.000000',
'2019-02-22T20:00:10.000000', '2019-02-22T20:30:10.000000',
'2019-02-22T21:00:10.000000', '2019-02-22T21:30:10.000000',
'2019-02-22T22:00:10.000000', '2019-02-22T22:30:10.000000',
'2019-02-22T23:00:10.000000', '2019-02-22T23:30:10.000000',
'2019-02-23T00:00:10.000000', '2019-02-23T00:30:10.000000'],
dtype='datetime64[us]')
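Now we want to convert this back to seconds. To do this, we create a lambda function that subtracts the epoch and divides by one second, and apply it with np.apply_along_axis. The arithmetic is vectorized, so the function receives the whole 1-D array along axis 0 at once. The epoch literal is written without a trailing 'Z', because NumPy has deprecated parsing of timezone-aware datetime strings.
f = lambda x: (x - np.datetime64('1970-01-01T00:00:00'))/np.timedelta64(1,'s')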
np.apply_along_axis(f, 0, t)
output
array([1.55079721e+09, 1.55079901e+09, 1.55080081e+09, 1.55080261e+09,
1.55080441e+09, 1.55080621e+09, 1.55080801e+09, 1.55080981e+09,
1.55081161e+09, 1.55081341e+09, 1.55081521e+09, 1.55081701e+09,
1.55081881e+09, 1.55082061e+09, 1.55082241e+09, 1.55082421e+09,
1.55082601e+09, 1.55082781e+09, 1.55082961e+09, 1.55083141e+09,
1.55083321e+09, 1.55083501e+09, 1.55083681e+09, 1.55083861e+09,
1.55084041e+09, 1.55084221e+09, 1.55084401e+09, 1.55084581e+09,
1.55084761e+09, 1.55084941e+09, 1.55085121e+09, 1.55085301e+09,
1.55085481e+09, 1.55085661e+09, 1.55085841e+09, 1.55086021e+09,
1.55086201e+09, 1.55086381e+09, 1.55086561e+09, 1.55086741e+09,
1.55086921e+09, 1.55087101e+09, 1.55087281e+09, 1.55087461e+09,
1.55087641e+09, 1.55087821e+09, 1.55088001e+09, 1.55088181e+09])

Calculate the average of Spearman correlation

I have 2 columns A and B which contain the Spearman's correlation values as follows:
0.127272727 -0.260606061
-0.090909091 -0.224242424
0.345454545 0.745454545
0.478787879 0.660606061
-0.345454545 -0.333333333
0.151515152 -0.127272727
0.478787879 0.660606061
-0.321212121 -0.284848485
0.284848485 0.515151515
0.36969697 -0.139393939
-0.284848485 0.272727273
How can I calculate the average of the correlation values in each of these 2 columns in Excel or Matlab? I found a close answer at this link: https://stats.stackexchange.com/questions/8019/averaging-correlation-values
The main point is that we cannot simply use the mean or average in this case, as explained in the link. They proposed a nice way to do it, but I don't know how to implement it in Excel or Matlab.
Following the second answer of the link you provided, which is the most general case, you can calculate the average Spearman's rho in Matlab as follows:
M = [0.127272727 -0.260606061;
-0.090909091 -0.224242424;
0.345454545 0.745454545;
0.478787879 0.660606061;
-0.345454545 -0.333333333;
0.151515152 -0.127272727;
0.478787879 0.660606061;
-0.321212121 -0.284848485;
0.284848485 0.515151515;
0.36969697 -0.139393939;
-0.284848485 0.272727273];
z = atanh(M);
meanRho = tanh(mean(z));
As you can see it gives mean values of
meanRho =
0.1165 0.1796
whereas the simple mean is quite close:
mean(M)
ans =
0.1085 0.1350
Edit: more information on Fisher's transformation here.
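For readers without Matlab, the same calculation can be sketched in Python using only the standard library. The function name fisher_mean is made up for this example; it applies atanh to each correlation, averages the transformed values, and maps the average back with tanh, as in the Matlab code above. Note that math.atanh raises an error for values of exactly 1 or -1.

```python
import math


def fisher_mean(correlations):
    """Average correlations via Fisher's z-transformation."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Columns A and B from the question.
col_a = [0.127272727, -0.090909091, 0.345454545, 0.478787879, -0.345454545,
         0.151515152, 0.478787879, -0.321212121, 0.284848485, 0.36969697,
         -0.284848485]
col_b = [-0.260606061, -0.224242424, 0.745454545, 0.660606061, -0.333333333,
         -0.127272727, 0.660606061, -0.284848485, 0.515151515, -0.139393939,
         0.272727273]

print(round(fisher_mean(col_a), 4), round(fisher_mean(col_b), 4))
```

Rounded to four decimals this should reproduce the meanRho values reported by the Matlab code.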
In MATLAB, define a matrix with these values and use mean function as follows:
%define a matrix M
M = [0.127272727 -0.260606061;
-0.090909091 -0.224242424;
0.345454545 0.745454545;
0.478787879 0.660606061;
-0.345454545 -0.333333333;
0.151515152 -0.127272727;
0.478787879 0.660606061;
-0.321212121 -0.284848485;
0.284848485 0.515151515;
0.36969697 -0.139393939;
-0.284848485 0.272727273];
%calculates the mean of each column
meanVals = mean(M);
Result
meanVals =
0.1085 0.1350
It is also possible to calculate the total mean and the mean of each row as follows:
meanVals = mean(M(:)); %total mean over all elements
meanVals = mean(M,2); %mean of each row

data.frame slicing

I hope this question is not too simple for this board.
I have created a data.frame df:
CAS Name CID
89 13010-47-4 Lomustine 3950
90 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
91 130636-43-0 Nifekalant 268083
92 130929-57-6 Entacapone 5281081
and a vector vec
[1] 5282380 18471829 45923789 44308022 44266812 24883465 24867475 24867460
I would like to extract the rows of df whose CID column contains any of the numbers in vec. I tried to solve this problem with this code:
df$GC[(df$CID %in% vec)] = 1
df[df$GC==1,]
The problem with this solution is that I only get rows that contain a single number in the CID column. Rows that contain several values in CID, like row 90, do not appear.
Is there an elegant solution for this problem?
Thanks in advance
Given your comment on EDi's answer (which I like) I thought I'd make a suggestion.
Squeezing comma separated values into a single column of a data frame is awkward and (in my experience) just leads to frustration. I often find it simpler to keep it in a separate data structure, a list:
dat <- read.table(text = " CAS Name CID
13010-47-4 Lomustine 3950
130209-82-4 Latanoprost 5311221,5282380,46705340,3890
130636-43-0 Nifekalant 268083
130929-57-6 Entacapone 5281081",sep = "",header = TRUE)
cid <- sapply(dat$CID,strsplit,",",USE.NAMES = FALSE)
In this form, things are often easier to work with:
ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
dat[sapply(cid,function(x) {any(x %in% as.character(ID))}),]
CAS Name CID
1 13010-47-4 Lomustine 3950
2 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
You can always use rownames in dat and the names of the list to keep each item straight, if you're worried about orderings changing.
(Also note that my anonymous function is assuming that ID will be found eventually by R's scoping rules; you can alter the function to pass in ID explicitly if you like.)
One way is to use grep():
> txt <- " CAS Name CID
+ 13010-47-4 Lomustine 3950
+ 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
+ 130636-43-0 Nifekalant 268083
+ 130929-57-6 Entacapone 5281081
+ "
> con <- textConnection(txt)
> df <- read.table(con, header = TRUE)
> close(con)
> ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
> grep(paste("\\b", ID, "\\b", sep="", collapse = "|"), df$CID)
[1] 1 2
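The underlying idea in both answers is to split the comma-separated CID field and test whether any of its parts occurs in the lookup vector. For comparison with the other examples on this page, here is the same logic as a Python sketch; the list-of-dicts layout and the names rows, row_matches and selected are assumptions made for this illustration and are not part of the R code.

```python
# The four rows of df from the question, as plain dictionaries.
rows = [
    {"CAS": "13010-47-4", "Name": "Lomustine", "CID": "3950"},
    {"CAS": "130209-82-4", "Name": "Latanoprost", "CID": "5311221,5282380,46705340,3890"},
    {"CAS": "130636-43-0", "Name": "Nifekalant", "CID": "268083"},
    {"CAS": "130929-57-6", "Name": "Entacapone", "CID": "5281081"},
]

# vec from the question, stored as a set for fast membership tests.
vec = {5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460}


def row_matches(row, wanted):
    """True if any comma-separated id in row['CID'] is in wanted."""
    ids = {int(part) for part in row["CID"].split(",")}
    return not ids.isdisjoint(wanted)


selected = [row for row in rows if row_matches(row, vec)]
print([row["Name"] for row in selected])
```

Comparing whole ids after splitting avoids partial matches such as 3950 matching inside 39501, which is the same concern the word boundaries in the grep pattern address.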
