I have a data set that spans a certain length of time, with a data point at each time point. I would like to create a much finer timescale and set the empty data points to zero. I wrote a piece of code to do this, but it isn't doing what I want. A small sample case, however, seems to work. Both pieces of code are below.
This piece of code does not do what I want it to.
import numpy as np
TD_t = np.array([36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500,
43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500,
50000, 50500, 51000, 51500, 52000, 52500, 53000, 53500, 54000, 54500, 55000, 55500, 56000, 56500,
57000, 57500, 58000, 58500, 59000, 59500, 60000, 60500, 61000, 61500, 62000, 62500, 63000, 63500,
64000, 64500, 65000, 65500, 66000])
TD_d = np.array([-0.05466527, -0.04238242, -0.04477601, -0.02453717, -0.01662798, -0.02548617, -0.02339215,
-0.01186576, -0.0029057 , -0.01094671, -0.0095005 , -0.0190277 , -0.01215644, -0.01997112,
-0.01384497, -0.01610656, -0.01927564, -0.02119056, -0.011634 , -0.00544096, -0.00046568,
-0.0017769 , -0.0007341, 0.00193066, 0.01359107, 0.02054919, 0.01420335, 0.01550565,
0.0132394 , 0.01371563, 0.01959774, 0.0165316 , 0.01881992, 0.01554435, 0.01409003,
0.01898334, 0.02300266, 0.03045158, 0.02869013, 0.0238423 , 0.02902356, 0.02568908,
0.02954539, 0.02537967, 0.02927247, 0.02138605, 0.02815635, 0.02733237, 0.03321588,
0.03063803, 0.03783137, 0.04110955, 0.0451221 , 0.04646263, 0.04472884, 0.04935833,
0.03372911, 0.04031406, 0.04165237, 0.03940343, 0.03805504])
time = np.arange(0, 100001,1)
data = np.zeros_like(time)
for i in range(0, len(TD_t)):
    t = TD_t[i]
    data[t] = TD_d[i]
    print(i,t,TD_d[i],data[t])
But for some reason this code works.
import numpy
nums = numpy.array([0,1,2,3])
data = numpy.zeros_like(nums)
data[0] = nums[2]
data[0], nums[2]
Any help will be much appreciated!!
It's because zeros_like copies the dtype of time, so data ends up as int64. When you assign a float to one of its elements, the value is truncated toward zero, and since all of your values are between -1 and 1 they all become 0.
Try changing the line to:
data = np.zeros_like(time, dtype=float)
and it should work (or use whatever dtype the TD_d array is)
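To see the difference side by side, here is a minimal sketch. The arrays sample_t and sample_d are hypothetical, shortened stand-ins for TD_t and TD_d, not the original data.

```python
import numpy as np

# Hypothetical, shortened stand-ins for TD_t and TD_d from the question.
sample_t = np.array([2, 5, 7])
sample_d = np.array([-0.0547, 0.0136, 0.0381])

time = np.arange(0, 10, 1)

# zeros_like copies the integer dtype of `time`, so assigned floats are truncated.
data_int = np.zeros_like(time)
data_int[sample_t] = sample_d

# Requesting a float dtype keeps the fractional values.
data_float = np.zeros_like(time, dtype=float)
data_float[sample_t] = sample_d

print(data_int.dtype, data_int)
print(data_float.dtype, data_float)
```

Checking data.dtype right after creating the array is a quick way to catch this kind of silent truncation.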
I want to write a deeply nested list (lists within lists within lists) to CSV, but the output is always collapsed and printed with "...", so I'm unable to retrieve the hidden values.
The list stores frames of videos and goes up to 5 layers deep. The length of each layer is: videos (number of videos) > frames (8) > width (200) > height (200) > pixel channels (3).
I tried converting the list to a data frame before writing it to CSV, but that did not solve the problem.
"[array([[[0.23137255, 0.26666668, 0.27058825],
[0.23921569, 0.27450982, 0.2784314 ],
[0.23529412, 0.27058825, 0.27450982],
...,
[0.25882354, 0.29411766, 0.2901961 ],
[0.25490198, 0.2901961 , 0.28627452],
[0.25490198, 0.2901961 , 0.28627452]],
[[0.20392157, 0.23921569, 0.24313726],
[0.21568628, 0.2509804 , 0.25490198],
[0.21568628, 0.2509804 , 0.25490198],
...,
[0.26666668, 0.3019608 , 0.29803923],
[0.26666668, 0.3019608 , 0.29803923],
[0.2627451 , 0.29803923, 0.29411766]],
[[0.1882353 , 0.22352941, 0.22745098],
[0.2 , 0.23529412, 0.23921569],
[0.20392157, 0.23921569, 0.24313726],
...,
[0.27450982, 0.30980393, 0.30588236],
[0.27058825, 0.30588236, 0.3019608 ],
[0.27058825, 0.30588236, 0.3019608 ]],
...,
I'd try one of the following:
dump the whole object into json:
import json
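with open('my_saved_file.json', 'w+') as out_file:
    # json.dump writes the object to the file; NumPy arrays must be converted with .tolist() first
    json.dump(list_of_lists_of_lists, out_file, indent=2)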
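Since the list in the question holds NumPy arrays, the JSON route needs a conversion step. A minimal sketch of the round trip, assuming small random arrays as a hypothetical stand-in for the real videos and a temporary directory for the output file:

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real data: 2 videos, each with 2 frames of 4x4 RGB.
videos = [np.random.rand(2, 4, 4, 3).astype(np.float32) for _ in range(2)]

# json cannot serialise ndarrays, so convert every array to nested Python lists.
as_lists = [np.asarray(video).tolist() for video in videos]

path = os.path.join(tempfile.mkdtemp(), "my_saved_file.json")
with open(path, "w") as out_file:
    json.dump(as_lists, out_file)

# Reading the file back restores every value; nothing is elided with "...".
with open(path) as in_file:
    restored = [np.array(video, dtype=np.float32) for video in json.load(in_file)]
```

For arrays of the size described in the question the JSON file will be large, so a binary format may be a better fit if the file only needs to be read back by Python.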
Alternatively, I'd store all of your images as image files and reference them in an index (which could be a CSV):
import numpy as np
from PIL import Image
with open('reference.csv', 'w+') as out_csv:
    out_csv.write("video,frame_set,frame1,frame2,frame3,frame4,frame5,frame6,frame7,frame8\n")
    for video_no, video in enumerate(list_of_lists_of_lists):
        for frame_set_no, frames in enumerate(video):
            # One CSV row per frame set, matching the header columns
            row = [str(video_no), str(frame_set_no)]
            for frame_no, frame in enumerate(frames):
                # Pixel values are floats in [0, 1]; PIL expects uint8 for RGB images
                im = Image.fromarray((np.asarray(frame) * 255).astype(np.uint8))
                frame_name = f"{video_no}-{frame_set_no}-{frame_no}.jpeg"
                row.append(frame_name)
                im.save(frame_name)
            out_csv.write(",".join(row) + "\n")
I want to append several arrays of different sizes. However, I don't want to merge them together, just store them in one big list. Here is a simplified version of my code that reproduces the problem:
import numpy as np
total_wavel = 5
tot_values = []
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.array(np.random.uniform(low=1, high=6, size=(size,)))
    tot_values = np.append(tot_values,values)
Example output:
array([4.88776545, 4.86006097, 1.80835575, 3.52393214, 2.88971373,
1.62978552, 4.06880898, 4.10556672, 1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801, 1.7403673 ,
4.90459377, 3.44198867, 5.03055533, 3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Expected Output :
np.array([np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214]), np.array([2.88971373,
1.62978552, 4.06880898, 4.10556672]), np.array([1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801]), np.array([1.7403673 ,
4.90459377, 3.44198867, 5.03055533]), np.array([3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])])
Or
np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214], [2.88971373,
1.62978552, 4.06880898, 4.10556672],[1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801], [1.7403673 ,
4.90459377, 3.44198867, 5.03055533], [3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Thank you in advance
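Inside the for loop, use tot_values.append(list(values)) so that each array stays a separate element of the Python list. After the loop, convert with tot_np = np.array(tot_values, dtype=object); recent NumPy versions need dtype=object here because the inner lists have different lengths.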
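Here is a minimal sketch of that approach, with a seeded generator so the run is repeatable. It fills the object array element by element, which is my own choice rather than part of the answer: it keeps the result one-dimensional even if all the arrays happen to have the same length.

```python
import numpy as np

rng = np.random.default_rng(0)
total_wavel = 5

# Collect each array as its own element of a plain Python list.
tot_values = []
for _ in range(total_wavel):
    size = int(rng.integers(2, 7))  # random length between 2 and 6
    tot_values.append(rng.uniform(low=1, high=6, size=size))

# Optional: wrap the list in a 1-D object array, one slot per sub-array.
tot_np = np.empty(len(tot_values), dtype=object)
for i, values in enumerate(tot_values):
    tot_np[i] = values

for values in tot_np:
    print(len(values), values)
```

If you only need to loop over the pieces later, the plain list tot_values is usually enough and the object array can be skipped.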
To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below:
import numpy as np
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data points to fit an 8-parameter model (7 slopes and 1 intercept).
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of constraint to reduce the degrees of freedom. I tried to look into the source code but couldn't find anything about that. What method does it use to find the parameters of an underspecified model?
You don't need to reduce the degrees of freedom; it simply finds a solution to the least squares problem min sum_i (dot(beta,x_i)+beta_0-y_i)**2. For example, in the non-sparse case it uses the linalg.lstsq function from scipy. The default solver for this optimization problem is the gelsd LAPACK driver, which returns the minimum-norm solution when the problem is underdetermined. If
A = np.concatenate((ones_v, X), axis=1)
is the augmented array with ones as its first column, then your solution is given by
beta = np.linalg.pinv(A.T @ A) @ A.T @ y
We use the pseudoinverse precisely because the matrix may not be of full rank. Of course, the solver doesn't actually evaluate this formula; it uses the singular value decomposition of A to compute the same result.
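As a rough illustration of the minimum-norm idea, here is a NumPy-only sketch on hypothetical random data with 6 samples and 7 features. It uses np.linalg.pinv(A) @ y, which is the same quantity as the formula above written more compactly, and compares it with np.linalg.lstsq. As far as I know, scikit-learn centers X and y before solving when fit_intercept=True, so its coefficients can differ from the augmented-matrix solution shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 7))  # hypothetical data: 6 samples, 7 features
y = rng.normal(size=6)

# Augmented design matrix with a leading column of ones for the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Minimum-norm least squares solution via the pseudoinverse (SVD based).
beta_pinv = np.linalg.pinv(A) @ y

# np.linalg.lstsq also returns the minimum-norm solution for rank-deficient A.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# With more unknowns than equations the fit is exact, so the residual vanishes.
residual = A @ beta_pinv - y
print(beta_pinv)
print(np.abs(residual).max())
```

Among the infinitely many parameter vectors that fit the 6 points exactly, both routines pick the one with the smallest Euclidean norm.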
I would like to know if there is a way to record the value of a specific variable inside the function being integrated, without printing it from within the function definition. Because of the predictor-corrector algorithm, printing often produces more or fewer values than the final vector returned by the solver.
For example, take this code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def essai(y, t):
    a = y[0]
    c1 = a
    a = c1 / a**2
    return [a]
# Solving
essai0 = [10]
t = np.linspace(0, 2000, 10)
y = odeint(essai, essai0, t)
a = y[:, 0]
# Graphs
fig, ax = plt.subplots()
ax.plot(t, a, 'k--', label='a')
legend = ax.legend(loc='lower right', shadow=True, fontsize='x-large')
legend.get_frame().set_facecolor('#FFFCCC') # 00FFCC
plt.xlabel('x')
plt.ylabel('y')
plt.title('y vs x')
plt.show()
I would like to record the values of c1, which depends on a. What should I do?
If I print it, I get the following (because of the predictor-corrector algorithm):
10.0
10.001203411814794
10.00120326701222
10.002406534059283
10.00240638930896
10.031168251789499
10.03116843523562
10.059847893733858
10.059848247411573
10.088446178306066
10.088446526968276
10.178981333917179
10.1789826635142
10.26872274187664
10.268720875457465
10.251795853148066
10.251794757670828
10.324093402400061
10.324093338929458
10.395889284010963
10.395889126663482
10.467192620394076
10.467192470562162
10.60836217080531
10.608361512785885
10.747675991273601
10.747676529983982
10.885208084361661
10.88520861500753
11.021024408838219
11.021024559158226
11.15518691385528
11.15518704871583
11.389028983440005
11.389029612664437
11.618166387462095
11.618166372845774
11.842871925632974
11.842870666797078
12.063390475531826
12.0633901508557
12.279950446401756
12.279950250452782
12.492757035192547
12.492756877414479
12.790475076345272
12.79047467718475
13.081418818481728
13.081418595295522
13.366029970579808
13.366030900758636
13.644707388512776
13.644707798536366
13.917805722870085
13.917805853240296
14.185647189512732
14.185647276304193
14.448524340486092
14.44852440612534
14.849045554474056
14.849045812160185
15.239043242348172
15.239044113472564
15.619306858637934
15.619307570817467
15.990530200625596
15.990530706701604
16.353328829257094
16.35332918566708
16.70825155213741
16.708251810028536
17.055790075751844
17.055790265472186
17.52054793291328
17.520548366986496
17.97329155702487
17.97329263337524
18.414908470097206
18.41490919183692
18.84617978510828
18.846180323693773
19.26780035288661
19.26780072790131
19.68039039537204
19.680390669145883
20.084506483562638
20.084506685872917
20.63204921728682
20.632049705019547
21.165431430483114
21.16543268212929
21.685699626883885
21.685700483180575
22.193774842932424
22.193775478119036
22.69047628806277
22.69047673120133
23.176535191516802
23.1765355148269
23.652607704971896
23.652607943862492
24.296731084127696
24.296731656936466
24.92421316694978
24.924214631653445
25.536282592848192
25.536283593100098
26.134020839947766
26.134021582629195
26.718389929663125
26.718390447872228
27.290248649274574
27.290249027491374
27.8503676838429
27.85036796338048
28.60821935477876
28.608220025227006
29.346505899333515
29.346507613905608
30.066670806260635
30.066671977520553
30.769984796557875
30.769985666417984
31.457578314647648
31.457578921761066
32.13046057231114
32.13046101551341
32.78953730742519
32.789537635058444
33.68118868621462
33.68118947182226
34.54983545122736
34.549837459883506
35.39717380841791
35.397175180698845
36.22469707822626
36.224698097642104
37.033733817898586
37.03373452954837
37.82547018189015
37.82547070150822
38.60097077071101
38.60097115490064
39.65004988104156
39.650050802111195
40.67207751401193
40.67207986867377
41.669047220267416
41.66904882908885
42.64271422854618
42.64271542393563
43.594640193459966
43.59464102811222
44.52621945824691
44.52622006777859
45.43870353935591
45.438703990091476
46.67300975177773
46.673010832232926
47.87550305124021
47.87550581301012
49.04852683447106
49.04852872160157
50.194144483083306
50.19414588551954
51.3141919066777
51.31419288605143
52.41030839692969
52.41030911225109
53.48396538985435
53.483965918885744
54.9362075454971
54.93620881348237
56.35103457439806
56.35103781516747
57.73120149400896
57.731203708595864
59.07913425147381
59.07913589751868
60.39699143853227
60.39699258818307
61.68670054765226
61.68670138744394
62.94999176730058
62.949992388453296
64.65865496068966
64.65865644932029
That is far more values than I expected, given that t = np.linspace(0, 2000, 10) requests only 10 output points over the time interval.
I have thought about this problem for a long time without finding a really good way to do it, and I would be delighted to know how to get around it.
There is no relation between the evaluation points of the ODE function in the internal solver steps and the requested sample points of the solution for the output. Moreover, the evaluation points can deviate from the solution trajectory with some error of an order lower than the order of the integration method.
The easiest way to do what you want in a structured fashion is to define the c1 function as a separate function and then to call it on the results
def c1_func(y): return y[0]
def essai(y, t):
    a = y[0]
    c1 = c1_func(y)
    a = c1 / a**2
    return [a]
...
y = odeint(...
c1_val = c1_func(y.T)
plt.plot(t, c1_val)
or so.
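Filled in with the values from the question, the idea could look like the sketch below. The plotting is left out, and the name exact is my own addition: since the equation reduces to da/dt = 1/a with a(0) = 10, the closed-form solution a(t) = sqrt(100 + 2t) is available as a sanity check.

```python
import numpy as np
from scipy.integrate import odeint


def c1_func(y):
    # c1 is simply the first state component in this example.
    return y[0]


def essai(y, t):
    a = y[0]
    c1 = c1_func(y)
    return [c1 / a**2]


essai0 = [10]
t = np.linspace(0, 2000, 10)
y = odeint(essai, essai0, t)

# y has shape (len(t), 1); transposing lets c1_func pick out the whole column.
c1_val = c1_func(y.T)

# Closed-form solution of da/dt = 1/a with a(0) = 10, used only as a check.
exact = np.sqrt(100 + 2 * t)
print(np.column_stack([t, c1_val, exact]))
```

This gives exactly one c1 value per requested output time, no matter how many times the solver evaluated essai internally.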
Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers in [0,1]. However, it sometimes gives me an array whose first number is close to 2. Is this a bug, or did I do something wrong? Thanks.
In [1]: import numpy as np
In [2]: from sklearn.metrics import roc_curve
In [3]: np.random.seed(11)
In [4]: aa = np.random.choice([True, False],100)
In [5]: bb = np.random.uniform(0,1,100)
In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)
In [7]: thresholds
Out[7]:
array([ 1.97396826, 0.97396826, 0.9711752 , 0.95996265, 0.95744405,
0.94983331, 0.93290463, 0.93241372, 0.93214862, 0.93076592,
0.92960511, 0.92245024, 0.91179548, 0.91112166, 0.87529458,
0.84493853, 0.84068543, 0.83303741, 0.82565223, 0.81096657,
0.80656679, 0.79387241, 0.77054807, 0.76763223, 0.7644911 ,
0.75964947, 0.73995152, 0.73825262, 0.73466772, 0.73421299,
0.73282534, 0.72391126, 0.71296292, 0.70930102, 0.70116428,
0.69606617, 0.65869235, 0.65670881, 0.65261474, 0.6487222 ,
0.64805644, 0.64221486, 0.62699782, 0.62522484, 0.62283401,
0.61601839, 0.611632 , 0.59548669, 0.57555854, 0.56828967,
0.55652111, 0.55063947, 0.53885029, 0.53369398, 0.52157349,
0.51900774, 0.50547317, 0.49749635, 0.493913 , 0.46154029,
0.45275916, 0.44777116, 0.43822067, 0.43795921, 0.43624093,
0.42039077, 0.41866343, 0.41550367, 0.40032843, 0.36761763,
0.36642721, 0.36567017, 0.36148354, 0.35843793, 0.34371331,
0.33436415, 0.33408289, 0.33387442, 0.31887024, 0.31818719,
0.31367915, 0.30216469, 0.30097917, 0.29995201, 0.28604467,
0.26930354, 0.2383461 , 0.22803687, 0.21800338, 0.19301808,
0.16902881, 0.1688173 , 0.14491946, 0.13648451, 0.12704826,
0.09141459, 0.08569481, 0.07500199, 0.06288762, 0.02073298,
0.01934336])
Most of the time these thresholds are not used, for example in calculating the area under the curve, or plotting the False Positive Rate against the True Positive Rate.
Yet to plot what looks like a reasonable curve, one needs to have a threshold that incorporates 0 data points. Since Scikit-Learn's ROC curve function need not have normalised probabilities for thresholds (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf is sensible but coders often expect finite data (and it's possible the implementation also works for integer thresholds). Instead the implementation uses max(score) + epsilon where epsilon = 1. This may be cosmetically deficient, but you haven't given any reason why it's a problem!
From the documentation:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
So the first element of thresholds is close to 2 because it is max(y_score) + 1, in your case thresholds[1] + 1.
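To see why an extra threshold above every score is useful, here is a NumPy-only sketch that computes the rates by hand. It does not call roc_curve, and the helper rates_at is a hypothetical name of mine. Note also that newer scikit-learn releases (1.3 and later, as far as I know) report np.inf for this first threshold instead of max(y_score) + 1.

```python
import numpy as np

rng = np.random.default_rng(11)
y_true = rng.choice([True, False], 100)
y_score = rng.uniform(0, 1, 100)


def rates_at(threshold):
    # Predict positive whenever the score reaches the threshold.
    predicted = y_score >= threshold
    tpr = np.sum(predicted & y_true) / np.sum(y_true)
    fpr = np.sum(predicted & ~y_true) / np.sum(~y_true)
    return fpr, tpr


# At the highest score itself, one sample is still predicted positive.
fpr_max, tpr_max = rates_at(y_score.max())

# One unit above the highest score nothing is predicted positive: the (0, 0) point.
fpr_top, tpr_top = rates_at(y_score.max() + 1)

print(fpr_max, tpr_max)
print(fpr_top, tpr_top)
```

Without that extra threshold, the curve would start at the point for the highest score rather than at the origin.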
This seems like a bug to me: in roc_curve(aa,bb), 1 is added to the first threshold. You should create an issue here: https://github.com/scikit-learn/scikit-learn/issues