why is my second to last number not visible in output? - python-3.x

When running
>>> a = np.linspace(0, 330, 330, 1, dtype=int)
>>> print(a)
[ 0, 1, 2, ..[skipped for readability].. 323, 324, 325, 326, 327, 328, 330]
I expect the second last number to be 329 instead of 328. Why is this not the case? It's probably because that number in a float will be 328.99696049 but I do wonder how I can include it into my output, and if it does matter for my data purity when I do calculations on that number.

Yes, your assumption is correct. np.linspace distributes 330 values between 0 and 330 which means the stepsize between two neighboring values is (end - start) / (steps - 1) = 330 / 329. Since you coerce to int, the decimal part is truncated.
If you would like a step size of exactly 1 throughout, you need 331 steps:
a = np.linspace(0, 330, 331, 1, dtype=int)
Of course it's even simpler to get the same result using np.arange:
a = np.arange(331)
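The arithmetic above can be checked directly. This is a minimal sketch; the variable names are only for illustration.

```python
import numpy as np

# 330 samples over [0, 330] give a step of 330 / 329, slightly more than 1.
a = np.linspace(0, 330, 330, dtype=int)
step = 330 / 329
second_to_last = 328 * step  # just below 329, so it becomes 328 as an int

# 331 samples give a step of exactly 1, the same values as np.arange(331).
b = np.linspace(0, 330, 331, dtype=int)
c = np.arange(331)

print(a[-3:], second_to_last)
print(np.array_equal(b, c))
```

Because every value in the 330-sample version is slightly below the next integer until the endpoint, 329 never appears in `a` at all.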

Related

cv2.groupRectangles() returning empty tuple - Python

I'm trying to group rectangles together using cv2.groupRectangles(), but it returns an empty tuple ().
I tried two different versions of opencv-python (4.6.0.66 and 3.4.18.65).
This is the code:
# "do" object detection
rectangles = np.array([
[530, 119, 47, 47],
[641, 117, 53, 53],
[531, 89, 117, 117]])
print(rectangles)
print()
(rect, weights) = cv2.groupRectangles(rectangles, groupThreshold=1, eps=0.5)
print("rect after cv.groupRectangles:", rect)
And this is the output:
[[530 119 47 47]
[641 117 53 53]
[531 89 117 117]]
rect after cv.groupRectangles: ()
Here are your initial rectangles for reference. How were you wanting those to be merged?
You might have chosen groupThreshold badly.
If that is more than 0, the function will remove any rectangles that aren't "confirmed" by nearby neighbors. With 0, it's allowed to leave single rectangles alone...
>>> cv.groupRectangles(rectangles, groupThreshold=0, eps=0)
(array([[530, 119, 47, 47],
[641, 117, 53, 53],
[531, 89, 117, 117]]), array([1, 1, 1]))
But then it also doesn't seem to want to merge any of them. That eps doesn't seem to behave all that "relatively". I get effects when I increase it beyond 1.0.
Documentation is very poor about this. Perhaps this function is broken. If not broken, then it's at least not documented well enough for me to make sense of it and I think I'm proficient with OpenCV.
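To make the role of eps more concrete, here is a pure-Python sketch of the kind of similarity test the grouping appears to apply. This is an assumption based on my reading of the OpenCV implementation, not something the documentation spells out, and `similar` is a hypothetical helper, not an OpenCV function. The tolerance is taken to be eps times the average of the smaller width and the smaller height, and all four edges must agree within it.

```python
def similar(r1, r2, eps):
    """Hypothetical stand-in for the rectangle similarity test (assumption)."""
    # Tolerance scales with the smaller of the two widths and heights.
    delta = eps * (min(r1[2], r2[2]) + min(r1[3], r2[3])) * 0.5
    return (
        abs(r1[0] - r2[0]) <= delta                      # left edges
        and abs(r1[1] - r2[1]) <= delta                  # top edges
        and abs(r1[0] + r1[2] - r2[0] - r2[2]) <= delta  # right edges
        and abs(r1[1] + r1[3] - r2[1] - r2[3]) <= delta  # bottom edges
    )

# The three rectangles from the question, as [x, y, w, h].
a = [530, 119, 47, 47]
b = [641, 117, 53, 53]
c = [531, 89, 117, 117]

for eps in (0.5, 1.0, 1.6):
    print(eps, similar(a, b, eps), similar(a, c, eps), similar(b, c, eps))
```

Under that assumption, no pair is similar at eps=0.5, so every rectangle forms a cluster of one. With groupThreshold=1, single-member clusters are discarded, which would explain the empty result. It is also consistent with effects only showing up once eps goes well beyond 1.0.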

Sorting dictionary by key

I have a dictionary with year-month combinations as keys. I used OrderedDict to sort the dictionary and got the result below. In my expected result, "2021-2" should come right after "2021-1", but "2021-10" comes in between.
{
"2020-11": 25,
"2020-12": 861,
"2021-1": 935,
"2021-10": 1,
"2021-2": 4878,
"2021-3": 6058,
"2021-4": 3380,
"2021-5": 4017,
"2021-6": 1163,
"2021-7": 620,
"2021-8": 300,
"2021-9": 7
}
My expected result is shown below. I want the dictionary sorted from the earliest date to the latest.
{
"2020-11": 25,
"2020-12": 861,
"2021-1": 935,
"2021-2": 4878,
"2021-3": 6058,
"2021-4": 3380,
"2021-5": 4017,
"2021-6": 1163,
"2021-7": 620,
"2021-8": 300,
"2021-9": 7,
"2021-10": 1
}
Appreciate if you can help.
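If you want to customize the way sorting is done, use sorted with the key parameter. Here the key splits each "year-month" string into a tuple of integers, so "2021-10" sorts after "2021-9":
from collections import OrderedDict
data = {
    "2020-11": 25,
    "2020-12": 861,
    "2021-1": 935,
    "2021-10": 1,
    "2021-2": 4878,
    "2021-3": 6058,
    "2021-4": 3380,
    "2021-5": 4017,
    "2021-6": 1163,
    "2021-7": 620,
    "2021-8": 300,
    "2021-9": 7
}
def year_and_month(item):
    # item is a (key, value) pair; turn "2021-10" into (2021, 10)
    year, month = item[0].split("-")
    return int(year), int(month)
data_ordered = OrderedDict(sorted(data.items(), key=year_and_month))
print(data_ordered)
Do not convert the key to a single number such as Decimal("2021.10"): it compares equal to Decimal("2021.1"), so October would still land next to January. Comparing (year, month) tuples of integers avoids this.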

OverflowError when subclassing rv_continuous

import scipy.stats as st
import numpy as np # generic math functions
# https://scicomp.stackexchange.com/q/1658
class LorentzGen(st.rv_continuous):
    """Lorentz distribution"""
    def _pdf(self, x):
        gamma = 0.27
        return 2 * gamma / (np.pi * (gamma ** 2 + x ** 2))
transverse_fields = LorentzGen(a=0)
gaussian_gen = st.norm()
L = 2
list_of_temps = np.linspace(1, 10, 40)
for T, temp in enumerate(list_of_temps):
    print(f"Run {T}")
    for t in range(5000):
        if t%500==0:
            print(f"Trial {t}")
        h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)] # OverflowError: (34, 'Result too large')
        # h_y = [[-gaussian_gen.rvs(), xx] for xx in range(L)] # Works
In the above code, I have implemented my own probability distribution (essentially a half Lorentzian, x∈[0,∞)), modified from the answer on scicomp.SE, which I call transverse_fields.
I need to generate a whole bunch of values from this transverse_fields, using them in a nested For loop. The issue is that beyond a certain number of runs, here "Run 1 Trial ~3500", I get a bunch of errors:
C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py:385: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py:2831: RuntimeWarning: overflow encountered in ? (vectorized)
outputs = ufunc(*inputs)
Traceback (most recent call last):
File "C:/<redacted>/stackoverflow.py", line 26, in <module>
h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)] # Result Overflow
File "C:/<redacted>/stackoverflow.py", line 26, in <listcomp>
h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)] # Result Overflow
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 954, in rvs
vals = self._rvs(*args)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 889, in _rvs
Y = self._ppf(U, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 902, in _ppf
return self._ppfvec(q, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2831, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1587, in _ppf_single
while self._ppf_to_solve(right, q, *args) < 0.:
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1569, in _ppf_to_solve
return self.cdf(*(x, )+args)-q
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1745, in cdf
place(output, cond, self._cdf(*goodargs))
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1621, in _cdf
return self._cdfvec(x, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2831, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1618, in _cdf_single
return integrate.quad(self._pdf, self.a, x, args=args)[0]
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py", line 341, in quad
points)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py", line 448, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
File "C:/<redacted>/stackoverflow.py", line 11, in _pdf
return 2 * gamma / (np.pi * (gamma ** 2 + x ** 2))
OverflowError: (34, 'Result too large')
Process finished with exit code 1
Note that the error does not occur if I reduce the number of trials (the range of t) to something like 50, nor if list_of_temps has fewer values, e.g. np.linspace(1,10,4), even though in the original problem, with np.linspace(1,10,40), the error popped up during Run 1.
With the original setup, there is also no overflow error when I use the standard Gaussian distribution function from scipy.stats.
Similar issues on SO that I've seen attribute this to the range of the for loop being too big. But I don't quite see that here. And in any case, I don't quite understand how to implement Decimal as suggested in the answer to the linked question.
How can I fix this?
I'm running Python 3.6.5 with Anaconda on 64bit Windows 10.
It seems to be an issue with the way I'm declaring my probability distribution with transverse_fields/LorentzGen.
My solution was to use the in-built cauchy distribution in scipy.stats, with a modified scale.
Also since I wanted a half-Lorentzian, I just took the absolute np.abs(...) when drawing a random number from transverse_fields.
import scipy.stats as st
import numpy as np # generic math functions
transverse_fields = st.cauchy(scale=0.27)
L = 2
list_of_temps = np.linspace(1, 10, 40)
for T, temp in enumerate(list_of_temps):
    print(f"Run {T}")
    for t in range(5000):
        if t%500==0:
            print(f"Trial {t}")
        h_x = [[-np.abs(transverse_fields.rvs()), xx] for xx in range(L)] # Now works
This is satisfactory enough for me now, but I would still appreciate someone explaining why my way of subclassing rv_continuous gave me the aforementioned errors.
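A plausible reading of the traceback: LorentzGen defines only _pdf, so rvs falls back to inverting a cdf that is itself obtained by numerically integrating _pdf. The frames _ppf_single, cdf and integrate.quad show this path. With such a heavy tail, the search for an upper bracket can keep enlarging x until x ** 2 overflows. If that is what happens, supplying the closed-form cdf and ppf avoids the numerical route entirely. The sketch below assumes that; the class name HalfLorentzGen and the constant GAMMA are made up for illustration. The comparison uses scipy.stats.halfcauchy, which has the same density when given scale=0.27.

```python
import numpy as np
import scipy.stats as st

GAMMA = 0.27  # half-width of the Lorentzian, as in the question


class HalfLorentzGen(st.rv_continuous):
    """Half-Lorentzian on [0, inf) with analytic cdf and ppf."""

    def _pdf(self, x):
        return 2 * GAMMA / (np.pi * (GAMMA ** 2 + x ** 2))

    def _cdf(self, x):
        # Integral of _pdf from 0 to x.
        return 2 / np.pi * np.arctan(x / GAMMA)

    def _ppf(self, q):
        # Inverse of _cdf, so rvs needs no numerical root finding.
        return GAMMA * np.tan(np.pi * q / 2)


half_lorentz = HalfLorentzGen(a=0, name="half_lorentz")
samples = half_lorentz.rvs(size=10_000, random_state=0)

# Built-in distribution with the same density, for comparison.
reference = st.halfcauchy(scale=GAMMA)
print(half_lorentz.cdf(1.0), reference.cdf(1.0))
```

Drawing many samples in one rvs(size=...) call is also much faster than calling rvs() once per loop iteration.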

ValueError in scipy ttest_ind

I have following csv file:
SRA ID   ERR169499            ERR169498            ERR169497
Label    1                    0                    1
TaxID    PRJEB3251_ERR169499  PRJEB3251_ERR169499  PRJEB3251_ERR169499
333046   0.05                 0.99                 99.61
1049     0.03                 2.34                 34.33
337090   0.01                 9.78                 23.22
99007    22.33                2.90                 0.00
I have 92 case columns, for which the label is 0, and 95 control columns, for which the label is 1. I have to perform a two-sample independent t-test and a rank-sum test. So far I have:
df = pd.read_csv('final_out_transposed.csv', header=[1,2], index_col=[0])
case = df.xs('0', axis=1, level=0).dropna()
ctrl = df.xs('1', axis=1, level=0).dropna()
(tt_val, p_ttest) = ttest_ind(case, ctrl, equal_var=False)
For which I am getting the error: ValueError: operands could not be broadcast together with shapes (92,) (95,).
The traceback is:
File "<ipython-input-152-d58634e75106>", line 1, in <module>
runfile('C:/IBD Bioproject/New folder/temp_3251.py', wdir='C:/IBD Bioproject/New folder')
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/IBD Bioproject/New folder/temp_3251.py", line 106, in <module>
tt_val, p_ttest = ttest_ind(case, ctrl, equal_var=False)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 4068, in ttest_ind
df, denom = _unequal_var_ttest_denom(v1, n1, v2, n2)
File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3872, in _unequal_var_ttest_denom
df = (vn1 + vn2)**2 / (vn1**2 / (n1 - 1) + vn2**2 / (n2 - 1))
ValueError: operands could not be broadcast together with shapes (92,) (95,)
I read a few posts and went through the NumPy broadcasting documentation, but it's still unclear to me.
Thanks in advance
Apparently the objects created by the xs method of the Pandas DataFrame look like two-dimensional arrays. These must be flattened to look like one-dimensional arrays when passed to ttest_ind.
Try this:
ttest_ind(case.values.ravel(), ctrl.values.ravel(), equal_var=False)
The values attribute of the Pandas objects gives a numpy array, and the ravel() method flattens the array to one dimension.
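Flattening pools every measurement into a single test. If one test per row (per TaxID) is wanted instead, ttest_ind also accepts an axis argument. The sketch below contrasts the two on synthetic data. The random arrays and the row count of 4 are assumptions that stand in for the real table, and which variant is appropriate depends on the analysis.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-ins: 4 rows (taxa), 92 case columns and 95 control columns.
case = rng.normal(size=(4, 92))
ctrl = rng.normal(size=(4, 95))

# Pooled: flatten everything and run one Welch t-test.
pooled = ttest_ind(case.ravel(), ctrl.ravel(), equal_var=False)

# Per row: compare along the column axis, giving one test per taxon.
per_row = ttest_ind(case, ctrl, axis=1, equal_var=False)

print(np.ndim(pooled.statistic), per_row.statistic.shape)
```

With the default axis=0 the function compares along the rows instead, so the 92 and 95 columns have to line up, which is where the broadcast error comes from.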

Python Deap GP Evaluating individual causes error

I am currently experiencing an issue whenever I try to evaluate an individual using the GP portion of DEAP.
I receive the following error:
Traceback (most recent call last):
File "ImageGP.py", line 297, in <module>
pop, logs = algorithms.eaSimple(pop, toolbox, 0.9, 0.1, 60, stats=mstats, halloffame=hof, verbose=True)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/algorithms.py", line 148, in eaSimple
for ind, fit in zip(invalid_ind, fitnesses):
File "ImageGP.py", line 229, in evalFunc
func = toolbox.compile(expr=individual)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/gp.py", line 451, in compile
return eval(code, pset.context, {})
File "<string>", line 1
lambda oValue,oAvg13,oAvg17,oAvg21,sobelVal(v),sobelVal(h),edgeVal,blotchVal: [[[0, 75, 82.2857142857, 83.0, 82.9090909091, 4, 12, 4, 180], ... Proceed to print out all of my data ... [0, 147, 151.244897959, 150.728395062, 150.73553719, 248, 244, 5, 210]]]
^
SyntaxError: invalid syntax
If anyone has any ideas about what could be causing this problem, then I would really appreciate some advice. My current evaluation function looks like this:
def evalFunc(individual, data, points):
    func = toolbox.compile(expr=individual)
    total = 1.0
    for point in points:
        tmp = [float(x) for x in data[point[1]][point[0]][1:9]]
        total += int((0 if (func(*tmp)) < 0 else 1) == points[2])
    print ("Fitness: " + str(total))
    return total,
Here, data contains the data being used (the values for the 8 variables listed in the error), and each point specifies the x and y coordinates from which to get those 8 values. Thank you for your suggestions!
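One thing the traceback does show: the generated lambda lists sobelVal(v) and sobelVal(h) as parameter names, and parentheses are not allowed in a Python identifier. The sketch below uses only the built-in compile to illustrate that point. The helper compiles and the replacement names sobelVal_v and sobelVal_h are hypothetical. Whether this is the only problem with the primitive set cannot be told from the excerpt.

```python
def compiles(source):
    """Return True if source is a valid Python expression."""
    try:
        compile(source, "<string>", "eval")
        return True
    except SyntaxError:
        return False


# Shortened versions of the lambda header from the traceback.
bad = "lambda oValue,sobelVal(v),sobelVal(h): oValue"
good = "lambda oValue,sobelVal_v,sobelVal_h: oValue"

print(compiles(bad), compiles(good))
print("sobelVal(v)".isidentifier(), "sobelVal_v".isidentifier())
```

If the argument names were set when building the primitive set, renaming them to valid identifiers would be the first thing to try.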
