I have been trying to make a function that takes a series of numbers and outputs its cumulative maximum with a decay factor.
So for example, if I have this:
my_series = np.array([5, 3.6, 4.1, 2.3, 1.7, 4.9, 3.6, 6.4, 4.5])
decay_factor = 0.991
The desired output of my function would be the following:
[5. 4.955 4.910405 4.86621135 4.82241545 4.9
4.8559 6.4 6.3424 ]
As you can see, every new element must be the greater of the next element from my original series and the previous element of the output series multiplied by the decay factor.
I would love to be able to make this function without using any for loops, so that I can speed it up.
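One possible loop-free sketch, assuming the decay factor is strictly positive: dividing element i by decay**i turns the recurrence described above into an ordinary running maximum, which np.maximum.accumulate can compute. The function names decayed_cummax and decayed_cummax_loop are made up for this illustration; the loop version is only there to check the vectorized one.

```python
import numpy as np

def decayed_cummax(series, decay):
    """Loop-free cumulative maximum with multiplicative decay (assumes decay > 0)."""
    x = np.asarray(series, dtype=float)
    # decay**i for every position i
    weights = decay ** np.arange(x.size)
    # Rescaling by decay**i reduces the recurrence to a plain running maximum
    return np.maximum.accumulate(x / weights) * weights

def decayed_cummax_loop(series, decay):
    """Reference implementation with an explicit loop, used only for checking."""
    out = []
    for value in series:
        out.append(value if not out else max(value, out[-1] * decay))
    return np.array(out, dtype=float)

my_series = np.array([5, 3.6, 4.1, 2.3, 1.7, 4.9, 3.6, 6.4, 4.5])
result = decayed_cummax(my_series, 0.991)
print(result)
```

For very long series, decay**i eventually underflows to zero, so this trick would need to be applied in chunks or replaced by a compiled loop.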
When I run the code below, I get the result Quantiles segments =WrappedArray(-27.0, 2.0, 4443.0), which shows that the median is 2.0:
val quantiles = dfQuestions
  .stat
  .approxQuantile("score", Array(0, 0.5, 1.0), 0.25)
println(s"Quantiles segments =${quantiles.toSeq}")
Quantiles segments =WrappedArray(-27.0, 2.0, 4443.0)
When I used percentile_approx(score, 0.25), I got the same result. Can anyone tell me why 0.25 is used here and not 0.5?
dfQuestions.createOrReplaceTempView("so_questions")
sparkSession.sql("select min(score), percentile_approx(score, 0.25), max(score) from so_questions").show()
So first, I am getting an error when I try code resembling yours:
NameError: name 'Array' is not defined
Replacing your Array() with brackets [] works although I removed your first argument of 0 because that yielded:
Py4JJavaError: An error occurred while calling o257.approxQuantile.
: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
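This is odd because the Apache Spark documentation for pyspark.sql.DataFrame.approxQuantile() indicates that 0 in the probabilities argument captures the minimum. Perhaps this is a versioning issue; the exception suggests the integer 0 was not converted to a double, so writing it as 0.0 may avoid the error.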
Anyhow, this worked:
dfQuestions.stat.approxQuantile("score", [0.5,1.0], 0.25)
Nevertheless, assuming both approxQuantile() and percentile_approx() are operating as expected then it is possible that the 0.25 percentile and 0.5 percentile (median) are the same. For example, they are equivalent in this list of 12 values:
0, 0, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4
The 0.25 percentile is the fourth value = 2 (3 of the 12 values, i.e. 1/4, are below 2) and the 0.5 percentile lies between the sixth (2) and seventh (2) values, which is 2 because they are equal.
Lastly, I allow that the approximations may not be working as expected. I have seen more accurate results from percentile_approx() than from approxQuantile(), even when I passed approxQuantile() a relativeError of 0 (exact calculation) instead of your 0.25. An inaccurate "exact calculation" does not make sense, so I may be making a mistake somewhere that I am not aware of.
I use percentile_approx() in a SQL line a la:
score_quantile = sqlContext.sql("select percentile_approx(score, 0.25) as \
approx25Quantile from dfQuestions")
score_quantile.show()
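To make the role of the two 0.25 values concrete: in approxQuantile() the third argument is relativeError, while in percentile_approx() the second argument is the percentile itself. The Spark documentation states the approxQuantile() guarantee as floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The small helper below (acceptable_rank_range is a made-up name, not a Spark function) just evaluates that inequality in plain Python for the 12-value list above.

```python
import math

def acceptable_rank_range(probability, relative_error, n_rows):
    """Rank interval allowed by the documented approxQuantile guarantee (hypothetical helper)."""
    low = math.floor((probability - relative_error) * n_rows)
    high = math.ceil((probability + relative_error) * n_rows)
    # Clamp to ranks that can actually occur
    return max(low, 0), min(high, n_rows)

# Median requested with relativeError 0.25 versus 0 on 12 rows
median_loose = acceptable_rank_range(0.5, 0.25, 12)
median_exact = acceptable_rank_range(0.5, 0.0, 12)
print(median_loose, median_exact)
```

With a relativeError of 0.25, a request for the median may legitimately return anything between roughly the 25th and 75th percentile, so a smaller relativeError is the safer choice when the exact median matters.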
Suppose the value is > .5: it should return 1, but if it's < .5 it should return 0. Is there any built-in method in Python to do this? I want to use this in machine learning. If possible, please give me a sample.
numpy has a round function that rounds to the nearest whole number (returned as a float), which you could use. Or just write your own. Something like this should work for numbers in [0.0, 1.0] and returns integers:
>>> def rnd(x):
...     return int(x + 0.5)
...
>>> rnd(0.4)
0
>>> rnd(0.6)
1
>>> rnd(0.5)
1
You can use the round() function in Python.
print(round(31.5)) gives 32 and
print(round(31.4)) gives 31.
Does this answer your query?
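One detail worth checking before relying on round() for labels: in Python 3 the built-in round() sends exact ties to the nearest even integer, so round(0.5) is 0 while round(1.5) is 2. If you want an explicit cut-off instead, a small sketch could look like this (to_label and the sample scores are made up for illustration):

```python
def to_label(value, threshold=0.5):
    """Return 1 when value is strictly above the threshold, otherwise 0."""
    return int(value > threshold)

probabilities = [0.1, 0.4, 0.5, 0.6, 0.9]  # made-up example scores
labels = [to_label(p) for p in probabilities]

# Built-in round() rounds exact ties to the nearest even integer
tie_examples = [round(0.5), round(1.5), round(2.5)]

print(labels)
print(tie_examples)
```

Whether a score of exactly 0.5 should map to 0 or 1 is a choice the question leaves open; change > to >= in to_label if you want it to count as 1.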
I will try to be as specific as possible and also express myself in a manner that is acceptable. I hope I will not offend anyone by phrasing my question badly. I am fairly new to Python and I hope to get my mind elevated by experts.
So to the problem..:
I am currently taking baby steps towards creating a board game. I have come to the following point (be prepared to see my code):
def createGrid(rows, cols):
    grid = [[0 for i in range(cols)] for j in range(rows)]
    print(grid)

createGrid(3,4)
and this will output
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
Which is exactly what I want. A 2D-array. This is supposed to be the grid of my board game.
The next step for me is to turn each of the internal lists into a row. (I know that they already are rows, but they are currently all smeared on one line.) And I am wondering: how would one go about arranging the internal lists so that the output looks like a legitimate matrix?
So like a rectangle with the internal lists stacked onto each other.
Thanks in advance! :)
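The simplest approach is to pretty print it. For a grid this small, pass a narrow width, because pprint keeps anything that fits within the default 80 columns on a single line:
import pprint
pprint.pprint(grid, width=20)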
Or you might choose to render it yourself:
for row in grid:
    print('>>', row, '<<')
Or:
for row in grid:
    for elt in row:
        print(f'({elt})', end=' ')
    print('.\n')
Or consider jumping into numpy.
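Another option in plain Python, sketched here under the assumption that the grid function returns the grid instead of only printing it: build one string per row and join the rows with newlines. The names create_grid and format_grid are illustrative, not from the question.

```python
def create_grid(rows, cols):
    """Build a rows x cols grid filled with zeros and return it."""
    return [[0 for _ in range(cols)] for _ in range(rows)]

def format_grid(grid):
    """Join the cells of each row with spaces and stack the rows with newlines."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

grid = create_grid(3, 4)
print(format_grid(grid))
```

Returning the grid also makes it available to the rest of the game code, which a function that only prints cannot do.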
Is there a way of using the range() function with stride -1?
E.g. using range(10, -10) instead of the square-bracketed values below?
I.e., the following line:
for y in range(10,-10):
Instead of
for y in [10,9,8,7,6,5,4,3,2,1,0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10]:
Obviously one could do this with another kind of loop more elegantly but the range() example would work much better for what I want.
You can specify the stride (including a negative stride) as the third argument, so
range(10,-11,-1)
gives (wrap it in list() on Python 3 to see the values)
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10]
In general, it doesn't cost anything to try. You can simply type this into the interpreter and see what it does.
This is all documented in the Python documentation as:
range(start, stop[, step])
but mostly I'd like to encourage you to play around and see what happens. As you can see, your intuition was spot on.
Yes, by defining a step:
for i in range(10, -11, -1):
    print(i)
In addition to the other good answers, there is an alternative:
for y in reversed(range(-10, 11)):
See the documentation for reversed().
You may notice that, without the third parameter, the range function only counts upwards. If you leave the third parameter out of a descending range, it produces nothing.
for i in range(10,-10):
The body of the above loop will never run, because the range is empty.
For the loop to run, you have to pass a negative number as the third parameter. The stop value is excluded, so use -11 to include -10:
for i in range(10,-11,-1):
Yes, however you'll need to specify that you want to step backwards by setting the step argument to -1. The stop value is exclusive, so stop at -11 to include -10.
Use:
for y in range(10, -11, -1):
For your case, range(10,-11,-1)
will be helpful. The first argument is the start value, the second is the stop value (which is not itself included), and the third is the step size.
When your range is ascending and you need every number in between, you do not need to specify the step, e.g. range(-10,10) or range(-10,-5).
But when your range is descending, you need to specify a negative step, e.g. -1 as in range(10,-11,-1), or a larger negative step.
If you prefer to create a list from the range:
numbers = list(range(-10, 10))
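A quick check of how the stop value and the default step behave, since range() always excludes the stop value (the variable names are only for illustration):

```python
descending = list(range(10, -11, -1))  # stop of -11 so that -10 is included
one_short = list(range(10, -10, -1))   # stop of -10 is excluded, so this ends at -9
empty = list(range(10, -10))           # default step is +1, so nothing is produced

print(descending[0], descending[-1], len(descending))
print(one_short[-1])
print(empty)
```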
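To summarize, I believe these 3 approaches are the most efficient and the most relevant to the question:
first = list(range(10, -11, -1))         # descending: 10, 9, ..., -10
second = list(range(-10, 11))            # ascending: -10, ..., 10 (reverse it for the order in the question)
third = list(reversed(range(-10, 11)))   # descending: 10, 9, ..., -10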
Alternatively, NumPy can build the sequence directly as an array, which is typically faster than filling a Python list item by item for large ranges. You can then convert it to a list:
import numpy as np
first = -(np.arange(10, -11, -1))
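Notice the negation sign for first: it turns the descending array 10, ..., -10 into the ascending array -10, ..., 10, so first and second hold the same values. Drop the negation if you want the descending order from the question.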
second = np.arange(-10, 11)
Convert it to a list as follows, or use it as a numpy.ndarray:
to_the_list = first.tolist()
# Traverse the list in reverse direction
l1 = [2, 4, 3]
for i in range(len(l1) - 1, -1, -1):
    print(l1[i])