mapPartitions returns empty array - apache-spark

I have the following RDD, which has 4 partitions:
val rdd=sc.parallelize(1 to 20,4)
Now I try to call mapPartitions on it:
scala> rdd.mapPartitions(x=> { println(x.size); x }).collect
5
5
5
5
res98: Array[Int] = Array()
Why does it return an empty array? The anonymous function simply returns the same iterator it received, so how can it return an empty array? The interesting part is that if I remove the println statement, it does return a non-empty array:
scala> rdd.mapPartitions(x=> { x }).collect
res101: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
This I don't understand. How can the presence of println (which simply prints the size of the iterator) affect the final outcome of the function?

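That's because x is an Iterator, which is a TraversableOnce: it can be traversed only once. Calling size traversed it to the end, and then you returned it... empty.
You can work around it in a number of ways; here is one:
rdd.mapPartitions(x => {
  val list = x.toList        // materialize the partition once
  println(list.size)         // size of a List does not consume anything
  list.toIterator            // hand back a fresh iterator over the buffered elements
}).collect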

To understand what is going on, we have to look at the signature of the function you pass to mapPartitions:
(Iterator[T]) ⇒ Iterator[U]
So what is an Iterator? If you take a look at the Iterator documentation you'll see it is a trait which extends TraversableOnce:
trait Iterator[+A] extends TraversableOnce[A]
The above should give you a hint about what happens in your case. An Iterator provides two methods, hasNext and next. To get the size of an Iterator, you have to iterate over all of it. After that, hasNext returns false, so the Iterator you return is empty.
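The same one-shot behaviour can be reproduced outside Spark. The sketch below is a plain-Python analogue, not Spark code: Python iterators, like Scala's Iterator, can be traversed only once, so counting the elements before returning the iterator leaves nothing for the caller. The function names are hypothetical.

```python
from typing import Iterator, List


def print_size_then_return(it: Iterator[int]) -> Iterator[int]:
    # Counting walks the iterator to its end, much like calling x.size in Scala
    size = sum(1 for _ in it)
    print(size)
    return it  # the same, now exhausted, iterator


def buffer_then_return(it: Iterator[int]) -> Iterator[int]:
    # Materialize the elements once, then hand back a fresh iterator over the copy
    items: List[int] = list(it)
    print(len(items))
    return iter(items)


partition = range(1, 6)
print(list(print_size_then_return(iter(partition))))  # prints 5, then []
print(list(buffer_then_return(iter(partition))))      # prints 5, then [1, 2, 3, 4, 5]
```

Buffering costs memory proportional to the partition size, which is the same trade-off as the toList workaround above.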

Related

It returns None, but I think the algorithm is correct. Where is the mistake? Could you tell me?

def listUnder(inList,bound):
    a=[]
    a.append(inList)
    b=[]
    b.append(bound)
    for i in a:
        if i<b:
            return i
print(listUnder([34,10,9,5,44,1],10))
Write a function listUnder(inList, bound) which takes two input arguments, inList (a list of integers) and bound (an int), and returns a list that consists of all elements in inList that are strictly smaller than bound, in the same order they appear in inList.
Expected output:
print(listUnder([34, 10, 9, 5, 44, 1],10))
[9, 5, 1]
I don't know why you are doing this in such a complicated manner; a simple list comprehension would do the job here:
def listUnder(inList, bound):
    return [i for i in inList if i < bound]
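Aside from that, appending your input list to a list inside your function results in a nested list, which is the source of your error: the loop runs only once, with i being the whole list [34, 10, 9, 5, 44, 1], and compares it with b, which is [10]. Since 34 is not less than 10, that list comparison is False, the loop ends without returning anything, and the function returns None. Also, you return a single element inside your for loop; how can you expect to get a list?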
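If you'd rather keep the loop structure from the question, a minimal corrected sketch iterates over inList directly, compares each integer with bound, and collects the matches in a result list that is returned only after the loop:

```python
def listUnder(inList, bound):
    # Collect every element strictly smaller than bound, preserving input order
    result = []
    for i in inList:      # iterate over the integers themselves, not a list wrapping them
        if i < bound:     # compare int with int, not list with list
            result.append(i)
    return result         # return only after the whole list has been scanned


print(listUnder([34, 10, 9, 5, 44, 1], 10))  # [9, 5, 1]
```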

In the case of a tie, how do I return the largest and most frequent number in python?

I have a list of numbers. I created this frequency dictionary d:
from collections import Counter
mylist = [10, 8, 12, 7, 8, 8, 6, 4, 10, 12, 10, 12]
d = Counter(mylist)
print(d)
The output is like this:
Counter({10: 3, 8: 3, 12: 3, 7: 1, 6: 1, 4: 1})
I know I can use max(d, key=d.get) to get the most frequent value if there is no tie in frequency. If multiple items are maximal, max returns the first one encountered. How can I return the largest number, in this case 12, instead of 10? Thank you for your help!
Define a lambda function that returns a tuple. Tuples are compared by their first value, with ties broken by subsequent values. Like this:
max(d, key=lambda x:(d.get(x), x))
So for the two example values, the lambda will return (3, 10) and (3, 12). And of course, the second will be considered the max.
Further explanation:
When the max function is given a collection to find the max of, and a key, it will go over the values in the collection, passing each value into the key function. Whatever element from the collection results in the maximal output from the key function is considered the maximal value.
In this case, we're giving it a lambda function. Lambdas are just functions: there is no difference in how they are used, only a different syntax for defining them. The above example could have been written as:
def maxKey(x):
    return (d.get(x), x)
max(d, key=maxKey)
and it would behave the same way.
Using that function, we can see the return values that it would give for your sample data.
maxKey(10) #(3, 10)
maxKey(12) #(3, 12)
The main difference between the anonymous lambda above and using d.get is that the lambda returns a tuple with two values in it.
When max encounters a tie, it returns the first one it saw. But because we're now returning tuples, and because we know that the second value in each tuple is unique (because it comes from a dictionary), we can be sure that there won't be any duplicates. When max encounters a tuple it first compares the first value in the tuple against whatever it has already found to be the maximal value. If there's a tie there, it compares the next value. If there's a tie there, the next value, etc. So when max compares (3, 10) with (3, 12) it will see that (3, 12) is the maximal value. Since that is the value that resulted from 12 going into the key function, max will see 12 as the maximal value.
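To see the tie-breaking in isolation, here is a small self-contained sketch. Negating the second tuple element is an extra variation, not part of the answer above: it flips the tie-break so that the smallest of the most frequent numbers wins, and it only works for numeric keys.

```python
from collections import Counter

mylist = [10, 8, 12, 7, 8, 8, 6, 4, 10, 12, 10, 12]
d = Counter(mylist)

# Tuples compare element by element: equal counts fall through to the key itself
assert (3, 10) < (3, 12)

largest_tied = max(d, key=lambda x: (d[x], x))    # 12
smallest_tied = max(d, key=lambda x: (d[x], -x))  # 8 (variation: negated tie-breaker)

print(largest_tied, smallest_tied)  # 12 8
```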
You can get the max count (using d.most_common), and then get the max of all keys that have the max count:
max_cnt = d.most_common(1)[0][1]
grt_max = max(n for n, cnt in d.items() if cnt == max_cnt)
print(grt_max)
Output:
12

Groovy: String to integer array

I'm coding in Groovy, and I have a string parameter "X" which looks like this:
899-921-876-123
For now I have successfully removed the "-" from it with
replaceAll("-", "")
Now I want to split this String into separate digits, i.e. into an array like (8, 9, 9, ...), to do some calculations with them. But somehow I cannot split() this String and turn it into Integers at the same time, like this:
assert X.split("")
def XInt = Integer.parseInt(X)
So when I try something like:
def sum = (6* X[0]+ 5 * X[1] + 7 * X[2])
I get the error "Cannot find matching method int#getAt(int). Please check if the declared type is right and if the method exists." or, if I don't convert it to Integer, "Cannot find matching method int#multiply(java.lang.String). Please check if the declared type is right and if the method "...
Any idea how I can just do calculations on the separate digits of this string?
def X = '899-921-876-123'
def XInt = X.replaceAll(/\D++/, '').collect { it as int }
assert XInt == [8, 9, 9, 9, 2, 1, 8, 7, 6, 1, 2, 3]
assert 6* XInt[0]+ 5 * XInt[1] + 7 * XInt[2] == 6* 8+ 5 * 9 + 7 * 9
the replaceAll removes all non-digits
the collect iterates over the iterable and converts all elements to ints
a String is an iterable of its characters
Given that you already have a string of digits:
"123"*.toLong() // or toShort(), toInteger(), ...
// ===> [1, 2, 3]
I found cfrick's approach the grooviest solution.
This makes it complete:
def n = "899-921-876-123".replaceAll("-", "")
print n*.toInteger()

How to retrieve the array elements in a tuple in numpy

When I use numpy.nonzero(), e.g. numpy.nonzero(bool_row), where bool_row is a Series containing boolean values, it returns a tuple which contains only one array. I want to retrieve the elements of that array and put them in a list. How do I do that?
When indexing, a tuple is equivalent to the comma-separated index values, e.g.
x[1,2]
x[(1,2)]
idx = (1,2); x[idx]
So in your case, the result of nonzero can be used directly as the indexing tuple.
In [566]: x=np.arange(10,20)
In [567]: idx = np.nonzero(x%2)
In [568]: idx
Out[568]: (array([1, 3, 5, 7, 9], dtype=int32),)
In [569]: x[idx]
Out[569]: array([11, 13, 15, 17, 19])
From the nonzero docs: "The corresponding non-zero values can be obtained with: a[nonzero(a)]"
If you need a list instead of an array, add the .tolist() method, e.g. x[idx].tolist() for the values or idx[0].tolist() for the indices themselves.
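Here is a short sketch of getting the indices out of the tuple as a plain list. It assumes a plain NumPy boolean array stands in for the boolean Series from the question; numpy.flatnonzero is an alternative that returns the index array directly for the flattened input.

```python
import numpy as np

bool_row = np.array([False, True, False, True, True])  # stand-in for the boolean Series

# nonzero returns one index array per dimension; a 1-D input gives a 1-tuple
(indices,) = np.nonzero(bool_row)
index_list = indices.tolist()

# flatnonzero returns the index array directly, without the tuple wrapper
index_list_flat = np.flatnonzero(bool_row).tolist()

print(index_list)       # [1, 3, 4]
print(index_list_flat)  # [1, 3, 4]
```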

Python change first with last element

I want to swap the first character of a string with the last one.
def change(string):
    for i in range(16):
        helper = string[i]
        string[i]=string[15-i]
        string[15-i]=helper
    return string
print (change("abcdefghijklmnop"))
Error Output:
string[i]=helper2[0]
TypeError: 'str' object does not support item assignment
You can't alter a string; they're immutable. You can create a new string that is altered as you want:
def change(string):
    return string[-1]+string[1:-1]+string[0]
You can use the * (extended unpacking) operator on a list:
my_list = [1,2,3,4,5,6,7,8,9]
a, *middle, b = my_list
my_new_list = [b, *middle, a]
>>> my_list
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> my_new_list
[9, 2, 3, 4, 5, 6, 7, 8, 1]
Read here for more information.
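Since the question is about a string rather than a list, here is a sketch applying the same unpacking to a string. Unpacking works on any iterable, so the string splits into characters, and the pieces are joined back into a new string. Note that it raises ValueError for strings shorter than two characters.

```python
s = "abcdefghijklmnop"

# Extended unpacking works on any iterable, so a string splits into characters
first, *middle, last = s                    # middle is a list of characters
swapped = "".join([last, *middle, first])   # join the pieces into a new string

print(swapped)  # pbcdefghijklmnoa
```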
As you discovered, strings are immutable so you can index a string (eg string[x]), but you can't assign to an index (eg string[x] = 'z').
If you want to swap the first and last element, you will need to create a new string. For example:
def change(input_str):
    return input_str[-1] + input_str[1:-1] + input_str[0]
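One edge case worth guarding against: with the slicing approach, a one-character string comes back doubled ("a" becomes "aa") and an empty string raises IndexError. A minimal guarded sketch:

```python
def change(input_str):
    # With fewer than two characters there is nothing to swap; without this guard,
    # "a" would come back as "aa" and "" would raise IndexError
    if len(input_str) < 2:
        return input_str
    return input_str[-1] + input_str[1:-1] + input_str[0]


print(change("abcdefghijklmnop"))  # pbcdefghijklmnoa
```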
However, based on your example code, it looks like you are trying to swap all the "elements" of the string, i.e. reverse it. If you want to do that, see this previous discussion on different methods of reversing a string: Reverse a string in Python
Additionally, even if you could assign to an index of a string, your code would not work as written. With a minor change to "explode" it into a list:
def change(string):
    string = [c for c in string]
    for i in range(16):
        helper = string[i]
        string[i]=string[15-i]
        string[15-i]=helper
    return string
print (change("abcdefghijklmnop"))
As you can see, the output is the "same" as the input (except exploded into a list): you step through every index in the string, so each pair of characters is swapped twice (once at i and once at 15 - i), which puts everything back in its original position.
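If reversing the whole string is the goal, a minimal corrected sketch of the loop only walks up to the midpoint, so each pair is swapped exactly once, and uses the string's length instead of the hard-coded 16. In practice, the slice string[::-1] does the same in one step.

```python
def change(string):
    chars = list(string)  # strings are immutable, so work on a list of characters
    n = len(chars)
    # Stop at the midpoint: each iteration swaps one pair, so no pair is swapped twice
    for i in range(n // 2):
        chars[i], chars[n - 1 - i] = chars[n - 1 - i], chars[i]
    return "".join(chars)


print(change("abcdefghijklmnop"))  # ponmlkjihgfedcba
```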
