Combining 2 lists with streams in Groovy - groovy

Say I have two lists of equal size [1, 2, 3, 4, ...] and [a, b, c, d, ...]. Is there a way to make a map with streams that maps 1 to a, 2 to b, 3 to c, and so on, without using lambda functions or nested functions?
I would use map and pass in a function, but this passed-in function can only take 1 argument and I need both pieces of information to map the elements to each other.
IntStream(1, list1.size()).stream().map(this.&combineListsFunction).collect...
combineListsFunction can only use the information from the stream, but I need both lists for the function to work.

You can transpose both lists (which gives you a list of pairs) and then create the map from it with collectEntries() (which accepts exactly that). E.g.:
def l1 = [1,2,3]
def l2 = ["a","b","c"]
assert [(1): "a", (2): "b", (3): "c"] == [l1,l2].transpose().collectEntries()

Related

How to subtract adjacent items in list with unknown length (python)?

I am given a list of lists, for example myList = [[70,83,90],[19,25,30]], and need to return a list of lists containing the differences between adjacent elements. For this example the result would be [[13,7],[6,5]], i.e. the absolute values of (70-83), (83-90), (19-25), and (25-30). I'm not sure how to iterate through each list to subtract adjacent elements without already knowing its length. So far I have just separated the list of lists into two separate lists.
list_one = myList[0]
list_two = myList[1]
Please let me know what you would recommend, thank you!
A custom generator can return two adjacent items at a time from a sequence without knowing the length:
def two(sequence):
    i = iter(sequence)
    try:
        a = next(i)
    except StopIteration:
        return  # empty sequence: there are no pairs to yield
    for b in i:
        yield a, b
        a = b
original = [[70,83,90],[19,25,30]]
result = [[abs(a-b) for a,b in two(sequence)]
          for sequence in original]
print(result)
[[13, 7], [6, 5]]
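Well, for each list, you can simply get its number of elements with len() and loop over the indices:
res = []
for my_list in list_of_lists:
    res.append([])
    for i in range(len(my_list) - 1):
        # Append the absolute difference of each adjacent pair to the newest sublist
        res[-1].append(abs(my_list[i] - my_list[i + 1]))
Each result is added to res[-1], the sublist that belongs to the list currently being processed.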
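As a further sketch of the same idea, adjacent pairs can also be produced by zipping each list with itself shifted by one element, which likewise needs no knowledge of the length up front. The helper name adjacent_differences is made up for illustration:

```python
def adjacent_differences(list_of_lists):
    """Return the absolute differences between neighbouring items of each sublist."""
    result = []
    for seq in list_of_lists:
        # zip stops at the shorter input, so seq[1:] bounds the pairing:
        # (seq[0], seq[1]), (seq[1], seq[2]), ...
        result.append([abs(a - b) for a, b in zip(seq, seq[1:])])
    return result


my_list = [[70, 83, 90], [19, 25, 30]]
print(adjacent_differences(my_list))  # [[13, 7], [6, 5]]
```

Sublists with fewer than two items simply produce an empty list of differences.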

Sorting a list of strings items from an array [duplicate]

I'm new to Python automation and wrote a script to get some port handles from Ixia and store them in a list. I'm trying to sort those port handles, and that is where I see a problem.
I tried using the sort method, but it doesn't work:
>>> a
['1/1/11', '1/1/6']
>>> a.sort()
>>> a
['1/1/11', '1/1/6']
>>> d = a.sort()
>>> print(d)
None
>>>
Am I missing anything here? Kindly clarify.
I want the output in the following format
1/1/6 1/1/11
Explanation
You are trying to sort a list of strings. Strings are sorted in lexicographical order, i.e. "10" < "11" < "2" < "5" < ..., so Python is doing exactly what you asked it to do. To get the order you want, you need to transform your data into something that compares numerically.
Solution
>>> a = ['1/1/11', '1/1/6']
>>> a
['1/1/11', '1/1/6']
>>> def to_tuple(string_representation):
...     return tuple(int(i) for i in string_representation.split('/'))
...
>>> b = [to_tuple(element) for element in a]
>>> b.sort()
>>> b
[(1, 1, 6), (1, 1, 11)]
>>> a.sort(key=to_tuple)
>>> a
['1/1/6', '1/1/11']
Here we use the fact that tuples are compared by default exactly the way we need in your case (this is also a lexicographical order, but now 11 is a single element of the sequence rather than two characters).
List b contains the transformed list, where each element is a tuple, so sort now works as you want.
The second option is to pass a custom key function. It is a function that returns a key used to compare the elements of your list. In this case, the key is the corresponding tuple of integers, so the strings are compared as you want.
Note 1
The second approach (with the key function) adds some overhead, since the key function is called for each element. Python computes each key only once per sort, so this is O(N) calls rather than one call per comparison.
Note 2
You also tried to use the return value of sort, but sort changes the given list in place and returns None. If you want a sorted copy of your list, use sorted.
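To make the difference between sort and sorted concrete, here is a minimal sketch. The helper name port_key and the extra handle '1/2/3' are made up for illustration, and it assumes every handle consists of integers separated by slashes:

```python
def port_key(handle):
    """Turn '1/1/11' into (1, 1, 11) so parts compare as numbers, not text."""
    return tuple(int(part) for part in handle.split('/'))


handles = ['1/1/11', '1/1/6', '1/2/3']

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(handles, key=port_key)
print(ordered)   # ['1/1/6', '1/1/11', '1/2/3']
print(handles)   # ['1/1/11', '1/1/6', '1/2/3']

# list.sort() reorders in place and returns None.
returned = handles.sort(key=port_key)
print(returned)  # None
print(handles)   # ['1/1/6', '1/1/11', '1/2/3']
```

A handle containing a non-numeric part would make int() raise ValueError, so this key only suits purely numeric handles.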

Merge items from separate lists into nested lists

Hello, I am trying to merge two lists sequentially into sublists. I wonder if this is possible without list comprehensions or a lambda operation, as I'm still learning how to work with those approaches. Thank you.
a = [0,1,2,3]
b = [4,5,6,7]
#desired output
c = [[0,4],[1,5],[2,6],[3,7]]
An approach that doesn't involve lambdas or list comprehensions (not sure what the issue is with list-comps) would be with map:
c = list(map(list, zip(a, b)))
This first zips the lists together, then uses map to create a list instance from every tuple that zip generates, and finally wraps it all in list so that map yields all its contents:
print(c)
[[0, 4], [1, 5], [2, 6], [3, 7]]
This, at least in my view, is less understandable than the equivalent comprehension John supplied in a comment.
Here’s a solution suitable for beginners!
c = []
a = [0,1,2,3]
b = [4,5,6,7]
for i in range(min(len(a), len(b))):
    c.append([a[i], b[i]]) # writing [a[i], b[i]] creates a new list
print(c)
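Both answers silently drop trailing items when the lists differ in length: zip stops at the shorter input, just as range(min(len(a), len(b))) does. A small sketch, with a deliberately longer b chosen for illustration:

```python
a = [0, 1, 2, 3]
b = [4, 5, 6, 7, 8]  # one extra item, purely for illustration

# map/zip version: zip stops as soon as the shorter list runs out.
zipped = list(map(list, zip(a, b)))

# Index-based version: min() bounds the loop the same way.
looped = []
for i in range(min(len(a), len(b))):
    looped.append([a[i], b[i]])

print(zipped)            # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(zipped == looped)  # True
```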

Can you call 2 args from a function into another function? [duplicate]

So, Python functions can return multiple values. It struck me that it would be convenient (though a bit less readable) if the following were possible.
a = [[1,2],[3,4]]
def cord():
    return 1, 1
def printa(y,x):
    print(a[y][x])
printa(cord())
...but it's not. I'm aware that you can do the same thing by dumping both return values into temporary variables, but it doesn't seem as elegant. I could also rewrite the last line as "printa(cord()[0], cord()[1])", but that would execute cord() twice.
Is there an elegant, efficient way to do this? Or should I just see that quote about premature optimization and forget about this?
printa(*cord())
The * here is the argument unpacking operator: in this context it takes a list or tuple and expands it so the function sees each element as a separate positional argument.
It's basically the reverse of the * you might use to capture all non-keyword arguments in a function definition:
def fn(*args):
    # args is now a tuple of the non-keyworded arguments
    print(args)
fn(1, 2, 3, 4, 5)
prints (1, 2, 3, 4, 5)
fn(*[1, 2, 3, 4, 5])
does the same.
Try this:
>>> def cord():
...     return (1, 1)
...
>>> def printa(y, x):
...     print(a[y][x])
...
>>> a=[[1,2],[3,4]]
>>> printa(*cord())
4
The star basically says "use the elements of this collection as positional arguments." You can do the same with a dict for keyword arguments using two stars:
>>> a = {'a' : 2, 'b' : 3}
>>> def foo(a, b):
...     print(a, b)
...
>>> foo(**a)
2 3
Actually, Python doesn't really return multiple values, it returns one value which can be multiple values packed into a tuple. Which means that you need to "unpack" the returned value in order to have multiples.
A statement like
x,y = cord()
does that, but directly using the return value as you did in
printa(cord())
doesn't, that's why you need to use the asterisk. Perhaps a nice term for it might be "implicit tuple unpacking" or "tuple unpacking without assignment".
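Putting the pieces together, here is a minimal Python 3 sketch of positional and keyword unpacking side by side. The function cord mirrors the question; grid, value_at and the names dictionary are made up for illustration:

```python
grid = [[1, 2], [3, 4]]


def cord():
    # A single tuple is returned; the comma is what builds it.
    return 1, 1


def value_at(y, x):
    return grid[y][x]


# * spreads the tuple over the positional parameters y and x.
print(value_at(*cord()))  # 4

# Ordinary assignment unpacks the same tuple into two names.
y, x = cord()
print(value_at(y, x))     # 4

# ** spreads a dict over the parameters by keyword.
names = {'y': 0, 'x': 1}
print(value_at(**names))  # 2
```

In each call cord() runs only once, which avoids the double evaluation of printa(cord()[0], cord()[1]).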

Find distinct values for each column in an RDD in PySpark

I have an RDD that is both very long (a few billion rows) and decently wide (a few hundred columns). I want to create sets of the unique values in each column (these sets don't need to be parallelized, as they will contain no more than 500 unique values per column).
Here is what I have so far:
data = sc.parallelize([["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]])
num_columns = len(data.first())
empty_sets = [set() for index in range(num_columns)]
d2 = data.aggregate((empty_sets), (lambda a, b: a.add(b)), (lambda x, y: x.union(y)))
What I am doing here is trying to initialize a list of empty sets, one for each column in my RDD. For the first part of the aggregation, I want to iterate row by row through data, adding the value in column n to the nth set in my list of sets. If the value already exists, it doesn't do anything. Then, it performs the union of the sets afterwards so only distinct values are returned across all partitions.
When I try to run this code, I get the following error:
AttributeError: 'list' object has no attribute 'add'
I believe the issue is that I am not accurately making it clear that I am iterating through the list of sets (empty_sets) and that I am iterating through the columns of each row in data. I believe in (lambda a, b: a.add(b)) that a is empty_sets and b is data.first() (the entire row, not a single value). This obviously doesn't work, and isn't my intended aggregation.
How can I iterate through my list of sets, and through each row of my dataframe, to add each value to its corresponding set object?
The desired output would look like:
[set(['a', 'b', 'c']), set(['one', 'two']), set(['x', 'y'])]
P.S I've looked at this example here, which is extremely similar to my use case (it's where I got the idea to use aggregate in the first place). However, I find the code very difficult to convert into PySpark, and I'm very unclear what the case and zip code is doing.
There are two problems. One, your combiner functions assume each row is a single set, but you're operating on a list of sets. Two, add doesn't return anything (try a = set(); b = a.add('1'); print(b)), so your first combiner function would return None instead of the updated sets. To fix this, make your first combiner function non-anonymous and have both of them loop over the lists of sets:
def set_plus_row(sets, row):
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets
unique_values_per_column = data.aggregate(
    empty_sets,
    set_plus_row, # can't be lambda b/c add doesn't return anything
    lambda x, y: [a.union(b) for a, b in zip(x, y)]
)
I'm not sure what zip does in Scala, but in Python, it takes two lists and pairs up corresponding elements as tuples (try x = [1, 2, 3]; y = ['a', 'b', 'c']; print(list(zip(x, y)))) so you can loop over two lists simultaneously.
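The aggregation logic can be checked without a Spark cluster. The sketch below imitates what aggregate does, folding each partition with the first function and then merging the per-partition results with the second, using plain Python lists as stand-in partitions. The split into two partitions and the helper names simulate_aggregate and merge_sets are assumptions made for illustration, not part of the PySpark API:

```python
from functools import reduce


def set_plus_row(sets, row):
    # Add each value of the row to the set for its column.
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets


def merge_sets(x, y):
    # Union the per-column sets coming from two partitions.
    return [a.union(b) for a, b in zip(x, y)]


def simulate_aggregate(partitions, make_zero, seq_op, comb_op):
    # Fold every partition starting from a fresh zero value, then merge.
    partials = [reduce(seq_op, part, make_zero()) for part in partitions]
    return reduce(comb_op, partials, make_zero())


rows = [["a", "one", "x"], ["b", "one", "y"],
        ["a", "two", "x"], ["c", "two", "x"]]
partitions = [rows[:2], rows[2:]]  # pretend the data lives in two partitions
num_columns = len(rows[0])

unique = simulate_aggregate(
    partitions,
    lambda: [set() for _ in range(num_columns)],
    set_plus_row,
    merge_sets,
)
print(unique == [{"a", "b", "c"}, {"one", "two"}, {"x", "y"}])  # True
```

The simulation builds a fresh list of empty sets per partition because set_plus_row mutates its first argument; sharing one zero value between the stand-in partitions would mix their results.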
