Node.js Underscore sortBy - node.js

I need to sort an array by its x value.

Before sort:
arr =>
[
  {x: 0, y: 1234},
  {x: 3, y: 324},
  {x: 1, y: 3487},
]

After sort:
arr =>
[
  {x: 0, y: 1234},
  {x: 1, y: 3487},
  {x: 3, y: 324},
]

I'm using Node.js and CoffeeScript. I tried to use Underscore's sortBy, but it won't work:
_.sortBy(arr, (x) -> arr.x)

Try:
_.sortBy(arr, (item) -> item.x)
The second argument to sortBy is just a function that maps an item in the collection to the value you want to sort on. In this case the item is one of the objects in your array, e.g. {x:0, y: 1234}. So you just need to pick the x property as the value to sort by.

From the fine manual:
sortBy _.sortBy(list, iterator, [context])
Returns a (stably) sorted copy of list, ranked in ascending order by the results of running each value through iterator. Iterator may also be the string name of the property to sort by (eg. length).
Note the last sentence. That means that there is a shortcut for the common case you're facing:
sorted = _(arr).sortBy('x')

Related

I want to arrange the list of strings with a certain condition

I want to arrange the list of strings alphabetically but with the condition that strings that start with x go first. For example, the input is list=['apple','pear','xanadu','stop'].
I'm sure you need to add some condition at the sort function but I'm not sure what to put.
list2 = []
string = input("Enter a string:")
list2.append(string)
while string != "stop":
    string = input("Enter a string:")
    list2.append(string)
list2.remove("stop")
print("Your list is:", list2)
print("Sorted list:", sorted(list2))
I want the output to be list=['xanadu','apple','pear']. I removed the 'stop' btw.
Use the key function that will determine the ordering of elements:
>>> sorted(['apple','pear','xanadu','stop'], key=lambda val: (0, val) if val.startswith('x') else (1, val))
['xanadu', 'apple', 'pear', 'stop']
The lambda means the following:
lambda val: (              # determine the ordering of the element `val`
    (0, val)               # make the algorithm compare tuples
    if val.startswith('x')
    else (1, val)          # use default alphabetical ordering otherwise
)
Since we're now comparing tuples (but ordering the actual values), tuples whose first element is 0 will always sort before those whose first element is 1.
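Applied to the asker's example input (after removing the sentinel 'stop', as they do), the key function produces exactly the ordering they asked for:

```python
words = ['apple', 'pear', 'xanadu', 'stop']
words.remove('stop')  # drop the sentinel, as in the question

# strings starting with 'x' get the tuple (0, val) and sort first;
# everything else gets (1, val) and sorts alphabetically after them
result = sorted(words, key=lambda val: (0, val) if val.startswith('x') else (1, val))
print(result)  # ['xanadu', 'apple', 'pear']
```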

How to use max on an array

maxPrice = 0
for item in cont["price_usd"]:
    if item[1] > maxPrice:
        maxPrice = item[1]
print(maxPrice)
I'm trying to find the max price in an array, and I'm trying to use the max() method to make my code simpler. cont["price_usd"] is a list of [amount_coins, price] and I'm trying to compare all of the prices.
I tried doing this:
list = cont["price_usd"]
max(list)
but I don't know how to express that I only want the second subitem in each item.
Use map() and max()
prices = list(map(lambda x: x[1], cont["price_usd"]))
maxPrice = max(prices)
print(maxPrice)
The map function here uses the lambda function lambda x: x[1] to take each element of cont["price_usd"], extract the element at index 1, and then put that into a list. Then we call max to find the largest value in that list.
You should use the key keyword argument of the max function:
maxPrice = max(cont["price_usd"], key=lambda e: e[1])[1]
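A quick sketch with made-up data (the cont dict and its [amount_coins, price] pairs below are hypothetical) shows both approaches returning the same price:

```python
cont = {"price_usd": [[10, 4.5], [3, 7.25], [8, 6.0]]}  # hypothetical sample data

# approach 1: project out the prices, then take the max
prices = list(map(lambda x: x[1], cont["price_usd"]))
print(max(prices))  # 7.25

# approach 2: let max compare pairs by their price, then pick the price out
maxPrice = max(cont["price_usd"], key=lambda e: e[1])[1]
print(maxPrice)  # 7.25
```

The key version also gives you the whole winning pair if you drop the trailing [1], which approach 1 cannot do without a second pass.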

Python 3.X: Implement returnGreater() function using a list of integers and a value

The function must return a list consisting of the numbers greater than the second argument passed to the function.
It must be able to do the following when functioning:
returnGreater([1,2,3,4,5], 3)
[4,5]
returnGreater([-8,2,-4,1,3,-5],3)
[]
Here's what I have (I've gone through a few iterations), though I get a TypeError for using the ">" operator between an int and a list:
def returnGreater(x, y):
    "x:list(int) , return:list(int)"
    # greater: int
    greater = []
    for y in x:
        # x: int
        if x > y:
            x = greater
    return greater
You're using the name y for two different things in your code. It's both an argument (the number to compare against) and the loop variable. You should use a different name for one of those.
I'd strongly suggest picking meaningful names, as that will make it much clearer what each variable means, as well as making it much less likely you'll use the same name for two different things. For instance, here's how I'd name the variables (getting rid of both x and y):
def returnGreater(list_of_numbers, threshold):
    greater = []
    for item in list_of_numbers:
        if item > threshold:
            greater.append(item)
    return greater
You had another issue with the line x = greater, which didn't do anything useful (it replaced the reference to the original list with a reference to the empty greater list). You should be appending the item you just compared to the greater list instead.
I recommend filter; it's an easy and graceful way:
def returnGreater(x, y):
    return list(filter(lambda a: a > y, x))
This filters the list x, keeping each element a for which the lambda a > y is true.
Or a list comprehension:
def returnGreater(_list, value):
    return [x for x in _list if x > value]
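As a quick check, the corrected loop version reproduces both expected outputs from the question (the filter and comprehension versions behave identically):

```python
def returnGreater(list_of_numbers, threshold):
    greater = []
    for item in list_of_numbers:
        if item > threshold:
            greater.append(item)
    return greater

print(returnGreater([1, 2, 3, 4, 5], 3))        # [4, 5]
print(returnGreater([-8, 2, -4, 1, 3, -5], 3))  # []
```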

How to find sum of all descendents attribute named x in ArangoDB?

I have a tree structure in my ArangoDB graph database, like below.
The node with id 2699394 is the parent node of this tree, and each node has an attribute named x. I want the sum of x over all descendants of parent node 2699394, excluding the parent's own x.
For example, suppose:
2699399 has x = 5,
2699408 has x = 3,
2699428 has x = 2,
2699418 has x = 5,
then the sum of x for parent node 2699394 should be 5 + 3 + 2 + 5 = 15.
So the answer is 15. Can anybody give me a query for this calculation in ArangoDB AQL?
To find the number of descendants of a particular node, I have used the query below:
FOR v, e, p IN 1..1000 OUTBOUND 'Person/1648954'
    GRAPH 'Appes'
    RETURN v.id
Assuming that children are linked to their parents, the data could be visualized like this:
nodes/2699394 SUM of children?
↑
nodes/2699399 {x: 5}
↑
nodes/2699408 {x: 3}
↑
nodes/2699428 {x: 2}
↑
nodes/2699418 {x: 5}
To walk the chain of children, we need to traverse in INBOUND direction (or OUTBOUND if parent nodes point to children):
FOR v IN 1..10 INBOUND "nodes/2699394" relations
RETURN v
In this example, an anonymous graph is used by specifying an edge collection relations. You can also use a named graph, like GRAPH "yourGraph".
Starting at nodes/2699394, the edges down to nodes/2699418 are traversed and every node on the way is returned unchanged so far.
Since we are only interested in the x attribute, we can change that to only return that attribute: RETURN v.x - which will return [ 5, 3, 2, 5 ]. Unless we say IN 0..10, the start vertex will not be included.
Inside the FOR loop, we don't have access to all the x values, but only one at a time. We can't do something like RETURN SUM(v.x) here. Instead, we need to assign the result of the traversal to a variable, which makes it a sub-query. We can then add up all the numbers and return the resulting value:
LET x = (
FOR v IN 1..10 INBOUND "nodes/2699394" relations
RETURN v.x
)
RETURN SUM(x) // [ 15 ]
If you want to return the start node with a computed x attribute, you may do the following:
LET doc = DOCUMENT("nodes/2699394")
LET x = (
FOR v IN 1..10 INBOUND doc relations
RETURN v.x
)
RETURN MERGE( doc, { x: SUM(x) } )
The result will be like:
[
  {
    "_id": "nodes/2699394",
    "_key": "2699394",
    "_rev": "2699394",
    "x": 15
  }
]

Find distinct values for each column in an RDD in PySpark

I have an RDD that is both very long (a few billion rows) and decently wide (a few hundred columns). I want to create sets of the unique values in each column (these sets don't need to be parallelized, as they will contain no more than 500 unique values per column).
Here is what I have so far:
data = sc.parallelize([["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]])
num_columns = len(data.first())
empty_sets = [set() for index in xrange(num_columns)]
d2 = data.aggregate((empty_sets), (lambda a, b: a.add(b)), (lambda x, y: x.union(y)))
What I am doing here is trying to initiate a list of empty sets, one for each column in my RDD. For the first part of the aggregation, I want to iterate row by row through data, adding the value in column n to the nth set in my list of sets. If the value already exists, nothing changes. Then, it performs the union of the sets afterwards so only distinct values are returned across all partitions.
When I try to run this code, I get the following error:
AttributeError: 'list' object has no attribute 'add'
I believe the issue is that I am not accurately making it clear that I am iterating through the list of sets (empty_sets) and that I am iterating through the columns of each row in data. I believe in (lambda a, b: a.add(b)) that a is empty_sets and b is data.first() (the entire row, not a single value). This obviously doesn't work, and isn't my intended aggregation.
How can I iterate through my list of sets, and through each row of my dataframe, to add each value to its corresponding set object?
The desired output would look like:
[set(['a', 'b', 'c']), set(['one', 'two']), set(['x', 'y'])]
P.S I've looked at this example here, which is extremely similar to my use case (it's where I got the idea to use aggregate in the first place). However, I find the code very difficult to convert into PySpark, and I'm very unclear what the case and zip code is doing.
There are two problems. One, your combiner functions assume each row is a single set, but you're operating on a list of sets. Two, add doesn't return anything (try a = set(); b = a.add('1'); print b), so your first combiner function returns a list of Nones. To fix this, make your first combiner function non-anonymous and have both of them loop over the lists of sets:
def set_plus_row(sets, row):
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets

unique_values_per_column = data.aggregate(
    empty_sets,
    set_plus_row,  # can't be a lambda b/c add doesn't return anything
    lambda x, y: [a.union(b) for a, b in zip(x, y)]
)
I'm not sure what zip does in Scala, but in Python, it takes two lists and puts each corresponding element together into tuples (try x = [1, 2, 3]; y = ['a', 'b', 'c']; print zip(x, y);) so you can loop over two lists simultaneously.
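Since the fix is hard to see without running it, here is a pure-Python sketch of what aggregate does with these two functions (no Spark needed; the two "partitions" are simulated by slicing the sample rows from the question):

```python
rows = [["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]]
num_columns = len(rows[0])

def set_plus_row(sets, row):
    # seqOp: fold one row into the per-column sets
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets

def merge_sets(x, y):
    # combOp: union the per-column sets from two partitions
    return [a.union(b) for a, b in zip(x, y)]

acc1 = [set() for _ in range(num_columns)]
acc2 = [set() for _ in range(num_columns)]
for r in rows[:2]:  # "partition" 1
    acc1 = set_plus_row(acc1, r)
for r in rows[2:]:  # "partition" 2
    acc2 = set_plus_row(acc2, r)

# one set of distinct values per column
print(merge_sets(acc1, acc2))
```

The result equals [{'a', 'b', 'c'}, {'one', 'two'}, {'x', 'y'}], matching the desired output above.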
