How to loop through a list of tuples using a for loop? - python-3.x

I have a list that contains lists of tuples. Here is a sample of the data.
[ 144.91, 145.03, 145.1] [ 12964.0, 12818.0, 13441.0] [123.23, 152.45, 132.75] [12523.51, 12425.32, 12225.1] [122.22, 123.42, 120.21] [12444.43, 12232.22, 12111.12]
The list is structured as x and y pairs. For example, [ 144.91, 145.03, 145.1] is x1 and [ 12964.0, 12818.0, 13441.0] is y1. Then the sequence repeats, [123.23, 152.45, 132.75] is x2 and [12523.51, 12425.32, 12225.1] is y2.....
What I am trying to do is pass the x and y pairs into a for loop that inserts each individual x tuple and y tuple into its own column within a common row in SQL. However, this is where my problem occurs.
If I use the loop below, I can insert the entire x tuple and y tuple into a single row, but it repeats the same x tuple and y tuple in every row without iterating through.
for i in range(len(hex_strings_list)):
    insert_query(x_tuple_values, y_tuple_values)
If I attempt to iterate through each list of tuples with this code, the for loop inserts each number within each tuple into its own row.
for i in range(len(hex_strings_list)):
    print(x_tuple_values[i], y_tuple_values[i])
I know that I want to iterate through the tuples rather than the individual numbers that make up each tuple, but I'm at a loss for how to accomplish this. Total mental block!
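For reference, a minimal sketch of the pairing described above, assuming x_tuple_values and y_tuple_values are parallel lists holding the x and y tuples and insert_query is the existing insert helper (those names come from the snippets above; this is only one way to do it):
# zip() walks the two lists in lockstep, yielding one (x_tuple, y_tuple) pair per step
for x_tuple, y_tuple in zip(x_tuple_values, y_tuple_values):
    insert_query(x_tuple, y_tuple)  # one SQL row per x/y pair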

Related

How to find match between two 2D lists in Python?

Lets say I have two 2D lists like this:
list1 = [ ['A', 5], ['X', 7], ['P', 3]]
list2 = [ ['B', 9], ['C', 5], ['A', 3]]
I want to compare these two lists and find where the 2nd item matches between the two lists, e.g. here we can see that the numbers 5 and 3 appear in both lists. The first item is not actually relevant for the comparison.
How do I compare the lists and copy those values that appear in the 2nd column of both lists? Using 'x in list' does not work since these are 2D lists. Do I create another copy of the lists with just the 2nd column copied across?
It is possible that this can be done using list comprehension but I am not sure about it so far.
There might be a duplicate for this but I have not found it yet.
The pursuit of one-liners is a futile exercise. They aren't always more efficient than the regular loopy way, and they are almost always less readable when you're writing anything more complicated than one or two nested loops. So let's get a multi-line solution first. Once we have a working solution, we can try to convert it to a one-liner.
Now, the solution you shared in the comments works, but it doesn't handle duplicate elements and is also O(n^2) because it contains a nested loop. https://wiki.python.org/moin/TimeComplexity
list_common = [x[1] for x in list1 for y in list2 if x[1] == y[1]]
A few key things to remember:
A single loop O(n) is better than a nested loop O(n^2).
Membership lookup in a set O(1) is much quicker than lookup in a list O(n).
Sets also get rid of duplicates for you.
Python includes set operations like union, intersection, etc.
Let's code something using these points:
# Create a set containing all numbers from list1
set1 = set(x[1] for x in list1)
# Create a set containing all numbers from list2
set2 = set(x[1] for x in list2)
# Intersection contains numbers in both sets
intersection = set1.intersection(set2)
# If you want, convert this to a list
list_common = list(intersection)
Now, to convert this to a one-liner:
list_common = list(set(x[1] for x in list1).intersection(x[1] for x in list2))
We don't need to explicitly convert x[1] for x in list2 to a set because set.intersection() accepts any iterable, including a generator expression, and handles the conversion internally.
This gives you the result in O(n) time, and also gets rid of duplicates in the process.
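As a quick sanity check, running the one-liner on the sample lists from the question returns both shared values (element order is arbitrary because sets are unordered):
list1 = [['A', 5], ['X', 7], ['P', 3]]
list2 = [['B', 9], ['C', 5], ['A', 3]]

list_common = list(set(x[1] for x in list1).intersection(x[1] for x in list2))
print(list_common)  # e.g. [3, 5] -- order is not guaranteed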

How to Change position of elements in python list

I have a python list like
the_list = ['john', 'nick', 'edward', 'mood', 'enp', 'wick']
I always want "mood" and "enp" to be at the 0th and 1st index of the list; the rest of the order can be anything.
So the output will be
op_list= ["mood","enp",........rest..]
The following will work:
op_list = ["mood", "enp"] + [x for x in the_list if x not in ("mood", "enp")]
This assumes the two special elements are always present.
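A quick check with the sample list above (the tail simply keeps the original order here):
the_list = ['john', 'nick', 'edward', 'mood', 'enp', 'wick']
op_list = ["mood", "enp"] + [x for x in the_list if x not in ("mood", "enp")]
print(op_list)  # ['mood', 'enp', 'john', 'nick', 'edward', 'wick']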

List comprehension of 3 nested loops and the output is based on if-else condition

Is it possible to convert this into a list comprehension? For example, I have a list v. In the source code below, v = dictionary.keys().
v = ["naive", "bayes", "classifier"]
I have the following nested list t.
t = [["naive", "bayes"], ["lol"]]
The expected output O should be:
O = [[1, 1, 0], [0, 0, 0]]
1 if the dictionary contains the word and 0 if not. I'm creating a spam/ham feature matrix. Due to the large dataset, I'd like to convert the code below into a list comprehension for a faster iteration.
ham_feature_matrix = []
for each_file in train_ham:
    feature_vector = [0] * len(dictionary)
    for each_word in each_file:
        for d, dicword in enumerate(dictionary.keys()):
            if each_word == dicword:
                feature_vector[d] = 1
    ham_feature_matrix.append(feature_vector)
I couldn't test this, but a translation that keeps the same output shape is:
ham_feature_matrix = [[int(dicword in each_file) for dicword in dictionary] for each_file in train_ham]
[int(dicword in each_file) for dicword in dictionary] is the part that changes the most compared to your original code.
Basically, since you're iterating on the words of the dictionary, you don't need enumerate to set the matching slots to 1: the inner comprehension checks whether each dictionary word appears in the file, and converting that boolean to an integer gives the 0 or 1 you want. You also don't need to call .keys(), since iterating on a dictionary iterates on the keys by default.
The outer loop over train_ham translates directly.
The issue I'm seeing here is that you're iterating on a dictionary to create a list of booleans, but before Python 3.7 the iteration order of a dictionary isn't guaranteed, so the column order may vary between runs (as in your original code) unless you sort the keys somehow.
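For what it's worth, a quick check against the sample data from the question (a sketch, using a plain dict as a stand-in for the real dictionary) reproduces the expected output O:
# Stand-in dictionary whose keys match v = ["naive", "bayes", "classifier"]
dictionary = {"naive": 0, "bayes": 1, "classifier": 2}
train_ham = [["naive", "bayes"], ["lol"]]

ham_feature_matrix = [[int(dicword in each_file) for dicword in dictionary]
                      for each_file in train_ham]
print(ham_feature_matrix)  # [[1, 1, 0], [0, 0, 0]]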

python3 functional programming: Accumulating items from a different list into an initial value

I have some code that performs the following operation; however, I was wondering if there is a more efficient and understandable way to do this. I am thinking that there might be something in itertools or the like designed for this type of operation.
So I have a list of integers that represents changes in the number of items from one period to the next.
x = [0, 1, 2, 1, 3, 1, 1]
Then I need a function to create a second list that accumulates the total number of items from one period to the next. This is like an accumulate function, but with elements from another list instead of from the same list.
So I can start off with an initial value y = 3.
The first value in the list is y = [3]. Then I would take the second
element in x and add it to the list, so that means 3 + 1 = 4. Note that I take the second element because we already know the first element of y. So the updated value of y is [3, 4]. Then the next iteration is 4 + 2 = 6. And so forth.
The code that I have looks like this:
def func():
    x = [0, 1, 2, 1, 3, 1, 1]
    y = [3]
    for k,v in enumerate(x):
        y.append(y[i] + x[i])
    return y
Any ideas?
If I understand you correctly, you want what itertools.accumulate does, but you want to add an initial value too. You can do that pretty easily in a couple of ways.
The easiest might be to simply write a list comprehension around the accumulate call, adding the initial value to each output item:
y = [3 + val for val in itertools.accumulate(x)]
Another option would be to prefix the x list with the initial value, then skip it when accumulate includes it as the first value in the output:
acc = itertools.accumulate([3] + x)
next(acc) # discard the extra 3 at the start of the output.
y = list(acc)
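Both variants produce the same result for the sample input; a quick check, using the x from the question:
import itertools

x = [0, 1, 2, 1, 3, 1, 1]

y1 = [3 + val for val in itertools.accumulate(x)]

acc = itertools.accumulate([3] + x)
next(acc)  # discard the extra 3 at the start of the output
y2 = list(acc)

print(y1)  # [3, 4, 6, 7, 10, 11, 12]
print(y2)  # [3, 4, 6, 7, 10, 11, 12]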
Two things, I think, need to be fixed:
First, the condition for the for loop. I'm not sure where you are getting the k, v from; maybe you saw an example using zip (which allows you to iterate through two lists at once), but in any case you want to iterate through lists x and y by index. One approach is:
for i in range(len(x)):
Second, using the first append as an example: since you are adding the 2nd element (index 1) of x to the 1st element (index 0) of y, you want to use a staggered approach with your indices. This also leads to revising the for loop condition above (I'm trying to go through this step by step), since the first element of x (0) will not be used:
for i in range(1, len(x)):
That change will keep you from getting an index out of range error. Next, for the staggered add:
for i in range(1, len(x)):
    y.append(y[i-1] + x[i])
return y
So, going back to the first append example: the for loop starts at index 1, where x[1] = 1 and y[1] does not exist yet. To create a value for y[1] you append the sum of y at index 0 and x at index 1, giving you 4. The loop continues until you've exhausted the values in x, returning the accumulated values in list y.
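Putting those two fixes back into the original function gives a sketch like this:
def func():
    x = [0, 1, 2, 1, 3, 1, 1]
    y = [3]
    for i in range(1, len(x)):      # start at 1: y[0] already covers the first period
        y.append(y[i - 1] + x[i])   # previous running total plus the current change
    return y

print(func())  # [3, 4, 6, 7, 10, 11, 12]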

Find distinct values for each column in an RDD in PySpark

I have an RDD that is both very long (a few billion rows) and decently wide (a few hundred columns). I want to create sets of the unique values in each column (these sets don't need to be parallelized, as they will contain no more than 500 unique values per column).
Here is what I have so far:
data = sc.parallelize([["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]])
num_columns = len(data.first())
empty_sets = [set() for index in range(num_columns)]
d2 = data.aggregate((empty_sets), (lambda a, b: a.add(b)), (lambda x, y: x.union(y)))
What I am doing here is trying to initialize a list of empty sets, one for each column in my RDD. For the first part of the aggregation, I want to iterate row by row through data, adding the value in column n to the nth set in my list of sets. If the value already exists, nothing changes. Then it performs the union of the sets afterwards, so only distinct values are returned across all partitions.
When I try to run this code, I get the following error:
AttributeError: 'list' object has no attribute 'add'
I believe the issue is that I am not accurately making it clear that I am iterating through the list of sets (empty_sets) and that I am iterating through the columns of each row in data. I believe in (lambda a, b: a.add(b)) that a is empty_sets and b is data.first() (the entire row, not a single value). This obviously doesn't work, and isn't my intended aggregation.
How can I iterate through my list of sets, and through each row of my dataframe, to add each value to its corresponding set object?
The desired output would look like:
[set(['a', 'b', 'c']), set(['one', 'two']), set(['x', 'y'])]
P.S. I've looked at this example here, which is extremely similar to my use case (it's where I got the idea to use aggregate in the first place). However, I find the code very difficult to convert into PySpark, and I'm very unclear on what the case and zip code is doing.
There are two problems. One, your combiner functions assume each row is a single set, but you're operating on a list of sets. Two, add doesn't return anything (try a = set(); b = a.add('1'); print(b)), so your first combiner function returns None instead of the accumulated list of sets. To fix this, make your first combiner function non-anonymous and have both of them loop over the lists of sets:
def set_plus_row(sets, row):
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets

unique_values_per_column = data.aggregate(
    empty_sets,
    set_plus_row,  # can't be lambda b/c add doesn't return anything
    lambda x, y: [a.union(b) for a, b in zip(x, y)]
)
I'm not sure what zip does in Scala, but in Python it takes two lists and puts the corresponding elements together into tuples (try x = [1, 2, 3]; y = ['a', 'b', 'c']; print(list(zip(x, y)))), so you can loop over two lists simultaneously.
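Putting the fix together with the setup from the question (a sketch, assuming an existing SparkContext named sc as in the original snippet), this should produce the desired per-column sets:
data = sc.parallelize([["a", "one", "x"], ["b", "one", "y"],
                       ["a", "two", "x"], ["c", "two", "x"]])
num_columns = len(data.first())
empty_sets = [set() for _ in range(num_columns)]

def set_plus_row(sets, row):
    # Add each column value of this row to the matching per-column set.
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets

unique_values_per_column = data.aggregate(
    empty_sets,
    set_plus_row,
    lambda x, y: [a.union(b) for a, b in zip(x, y)]
)
print(unique_values_per_column)
# [{'a', 'b', 'c'}, {'one', 'two'}, {'x', 'y'}]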
