Need help understanding code execution, for nested for loop - python-3.x

I have trouble understanding the element-wise execution of the following code. The goal is to define a function that returns the Cartesian product of two sets. The problem should be solved using the methods in the code below.
I have tried looking up similar problems, but since I am new to programming and Python I get stuck easily.
A = {1,2,3,4}
B = {3,4,5}
def setprod(m1,m2):
    p=set()
    for e1 in m1:
        for e2 in m2:
            p.add((e1,e2))
    return p
setprod(A,B) returns {(1, 3), (3, 3), (4, 5), (4, 4), (1, 4), (1, 5), (2, 3), (4, 3), (2, 5), (3, 4), (2, 4), (3, 5)}.
The Cartesian product is the set containing all the ordered pairs of elements of the two sets. An element of A can be chosen in 4 different ways and an element of B in 3, giving 4x3=12 combinations.
I just can't see why the code above accomplishes this.

If you have access to a debugging tool (perhaps you could install pycharm and use its debugger) you can see what's going on.
Let's step through what's going on in the code together mentally.
A = {1,2,3,4}           #Step 1, create the set {1,2,3,4}
B = {3,4,5}             #Step 2, create the set {3,4,5}
def setprod(m1,m2):     #Step 3, define the function (the body does not run yet)
    p=set()
    for e1 in m1:
        for e2 in m2:
            p.add((e1,e2))
    return p
setprod(A,B)            #Step 4, execute function with parameters
At this point if we want to see what setprod does we step into the function.
    p=set()                 #Stepped in, step 1: create empty set
    for e1 in m1:           #Stepped in, step 2: begin for loop iterating through m1,
                            #which contains {1,2,3,4}; e1 is set to 1
        for e2 in m2:       #Stepped in, step 3: begin inner for loop
                            #iterating through m2, which contains {3,4,5};
                            #e2 is set to 3, e1 contains the value 1
            p.add((e1,e2))  #Stepped in, step 4: add (m1[0],m2[0]), represented by
                            #(e1,e2), to the set.
    return p
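After stepped-in step 4, the next step is the same line of code but with different variable values: e2 is no longer m2[0] but m2[1]. (m2[0] is shorthand here for "the first element the loop yields"; a set cannot actually be indexed, and its iteration order is not guaranteed.)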
p.add((e1,e2)) #Stepped in, step 5. add (m1[0],m2[1]), represented by
# (e1,e2) to the set where e1 = 1 and e2 = 4
.
p.add((e1,e2)) #Stepped in, step 6. add (m1[0],m2[2]), represented by
# (e1,e2) to the set where e1 = 1 and e2 = 5
At this point we return to the parent for loop.
    for e1 in m1:           #Stepped in, step 7:
                            #use m1[1] as e1 and repeat previous process but
                            #with the new e1 value set to 2
        for e2 in m2:       #Stepped in, step 8: e1 contains 2, e2 is set to 3
            p.add((e1,e2))
(Just a note, if you were debugging this, I believe you'll only see the values for e2 and e1 when you are at the section of the code p.add, saying that e1 is set to some value at #stepped in, step 7, isn't completely accurate but is helpful enough for looking at the idea of what is happening.)
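If no debugger is at hand, the same sequence can be observed by recording each pair as the inner loop produces it. The sketch below is an instrumented variant of setprod; the name setprod_traced and the trace list are additions for illustration and are not part of the original code. It also compares the result with itertools.product from the standard library.

```python
import itertools

A = {1, 2, 3, 4}
B = {3, 4, 5}

def setprod_traced(m1, m2):
    """Same logic as setprod, but also records the order in which pairs are added."""
    p = set()
    trace = []  # hypothetical helper list, used only for inspection
    for e1 in m1:          # outer loop: one pass per element of m1
        for e2 in m2:      # inner loop: runs to completion for every e1
            p.add((e1, e2))
            trace.append((e1, e2))
    return p, trace

product, trace = setprod_traced(A, B)
for step, pair in enumerate(trace, start=1):
    print(step, pair)

# The inner loop runs len(B) times for each of the len(A) outer iterations.
print(len(product) == len(A) * len(B))
# The standard library produces the same set of pairs.
print(product == set(itertools.product(A, B)))
```

The printed steps show e1 staying fixed while e2 runs through all of B, which is exactly why every element of A ends up paired with every element of B.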

Related

Python3 itertools permutation & combination question

I want to build a combination of all 7-digit numbers possible from a set of numbers following the rules below.
Each digit can only hold values as shown below from python3 list variables N1-N7 as an example.
N1 = [0,1,2]
N2 = [0,1]
N3 = [0,1,2]
N4 = [0]
N5 = [1]
N6 = [0,1]
N7 = [0]
The digits of each 7-digit number must sum to exactly 5. Valid examples of the 7-digit numbers:
0120110,1110110,1120100,2020100
You can use the Cartesian product function of the itertools library, itertools.product.
It generates every possible combination that takes one element from each group.
For example,
import itertools
list(itertools.product([1,2], ['a', 'b']))
Should return:
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
Applying it to your constraint that we need to pick the first digit from N1, second digit from N2 etc...
result = [''.join(map(str, combination)) for combination in itertools.product(N1, N2, N3, N4, N5, N6, N7)]
This does not enforce that the sum will be equal to five, so we can add it with a simple if statement:
result = [''.join(map(str, cartesian_product)) for cartesian_product in itertools.product(N1, N2, N3, N4, N5, N6, N7) if sum(cartesian_product) == 5]
Resulting in:
['0120110', '1020110', '1110110', '1120100', '2010110', '2020100', '2100110', '2110100']
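The same idea can be wrapped in a reusable function. This is a minimal sketch, assuming the digit lists are passed in positional order; the name numbers_with_digit_sum and its parameters are illustrative and do not come from the answer above.

```python
import itertools

def numbers_with_digit_sum(digit_choices, target):
    """Yield digit strings built from one choice per position whose digits sum to target."""
    for combination in itertools.product(*digit_choices):
        if sum(combination) == target:
            yield ''.join(map(str, combination))

N1 = [0, 1, 2]
N2 = [0, 1]
N3 = [0, 1, 2]
N4 = [0]
N5 = [1]
N6 = [0, 1]
N7 = [0]

result = list(numbers_with_digit_sum([N1, N2, N3, N4, N5, N6, N7], 5))
print(result)
```

Using a generator avoids building the full list when only a few matches are needed, at the cost of having to call list() when all of them are wanted.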

Finding x numbers in list greater than 0 with potential duplicates and assigning index of original list

I have lists of floats which will have some zeros in them, e.g.
numbers = [1.2, 0.0, 0.0, 1.2, 2.0, 2.5, 17, 1.3, 1.8, 1.3, 1.2]
I am trying to assign the n lowest values greater than 0 (assume n is 5) to separate variables.
I can get the first by using:
first = min(o for o in numbers if o > 0)
But as there are duplicates of the smallest value (1.2), I cannot easily assign second, third, fourth and fifth.
I also need to keep the index of each value in the original list and assign that too, e.g.
first_pos = numbers.index(first)
I cannot use the above for second, as it would return the index of the first occurrence again.
Is there an efficient way, using a for loop, a list comprehension or even a small function, to assign the other numbers so that:
second = 1.2
second_pos = 3
third = 1.2
third_pos = 10
fourth = 1.3
fourth_pos = 7
fifth = 1.3
fifth_pos = 9
I cannot do this with any list comprehension I know of for second as it will not pick up a duplicate. Eg.:
sec = min(o for o in numbers if o > first)
The lists vary in length (always at least 5 values) and may or may not contain duplicates and zeros, but many will.
IIUC, one way using sorted with enumerate:
sorted(((i, n) for i, n in enumerate(numbers) if n > 0), key=lambda pair: pair[1])[:5]
Output of (index, value) pairs of first 5 smallest values:
[(0, 1.2), (3, 1.2), (10, 1.2), (7, 1.3), (9, 1.3)]
Ok, to badly answer my own question, I have been able to do this by copying and removing the zeros, enumerating over the list for the index values and removing each number once assigned:
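number = [n for n in numbers if n > 0]                # values greater than zero
numbs = [i for i, x in enumerate(numbers) if x > 0]   # their indexes in the original list
# smallest remaining value and its index in the original list
first = min(number)
first_pos = number.index(first)
first_ind = numbs[first_pos]
# remove the assigned value so the next min() finds the following one
number.remove(first)
numbs.remove(first_ind)
# repeat for the second smallest value (no loop is needed for a single assignment)
second = min(number)
sec_pos = number.index(second)
sec_ind = numbs[sec_pos]
number.remove(second)
numbs.remove(sec_ind)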
This will keep and assign the values and indexes of each minimum value greater than zero.
Is there any way to add this into a function to assign all values greater than zero in the list to its own variables?
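One possible way to put this into a function is to return a list of (index, value) pairs instead of creating a separate variable per value. The sketch below is only an illustration under that assumption; the name smallest_positive and its count parameter are made up for this example.

```python
def smallest_positive(numbers, count=5):
    """Return (index, value) pairs for the `count` smallest values greater than zero."""
    positive = [(i, v) for i, v in enumerate(numbers) if v > 0]
    # sorted() is stable, so equal values keep their original index order
    return sorted(positive, key=lambda pair: pair[1])[:count]

numbers = [1.2, 0.0, 0.0, 1.2, 2.0, 2.5, 17, 1.3, 1.8, 1.3, 1.2]

pairs = smallest_positive(numbers)
positions = [i for i, _ in pairs]   # indexes in the original list
values = [v for _, v in pairs]      # the matching values
print(pairs)
```

With this layout, pairs[0] plays the role of first and first_pos, pairs[1] of second and second_pos, and so on, and the original list is left unmodified.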

How can I fix statsmodels.AnovaRM claiming "Data is unbalanced" although it isn't?

I am trying to perform a three-way repeated-measures ANOVA with statsmodels' AnovaRM, but I already run into a problem with a two-way ANOVA. When running
from statsmodels.stats.anova import AnovaRM
aov = AnovaRM(anova_df, depvar='Test', subject='Subject',
              within=["Factor1", "Factor2"], aggregate_func='mean').fit()
print(aov)
it fails with "Data is unbalanced.". Let's look at the factors I extracted from the DataFrame that I fed into it:
Factor1, level 0, shape: (68, 6)
Factor1, level 1, shape: (68, 6)
Factor1, level 2, shape: (68, 6)
Factor2, level a, shape: (68, 6)
Factor2, level b, shape: (68, 6)
Factor2, level c, shape: (68, 6)
Because this is a test, I even aligned the Factors with each other.
   Test  Factor1 Factor2
0  32.6        0       a
1  39.3        1       b
2  43.0        2       c
3  32.0        0       a
4  32.8        1       b
5  38.3        2       c
6  36.7        0       a
7  40.4        1       b
8  41.9        2       c
How is that not being balanced? What am I doing wrong, how can I fix this?
I ran into the same issue. A dataset that AnovaRM works with is in this tutorial: https://pythontutorials.eu/numerical/statistics-solution-1/
I also used your method of checking the shapes by iterating through all the levels of all the variables. The output also showed that everything has the same shape. The dataset in the link above has this property too.
It turned out that having the same shape is not enough. Take the variable you use for subject in your input df and run something like df[subject_name].value_counts(): every subject has to appear the same number of times. If the counts differ, AnovaRM will give you an unbalanced data error.
I used this check on my df and it showed that some subjects have fewer values than others, whereas in the example df from the link above every subject has the same number of values. Furthermore, I manually subset my df to include only the subjects that have the same number of values/measurements, and AnovaRM worked for me. Have a try and let me know whether this helps you understand what unbalanced really means.
In your data Factor1 and Factor2 always change together (0 with a, 1 with b, 2 with c), so Factor1 = Factor2 in every block.
Try using a single index like "Treatment" and drop factors 1 and 2:
Treatment  When
X          F1 = 0 and F2 = a
Y          F1 = 1 and F2 = b
Z          F1 = 2 and F2 = c
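Both checks described in the answers (equal counts per subject, and every combination of factor levels being present) can be sketched without statsmodels. The helper below is an illustration only: the name find_imbalance and the row format (one dict per observation, as produced for example by DataFrame.to_dict('records')) are assumptions, and it does not claim to reproduce AnovaRM's internal logic.

```python
from collections import Counter
from itertools import product

def find_imbalance(rows, subject, within):
    """Return a list of problems found; an empty list means the design looks balanced."""
    problems = []
    # 1. every subject should contribute the same number of rows
    per_subject = Counter(row[subject] for row in rows)
    if len(set(per_subject.values())) > 1:
        problems.append(f"unequal rows per subject: {dict(per_subject)}")
    # 2. every combination of within-factor levels should occur for every subject
    levels = [sorted({row[f] for row in rows}) for f in within]
    cells = Counter((row[subject], *(row[f] for f in within)) for row in rows)
    for subj in per_subject:
        for combo in product(*levels):
            if cells[(subj, *combo)] == 0:
                problems.append(f"subject {subj!r} has no data for {combo}")
    return problems

# Factors aligned as in the question: only (0, 'a'), (1, 'b'), (2, 'c') ever occur.
aligned = [
    {"Subject": s, "Factor1": f1, "Factor2": f2, "Test": 0.0}
    for s in (1, 2)
    for f1, f2 in [(0, "a"), (1, "b"), (2, "c")]
]
# Fully crossed design: every Factor1 level is paired with every Factor2 level.
crossed = [
    {"Subject": s, "Factor1": f1, "Factor2": f2, "Test": 0.0}
    for s in (1, 2)
    for f1, f2 in product((0, 1, 2), ("a", "b", "c"))
]

print(len(find_imbalance(aligned, "Subject", ["Factor1", "Factor2"])))
print(len(find_imbalance(crossed, "Subject", ["Factor1", "Factor2"])))
```

The aligned data reports missing cells such as (0, 'b') for every subject, while the crossed data reports none, which matches the suggestion to collapse the two aligned factors into a single treatment factor.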

What does the & operator do in PyTorch and why does it change the shape?

I have code that contains x and y, both of the type torch.autograd.variable.Variable. Their shape is
torch.Size([30, 1, 9])
torch.Size([1, 9, 9])
What I don't understand is, why the following results in a different size/shape
z = x & y
print(z.shape)
which outputs
torch.Size([30, 9, 9])
Why is the shape of z 30*9*9, after x & y? The shape of x is 30*1*9, and the shape of y is 1*9*9, what does & do in x & y?
This has nothing to do with the & operator itself, but with how broadcasting works in PyTorch, which follows the same rules as NumPy. To quote Eric Wieser's excellent documentation on broadcasting in NumPy:
In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.
See the following image from the quoted page as an example:
This translates to your problem as follows:
x has the shape 30 x 1 x 9
y has the shape 1 x 9 x 9
Therefore the result shape is determined axis by axis:
axis 1 has size 30, because x has 30 and y has 1 (y is repeated along this axis)
axis 2 has size 9, because x has 1 and y has 9 (x is repeated along this axis)
axis 3 has size 9, because both x and y have 9
Therefore the result has the shape 30 x 9 x 9.
Please also note that the & operator computes an element-wise bitwise AND of the tensors' items (which is a logical AND for boolean tensors).
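The shape rule can be sketched in plain Python, without PyTorch or NumPy installed. The function broadcast_shape below is a hypothetical helper written for this explanation, not a library API (recent versions of both libraries ship their own broadcast_shapes function).

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Compute the broadcast result shape of two shapes, comparing axes from the right."""
    result = []
    # Walk both shapes from the trailing axis; a missing axis counts as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)   # sizes match, or b is stretched to a
        elif a == 1:
            result.append(b)   # a is stretched to b
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(reversed(result))

print(broadcast_shape((30, 1, 9), (1, 9, 9)))
```

Any element-wise binary operator (+, *, & and so on) goes through the same shape computation, which is why the result of x & y has the combined shape rather than the shape of either operand.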

How to return the file number from bag of words

I am working with CountVectorizer from sklearn and want to know how to access or extract the file number. This is what I tried.
For example, from the output line (1, 12) 1
I want only the leading 1, which represents the file number.
from sklearn.feature_extraction.text import CountVectorizer
vectorizer=CountVectorizer()
string1="these is my first statment in vectorizer"
string2="hello every one i like the place here"
string3="i am going to school every day day like the student in my school"
email_list=[string1,string2,string3]
bagofword=vectorizer.fit(email_list)
bagofword=vectorizer.transform(email_list)
print(bagofword)
output:
(0, 3) 1
(0, 7) 1
(0, 8) 1
(0, 10) 1
(0, 14) 1
(1, 12) 1
(1, 16) 1
(2, 0) 1
(2, 1) 2
You could iterate over the columns of the sparse matrix with
features_map = [col.indices.tolist() for col in bagofword.T]
To get a list of all documents that contain feature k, simply take element k of this list.
For instance, features_map[2] == [1, 2] means that feature number 2 is present in documents 1 and 2.
