Python 3 double loop comprehension clarification

I'm curious about the double for-loop comprehension.
Comprehension:
multilist = [[row*col for col in range(colNum)] for row in range(rowNum)]
Normal double loop:
for row in range(rowNum):
    for col in range(colNum):
        multilist[row][col] = row*col
Both methods yield the same outcome. For instance, if I use 3 as rowNum and 5 as colNum, they produce
[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]
My question is: why is the col for-loop placed as the outer loop in the comprehension instead of the row for-loop? I would welcome any explanation.
Thank you.

In a nested list comprehension such as yours, the last for clause (the one over rowNum) is the outer loop: it runs first, and the inner comprehension over colNum is evaluated once per row.
multilist = [[row*col for col in range(colNum)] for row in range(rowNum)]
Therefore, the col for-loop is still the inner loop in the comprehension; it is simply written inside the expression that builds each row.
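A quick way to see this is to expand the comprehension by hand. The following sketch, using the 3×5 example above, builds the same list both ways:
rowNum, colNum = 3, 5  # the example dimensions from the question

# Comprehension form
multilist = [[row*col for col in range(colNum)] for row in range(rowNum)]

# Hand-expanded equivalent: the last for clause is the outer loop,
# and the inner comprehension over colNum builds one row per iteration
expanded = []
for row in range(rowNum):
    inner = []
    for col in range(colNum):
        inner.append(row*col)
    expanded.append(inner)

print(multilist == expanded)  # True
print(multilist)
# [[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]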

Related

Pyspark: How to count the number of elements in each equal-distance interval of an RDD

I have an RDD[Double], and I want to divide the RDD into k equal intervals, then count the number of elements in each interval.
For example, the RDD is like [0,1,2,3,4,5,6,6,7,7,10]. I want to divide it into 10 equal intervals, so the intervals are [0,1), [1,2), [2,3), [3,4), [4,5), [5,6), [6,7), [7,8), [8,9), [9,10].
As you can see, each element of the RDD falls into one of the intervals. Then I want to count the elements in each interval. Here, there is one element in each of [0,1), [1,2), [2,3), [3,4), [4,5) and [5,6); both [6,7) and [7,8) have two elements; and [9,10] has one element.
Finally I expect an array like array([1, 1, 1, 1, 1, 1, 2, 2, 0, 1]).
Try this. I have assumed that the first endpoint of each range is inclusive and the last exclusive; please confirm this. For example, for the range [0,1) and element 0, the condition is element >= 0 and element < 1.
array_range = [(i, i + 1) for i in range(10)]  # the 10 intervals as (lower, upper) pairs
countElementsWithinRange = []
for element_upper in array_range:
    counter = 0
    for element in rdd.collect():
        if element >= element_upper[0] and element < element_upper[1]:
            counter += 1
    countElementsWithinRange.append(counter)
print(rdd.collect())
# [0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10]
print(countElementsWithinRange)
# [1, 1, 1, 1, 1, 1, 2, 2, 0, 0]
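For what it's worth, PySpark's RDD.histogram can compute this in one call; here is a minimal sketch, assuming an existing SparkContext named sc. Note that histogram treats the last bucket as right-inclusive, so 10 is counted in [9,10] and the result matches the expected array:
rdd = sc.parallelize([0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10])
buckets, counts = rdd.histogram(list(range(11)))  # bucket edges 0, 1, ..., 10
print(counts)
# [1, 1, 1, 1, 1, 1, 2, 2, 0, 1]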

Check if all list values in dataframe column are the same [duplicate]

If the type of a column in dataframe is int, float or string, we can get its unique values with columnName.unique().
But what if the values in this column are lists, e.g. [1, 2, 3]?
How could I get the unique values of this column?
I think you can convert the values to tuples, and then unique works nicely:
df = pd.DataFrame({'col':[[1,1,2],[2,1,3,3],[1,1,2],[1,1,2]]})
print (df)
            col
0     [1, 1, 2]
1  [2, 1, 3, 3]
2     [1, 1, 2]
3     [1, 1, 2]
print (df['col'].apply(tuple).unique())
[(1, 1, 2) (2, 1, 3, 3)]
L = [list(x) for x in df['col'].apply(tuple).unique()]
print (L)
[[1, 1, 2], [2, 1, 3, 3]]
You cannot apply unique() on a non-hashable type such as list; you need to convert to a hashable type first.
A better solution is to use duplicated(), which avoids iterating over the values to convert them back to lists afterwards:
df[~df.col.apply(tuple).duplicated()]
That returns the unique rows, with the values still stored as lists.
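A quick check of the duplicated() approach, using the same df as above:
import pandas as pd

df = pd.DataFrame({'col':[[1,1,2],[2,1,3,3],[1,1,2],[1,1,2]]})
print(df[~df['col'].apply(tuple).duplicated()])
#             col
# 0     [1, 1, 2]
# 1  [2, 1, 3, 3]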

Replace indicator values with actual values

I have a numpy array like this
array([[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
and an array with values
array([1, 2, 3, 4])
I would like to replace the ones in the first two-dimensional array with the corresponding values from the second array. Each row of the first array has exactly one 1, and the second array holds exactly one replacement value per row.
Result:
array([[0, 0, 1],
       [2, 0, 0],
       [0, 3, 0],
       [0, 0, 4]])
I would like an elegant solution to achieve this, without loops and such.
Let's say a is the 2D data array and b the second 1D array.
An elegant solution would be -
a[a==1] = b
This works because boolean-mask assignment fills the selected positions in row-major order, and with exactly one 1 per row those positions line up with the elements of b.
For performance, leveraging the fact that there's exactly one 1 per row, we could also use indexing -
a[np.arange(len(a)),a.argmax(1)] = b
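As a self-contained sanity check of both approaches (a and b as in the question):
import numpy as np

a = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
b = np.array([1, 2, 3, 4])

out = a.copy()
out[out == 1] = b  # boolean-mask version
print(out)
# [[0 0 1]
#  [2 0 0]
#  [0 3 0]
#  [0 0 4]]

out = a.copy()
out[np.arange(len(out)), out.argmax(1)] = b  # indexing version, same result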
Selectively assign per row
If we want to selectively mask and assign values per row, we could use one more level of masking. So, let's say we have the rows to be selected as -
select_rows = np.array([1,3])
Then, we could do -
rowmask = np.isin(np.arange(len(a)),select_rows)
So, the replacement for the first approach would be -
a[(a==1) & rowmask[:,None]] = b[rowmask]
And for the second one -
a[np.arange(len(a))[rowmask],a.argmax(1)[rowmask]] = b[rowmask]
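And a quick check of the selective variant, reusing a and b from the check above (only rows 1 and 3 are updated):
select_rows = np.array([1, 3])
rowmask = np.isin(np.arange(len(a)), select_rows)  # [False, True, False, True]
out = a.copy()
out[(out == 1) & rowmask[:, None]] = b[rowmask]
print(out)
# [[0 0 1]
#  [2 0 0]
#  [0 1 0]
#  [0 0 4]]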

How can I iterate through a DataFrame with a conditional to reorganize my data?

I have a DataFrame in the following format, and I would like to rearrange it based on a conditional using one of the columns of data.
My current DataFrame has the following format:
df.head()
Room  Temp1  Temp2  Temp3  Temp4
R1        1      2      1      3
R1        2      3      2      4
R1        3      4      3      5
R2        1      1      2      2
R2        2      2      3      3
...
R15       1      1      1      1
I would like to 'pivot' this DataFrame to look like this:
Room
R1 = [1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5]
R2 = [1, 2, 1, 2, 2, 3, 2, 3]
...
R15 = [1, 1, 1, 1]
Where:
R1 = Temp1 + Temp2 + Temp3 + Temp4
So that:
R1 = [1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5]
First: I have tried creating a list of each column using the 'where' conditional in which Room = 'R1'
room1 = np.where(df["Room"] == 'R1', df["Temp1"], 0).tolist()
It works, but I would need to do this individually for every column, of which there are many more than 4 in my other datasets.
Second: I tried to iterate through them:
i = ['Temp1', 'Temp2', 'Temp3', 'Temp4']
room1 = []
for i in df[i]:
    for row in df["Room"]:
        while row == "R1":
...and this is where I get very lost. Where do I go next? How can I iterate through the rest of the columns and end up with the DataFrame I have above?
This should work (although it's not very efficient and will be slow on a big DataFrame):
results = {}  # dict to store results
cols = ['Temp1', 'Temp2', 'Temp3', 'Temp4']
for r in df['Room'].unique():
    room_list = []
    sub_frame = df[df['Room'] == r]
    for col in cols:
        sub_col = sub_frame[col]
        for val in sub_col:
            room_list.append(val)
    results[r] = room_list
The lists are stored in the results dict, so you can access, say, R1 with:
results['R1']
Usually iterating over DataFrames is a bad idea though, I'm sure there's a better solution!
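For what it's worth, a more pandas-idiomatic sketch of the same idea uses groupby; the transpose preserves the column-by-column order (df and cols as above):
results = {room: grp[cols].to_numpy().T.ravel().tolist()
           for room, grp in df.groupby('Room')}
print(results['R1'])
# [1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5]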
I found the answer!
The trick is to use the .pivot() function to rearrange the columns accordingly. I had an additional column called 'Time' which I did not include in the original post, thinking it was not relevant to the solution.
What I ended up doing is pivoting the table based on Columns and Values using index as the rooms:
df = df.pivot(index="Room", columns="Time", values=["Temp1", "Temp2", "Temp3", "Temp4"])
Thank you to those who helped me on the way!
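Since the original post does not show the Time column, here is a hypothetical minimal reconstruction illustrating the pivot (all names and values below are invented for the example):
import pandas as pd

df = pd.DataFrame({
    'Room':  ['R1', 'R1', 'R1', 'R2', 'R2', 'R2'],
    'Time':  [1, 2, 3, 1, 2, 3],
    'Temp1': [1, 2, 3, 1, 2, 3],
    'Temp2': [2, 3, 4, 1, 2, 3],
})
wide = df.pivot(index='Room', columns='Time', values=['Temp1', 'Temp2'])
print(wide.loc['R1'].tolist())  # Temp1 readings first, then Temp2
# [1, 2, 3, 2, 3, 4]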

How to sort a list in Python using two attribute values?

Suppose I have a list named L and two attribute dictionaries named arr1 and arr2, whose keys are the elements of the list L. Now I want to sort L in the following manner.
L should be sorted in ascending order by the attribute values in arr1.
If two elements i and j of L have the same arr1 attribute, i.e., if arr1[i] and arr1[j] are equal, then we should look at the attribute values in arr2.
To give an example, suppose
L=[0,1,2,3,4,5,6]
arr1={0:30,1:15,2:15,3:20,4:23,5:20,6:35}
arr2={0:6,1:8,2:6,3:17,4:65,5:65,6:34}
Sorted L should be [2,1,3,5,4,0,6]; the ordering between 1 and 2 is decided by arr2, as is the ordering between 3 and 5. The rest of the ordering is decided by arr1.
Simply use a tuple with the values from arr1 and arr2 as the sort key:
L.sort(key=lambda x: (arr1[x], arr2[x]))
# [2, 1, 3, 5, 4, 0, 6]
Tuples compare element-wise, so ties on arr1[x] fall through to arr2[x]: 2 sorts before 1 because arr2[2] = 6 < arr2[1] = 8, and 3 sorts before 5 because arr2[3] = 17 < arr2[5] = 65, which matches your expected result exactly. If you ever need the secondary ordering to be descending instead, negate the second key:
L.sort(key=lambda x: (arr1[x], -arr2[x]))
# [1, 2, 5, 3, 4, 0, 6]
