How to update MultiIndex after dropping columns in a Dataframe? - python-3.x

I cannot find how to update a column MultiIndex with dead values after columns have been dropped. I have read documentation, but I do not find a method that fit my needs.
MWE:
MultiIndex(levels=[['41B001', '41B004', '41B011', '41MEU1', '41N043', '41R001', '41R002', '41R012', '41WOL1'], ['CO', 'NO', 'NO2', 'O3', 'PM-10.0', 'PM-2.5', 'SO2']],
labels=[[1, 1, 2, 2, 4, 4, 5, 5, 7, 7, 8, 8], [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3]],
names=['sitekey', 'measurandkey'])
Keys 41B001, 41MEU1, 41R001 and CO, NO, PM-10.0, PM-2.5, SO2 have been removed but there are still referenced in MultiIndex. I would like a new index without those labels.
Following command does the trick, but it do not find it clean:
data3.T.reset_index().set_index(['sitekey', 'measurandkey']).index
Returns what I expect:
MultiIndex(levels=[['41B004', '41B011', '41N043', '41R001', '41R012', '41WOL1'], ['NO2', 'O3']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['sitekey', 'measurandkey'])
Is there a better way (efficient, cleaner, pythonic, pandas friendly) to achieve this working only on MultiIndex instead of transposing the DataFrame?

Related

Pytorch: Most computationally and memory efficient way to make a series of concatenations from extracting tensor rows?

Say that this is my sample tensor
sample = torch.tensor(
[[2, 7, 3, 1, 1],
[9, 5, 8, 2, 5],
[0, 4, 0, 1, 4],
[5, 4, 9, 0, 0]]
)
I want to have a new tensor, which will consist of concatenations of 2 rows from the sample tensor.
So I have a tensor which contains pairs of the row numbers that I want concatenated into a single row for the new tensor
cat_indices = torch.tensor([[0, 1], [1, 2], [0, 2], [2, 3]])
The current method I am using is this
torch.cat((sample[cat_indices[:,0]], sample[cat_indices[:,1]]), dim=1)
Which gives the desired result
tensor([[2, 7, 3, 1, 1, 9, 5, 8, 2, 5],
[9, 5, 8, 2, 5, 0, 4, 0, 1, 4],
[2, 7, 3, 1, 1, 0, 4, 0, 1, 4],
[0, 4, 0, 1, 4, 5, 4, 9, 0, 0]])
Is this the most memory and computationally efficient method of doing this? I am not sure because I am making two calls to cat_indices, and then I am doing a concatenation operation.
I feel that there should be a way to do this via some sort of view. Perhaps advanced indexing. I've tried things like sample[cat_indices[:,0], cat_indices[:,1]] or sample[cat_indices[0], cat_indices[1]] but I can't make the view come out right.
What you have should be pretty fast. An alternative is
sample[cat_indices].reshape(cat_indices.shape[0],-1)
You would have to benchmark the performance on your machine though to see which is better.

Data Standarisation of a list of list in Python

I want a standardized list of lists. I have the main list
Data=[[5, 3, 2, 8, 5, 10, 8, 1, 2],
[5, 1, 1, 1, 2, 1, 1, 1, 1],
[1, 1, 1, 1, 4, 3, 1, 1, 1],
[1, 1, 1, 1, 2, 2, 1, 1, 1],
[2, 1, 1, 1, 2, 1, 3, 1, 1],
[1, 1, 1, 2, 1, 1, 1, 1, 1],
[8, 10, 3, 2, 6, 4, 3, 10, 1]]
I have calculated the list of column averages and a list of Standard Deviation for 9 columns.
col_avg=[4.47,3.14,3.10,2.67, 3.25,3.83,3.16,2.99, 1.60]
col_std=[2.86,2.98,2.77,2.76,2.13,3.77,2.17, 3.16,1.67]
Now I want a list of lists where the elements are standardized.
The formula: (x-col_avg)/col_std
I don't want to use any python packages like Pandas and Numpy. It should be in general python.
At the risk of reiterating what #ScootCork said, I had the same solution, although using an outer for loop. The function enumerate gets the index for every element, which is used to look up the column in your pre-computed lists.
out = []
for i, col in enumerate(Data):
standardized_col = [(x - col_avg[i]) / col_std[i] for x in col]
out.append(standardized_col)
print(out)
You could do so with nested list comprehension, in combination with enumerate.
[[(e - col_avg[i]) / col_std[i] for i, e in enumerate(line)] for line in Data]
Basically you loop over Data, for each line you loop over the elements together with the index (from enumerate). With this index you can then 'lookup' the relevant avg and std values.

Calculating the normed distances between two points in numpy array

I'm hoping to calculate the distances between two points in a (Nx1) numpy array, i.e.:
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
I'm hoping to get a square matrix with the (normed) distances between each point:
sq = [[0, |2-5|, |2-5|, |2-12|, |2-5|, ...],
[|5-2|, 0, ...], ...]
So far, what I have doesn't work, giving wrong values for the square distance matrix. Is there a way to (I'm not sure if it is the correct term?) vectorise my method too, but am unfamiliar with the advanced indexing.
What I currently have is the following:
sq = np.zero((len(a), len(a))
for i in a:
for j in len(a+1):
sq[i,j] = np.abs(a[:,0] - a[:,0])
Would appreciate any help!
I think that by exploiting numpy broadcasting, this is the faster solution:
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
a = np.array(a).reshape(-1,1)
sq = np.abs(a.T-a)
sq
array([[ 0, 3, 3, 10, 3, 1, 8, 6, 1, 1, 1],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[10, 7, 7, 0, 7, 9, 2, 4, 11, 9, 11],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[ 1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2],
[ 8, 5, 5, 2, 5, 7, 0, 2, 9, 7, 9],
[ 6, 3, 3, 4, 3, 5, 2, 0, 7, 5, 7],
[ 1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0],
[ 1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2],
[ 1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0]])
With numpy the following line might be the shortest to your result:
import numpy as np
a = np.array([2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1])
sq = np.array([np.array([(np.abs(i - j)) for j in a]) for i in a])
print(sq)
The following would give you the desired result without numpy.
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
sq = []
for i in a:
distances = []
for j in a:
distances.append(abs(i-j))
sq.append(distances)
print(sq)
With both, the result comes as:
[[0, 3, 3, 10, 3, 1, 8, 6, 1, 1, 1], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [10, 7, 7, 0, 7, 9, 2, 4, 11, 9, 11], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2], [8, 5, 5, 2, 5, 7, 0, 2, 9, 7, 9], [6, 3, 3, 4, 3, 5, 2, 0, 7, 5, 7], [1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0], [1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2], [1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0]]
There may be more than one way to do this but one way is to only use numpy operations instead of loops because internally python does lots of optimizations for numpy arrays.
One way to do only using array operations is to create an NxN matrix by repeating the original matrix (a) N times.
This will create a matrix N times.
E.g:
a = [1, 2, 3]
b = [[1 , 2, 3], [1 , 2, 3], [1 , 2, 3]]
Then you can do a matrix, array operation of
ans = abs(b - a)
Assuming a is numpy array, you can do:
b = np.repeat(a,a.shape).reshape((a.shape[0],a.shape[0]))
ans = np.abs(b - a)

numpy remove column by different value in batch

I want to ask how numpy remove columns in batch by list.
The value in the list corresponds to the batch is different from each other.
I know this problem can use the for loop to solve, but it is too slow ...
Can anyone give me some idea to speed up?
array (batch size = 3):
[[0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6]]
remove index in the list (batch size = 3)
[[2, 3, 4], [1, 2, 6], [0, 1, 5]]
output:
[[0, 1, 5, 6], [0, 3, 4, 5], [2, 3, 4, 6]]
Assuming the array is 2d, and the indexing removes equal number of elements per row, we can remove items with a boolean mask:
In [289]: arr = np.array([[0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6]]
...: )
In [290]: idx = np.array([[2, 3, 4], [1, 2, 6], [0, 1, 5]])
In [291]: mask = np.ones_like(arr, dtype=bool)
In [292]: mask[np.arange(3)[:,None], idx] = False
In [293]: arr[mask]
Out[293]: array([0, 1, 5, 6, 0, 3, 4, 5, 2, 3, 4, 6])
In [294]: arr[mask].reshape(3,-1)
Out[294]:
array([[0, 1, 5, 6],
[0, 3, 4, 5],
[2, 3, 4, 6]])

How to make a new matrix from another matrix by using its first column as new's first row?

I want to use a python function or library - if any - for creating a new matrix whose first row beginning from the right-below is created by using old matrix's first column beginning from the left-top. That matrix can have different columns and rows but of course my new matrix have to have same dimension as previous one. My will is something like that:
In keeping with the brief style of the question:
In [467]: alist = [5,6,4,3,4,5,3,2,5,3,1,2,2,3,2,1,3,1,1,1]
In [468]: arr = np.array(alist).reshape(4,5)
In [469]: arr
Out[469]:
array([[5, 6, 4, 3, 4],
[5, 3, 2, 5, 3],
[1, 2, 2, 3, 2],
[1, 3, 1, 1, 1]])
In [470]: arr.reshape(5,4)
Out[470]:
array([[5, 6, 4, 3],
[4, 5, 3, 2],
[5, 3, 1, 2],
[2, 3, 2, 1],
[3, 1, 1, 1]])
In [471]: arr.reshape(5,4,order='F')
Out[471]:
array([[5, 3, 2, 1],
[5, 2, 1, 4],
[1, 3, 3, 3],
[1, 4, 5, 2],
[6, 2, 3, 1]])
In [473]: np.rot90(_)
Out[473]:
array([[1, 4, 3, 2, 1],
[2, 1, 3, 5, 3],
[3, 2, 3, 4, 2],
[5, 5, 1, 1, 6]])

Resources