mapping values from one array to another - python-3.x

I have a pd.Series, arr1, of length 1000 in which some values are repeated. I want to map every unique value in arr1 to a new value taken from a NumPy array, arr2. I only know how to do this with a for loop:
import numpy as np
import pandas as pd
arr1 = pd.Series(np.random.choice(1000,1000, replace=True))
arr1_unq = arr1.drop_duplicates()
arr2 = np.random.choice(1000,len(arr1_unq), replace=False)
arr2_unq = np.unique(arr2)
for i in range(len(arr2)):
    arr1[arr1==arr1_unq.iloc[i]]=arr2[i]
How can I do this more efficiently without using a for loop?

pandas.Series.map should do it:
mapping = dict(zip(arr1_unq, arr2_unq))
arr1.map(mapping)
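A minimal sketch of the same idea on a small, fixed input, so the result can be checked by hand. The values in arr1 and arr2 below are made up for illustration. The sketch pairs each unique value with arr2 in order of first appearance, which appears to be the pairing the loop in the question intends; zipping with np.unique(arr2) would pair against the sorted values instead.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): a Series with repeated values.
arr1 = pd.Series([5, 3, 5, 7, 3])
arr1_unq = arr1.drop_duplicates()   # 5, 3, 7 in order of first appearance
arr2 = np.array([50, 30, 70])       # one new value per unique value

# Build the lookup once, then apply it to every element in one pass.
mapping = dict(zip(arr1_unq, arr2))
mapped = arr1.map(mapping)

print(mapped.tolist())
```

Unlike the loop, map looks every element up against its original value, so a new value that happens to equal an old value that has not been processed yet is not remapped a second time.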

Related

How to make multiple ranges in a array in python?

I am looking for a single vector with values [(0:400) (-400:-1)]
Can anyone help me write this in Python?
Use np.arange to generate each range and np.concatenate to join them into a single vector:
import numpy as np
# 0..400 followed by -400..-1, assuming both ranges in the question include their end points
arr = np.concatenate([np.arange(401), np.arange(-400, 0)])

Plotting Pandas DF with Numpy Arrays

I have a Pandas df with multiple columns, and each cell holds a NumPy array with a varying number of elements. I would like to plot all the elements of the array for every cell within a column.
I have tried
plt.plot(df['column'])
plt.plot(df['column'][0:])
Both give a ValueError: setting an array element with a sequence
It is very important that these values get plotted at their corresponding index, as the index represents linear time in this dataframe. I would really appreciate it if someone showed me how to do this properly. Perhaps there is a package other than matplotlib.pyplot that is better suited for this?
Thank you
plt.plot needs a list of x-coordinates together with an equally long list of y-coordinates. As you seem to want to use the index of the dataframe for the x-coordinate and each cell contents for the y-coordinates, you need to repeat the x-values as many times as the length of the y-coordinates.
Note that this format doesn't suit a line plot, as connecting subsequent points would create some strange vertical lines. plt.plot accepts a marker as its third parameter, for example '.' to draw a simple dot at each position.
A code example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})
legend_handles = []
colors = plt.cm.Set1.colors
desired_columns = df.columns
for column, color in zip(desired_columns, colors):
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for ind, cell in df[column].items():
        if len(cell) > 0:
            # repeat the index so every value in the cell gets an x-coordinate
            plotted, = plt.plot([ind] * len(cell), cell, '.', color=color)
    legend_handles.append(plotted)
plt.legend(legend_handles, desired_columns)
plt.show()
Note that pandas really isn't meant to store complete arrays inside cells. The preferred way is to create a dataframe in "long" form, with each value in a separate row (with the "index" repeated). Most functions of pandas and seaborn don't handle arrays inside cells.
Here's a way to create the long form, which can then be plotted with Seaborn:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})
desired_columns = df.columns
df_long_data = []
for column in desired_columns:
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for ind, cell in df[column].items():
        for val in cell:
            # one row per value; named "row" to avoid shadowing the built-in dict
            row = {'timestamp': ind, 'column_name': column, 'value': val}
            df_long_data.append(row)
df_long = pd.DataFrame(df_long_data)
sns.scatterplot(x='timestamp', y='value', hue='column_name', data=df_long)
plt.show()
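As a possible alternative to the nested loops, the long form can also be built with Series.explode, which turns each element of a list-like cell into its own row while repeating the index. This is only a sketch on a tiny made-up frame; the column names timestamp, column_name and value mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values): each cell holds an array.
df = pd.DataFrame({
    'column1': [np.array([1.0, 2.0]), np.array([3.0])],
    'column2': [np.array([4.0]), np.array([5.0, 6.0, 7.0])],
})

parts = []
for col in df.columns:
    part = (
        df[col]
        .explode()                 # one row per array element, index repeated
        .astype(float)             # explode returns object dtype
        .rename('value')
        .rename_axis('timestamp')
        .reset_index()
        .assign(column_name=col)
    )
    parts.append(part)

df_long = pd.concat(parts, ignore_index=True)
print(df_long)
```

The resulting df_long has one row per array element and can be passed to sns.scatterplot in the same way as the loop-built version.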
In your case each cell holds a NumPy array that you want to plot. Passing the whole column to plt.plot() fails because the column is a sequence of arrays rather than a flat array of numbers. plt.plot() does accept a single NumPy array, so you can pass each cell individually. Note that this plots each cell against the position of its elements, not against the DataFrame index.
This might help:
for column in df.columns:
    for cell in df[column]:
        plt.plot(cell)
plt.show()

How can I use Pandas to write a table using different lists as header and index and body to Excel?

I need to create a table whose first column has the header 'Filename' followed by the names of the files in a directory (n entries).
The header row is a list giving the name of each column (m entries), and the body of the table is a list of m x n floats.
How should I do this? In most tutorials, only one of the header row or the first column comes from a list.
Here is what I have done. The variable Filenames contains the file names, casename is the header row, and maxmin holds the m x n floats:
import os
import pandas as pd
import numpy as np
import openpyxl as pyx
from collections import OrderedDict
maxmin = [1,2,3,4,5,6,11,22,33,44,55,66,111,222,333,444,555,666]
Filenames=['A','B','C']
casename=['Xmax','Xmin','Ymax','Ymin','Zmax','Zmin']
Body = OrderedDict([ ('Filename', casename),(Filenames, maxmin) ] )
df = pd.DataFrame.from_dict(Body)
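One way this could be approached is to reshape the flat list into a 2-D array and pass the two name lists as index and columns. This is a sketch under an assumption the question does not state: that maxmin is stored file by file, so the first six values belong to 'A', the next six to 'B', and so on. The file name maxmin.xlsx is a placeholder.

```python
import numpy as np
import pandas as pd

maxmin = [1, 2, 3, 4, 5, 6, 11, 22, 33, 44, 55, 66, 111, 222, 333, 444, 555, 666]
Filenames = ['A', 'B', 'C']
casename = ['Xmax', 'Xmin', 'Ymax', 'Ymin', 'Zmax', 'Zmin']

# Assumption: values are ordered file by file, so each row is one file.
body = np.array(maxmin).reshape(len(Filenames), len(casename))

# Filenames become the first column (the index), casename the header row.
df = pd.DataFrame(body,
                  index=pd.Index(Filenames, name='Filename'),
                  columns=casename)
print(df)

# Writing the sheet needs an Excel engine such as openpyxl to be installed:
# df.to_excel('maxmin.xlsx')
```

Because the index is named 'Filename', that name ends up as the header of the first column when the frame is written out.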

A colon before a number in numpy array

I'm using a camera to store raw data in a numpy array, but I don't know what a colon before a number means when indexing a numpy array.
import numpy as np
import picamera
camera = picamera.PiCamera()
camera.resolution = (128, 112)
data = np.empty((128, 112, 3), dtype=np.uint8)
camera.capture(data, 'rgb')
data = data[:128, :112]
NumPy array indexing is explained in the docs.
This example shows what is selected:
import numpy as np
data = np.arange(64).reshape(8, 8)
print(data)
data = data[:3, :5]
print(data)
The result is the first 5 elements of each of the first 3 rows of the array.
As in standard Python, lst[:3] means everything up to, but not including, index 3 (i.e. the elements with index < 3). In NumPy you can do the same for every dimension with the syntax given in your question.
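Applied to an array with the shape used in the question, the same rule can be checked directly. This sketch uses a zero-filled stand-in for the camera buffer, so no camera is required; the names cropped and smaller are only for illustration.

```python
import numpy as np

# Stand-in for the camera buffer from the question (no camera needed).
data = np.zeros((128, 112, 3), dtype=np.uint8)

# ':128' keeps indices 0..127 on axis 0 and ':112' keeps 0..111 on axis 1.
# The third axis is not mentioned, so it is kept whole.
cropped = data[:128, :112]
print(cropped.shape)

# A smaller stop value actually trims the array.
smaller = data[:64, :56]
print(smaller.shape)
```

Since the stop values in the question equal the array's dimensions, that particular slice leaves the shape unchanged.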

combination of numpy array satisfying condition

I am trying to efficiently produce all combinations of a numpy array that satisfy a condition. My code currently looks like this:
import numpy as np
import itertools
a = np.array([1,11,12,13])
a = np.tile(a,(13,1))
a = a.flatten()
for c in itertools.combinations(a,4):
    if np.sum(c)==21:
        print(c)
If you only care about unique combinations (and there are only 256 of them), you can use itertools.product:
version_1 = np.vstack(list(sorted({tuple(row) for row in list(itertools.combinations(a, 4))}))) # unique combinations, your way
version_2 = np.array(list(itertools.product((1, 11, 12, 13), repeat=4))) # same result, but faster
assert (version_1 == version_2).all()
I'm using this answer to get the unique elements of a Numpy array.
So the final answer would be:
import itertools, numpy as np
a = np.array(list(itertools.product((1, 11, 12, 13), repeat=4)))
for arr in a[a.sum(axis=1) == 21]:
    print(arr)
