Build a sequence of numbers - Excel

My goal is to build a coordinates file automatically from an X and Y pitch and an X and Y step count.
Let's say:
X pitch = 2 (mm), Y pitch = 1 (mm)
X steps = 10 and Y steps = 10
My file should look something like this:
1, 0, 0
2, 2, 0
3, 4, 0
4, 6, 0
5, 8, 0
6, 10, 0
7, 12, 0
8, 14, 0
9, 16, 0
10, 18, 0
11, 0, 1
etc
(till 100)
With the SEQUENCE function I managed to build the first column of numbers:
=SEQUENCE(L1*L2;1;1;1) where L1 = Xsteps and L2 = Ysteps
Now I am struggling to build the X and Y columns.
X cycles through its 10 values on every row, while Y only increments once every 10 rows.
I would like to automate this, because in real life the step counts are never nice round numbers. But how?

This answer may not be complete, but I started with the first row
1 0 0
and put these formulas in A2, B2 and C2, filled down:
=A1+1 =MOD(B1+2,20) =IF(MOD(A2,10)=0,C1+$D$1,C1)
Here $D$1 holds the Y increment (10308 in this test); note that the formula steps Y up on row 10 rather than row 11, so it is not quite right yet. This creates the following list:
1 0 0 10308
2 2 0
3 4 0
4 6 0
5 8 0
6 10 0
7 12 0
8 14 0
9 16 0
10 18 10308
11 0 10308

If you have Excel 365, you can use LET with SEQUENCE:
=LET(rows, Xsteps*Ysteps,
     seq, SEQUENCE(rows, 1, 0),
     column1, seq + 1,
     column2, MOD(seq, Xsteps) * Xpitch,
     column3, INT(seq / Xsteps) * Ypitch,
     CHOOSE({1,2,3}, column1, column2, column3))
The four cells A2, B2, C2 and D2 are assigned the names used in the formula (Xsteps, Ysteps, Xpitch, Ypitch) via the Name Manager, based on the labels in A1, B1, C1 and D1 respectively.
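Outside Excel, the same MOD/INT arithmetic can be sketched in Python with NumPy (a minimal sketch; the variable names here are illustrative, not part of the workbook):

import numpy as np

x_pitch, y_pitch = 2, 1      # mm
x_steps, y_steps = 10, 10

seq = np.arange(x_steps * y_steps)            # 0 .. 99
coords = np.column_stack([
    seq + 1,                                  # running index, 1-based
    (seq % x_steps) * x_pitch,                # X repeats every x_steps rows
    (seq // x_steps) * y_pitch,               # Y increments once per x_steps rows
])
print(coords[:3])                             # [[1 0 0] [2 2 0] [3 4 0]]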

Related

Using Pandas to assign specific values

I have the following dataframe:
import pandas as pd

data = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
        'stat': ['ordered', 'unconfirmed', 'ordered', 'unknwon',
                 'ordered', 'unconfirmed', 'ordered', 'back'],
        'date': ['2021', '2022', '2023', '2024', '2025', '2026', '2027', '1990']}
df = pd.DataFrame(data)
df
I am trying to get a data frame with one column per year, where each row is filled with a status code from its date onward (as in the answer's output below).
Unfortunately I have not been successful so far. I used the following for loops, handling only stat == 'ordered':
y0 = np.zeros((len(df), 8), dtype=int)
y1 = [1990]
if stat == 'ordered':
    for i in df['id']:
        for j in y1:
            if df.loc[i].at['date'] in y1:
                y0[i][y1.index(j)] = 1
            else:
                y0[i][y1.index(j)] = 0
But unfortunately it did not return the expected result, and besides that it takes a very long time to compute. I tried groupby, since it should be faster than for loops, but I could not figure out how to use it properly either. Any idea would be very much appreciated.
IIUC (if I understand correctly):
df.join(
    pd.get_dummies(df.date)
      .cumsum(axis=1)                          # 1 from each row's date onward
      .mul([1, 2, 1, 3, 1, 2, 1, 0], axis=0)   # per-row code: ordered=1, unconfirmed=2, unknwon=3, back=0
      .astype(int)
)
id stat date 1990 2021 2022 2023 2024 2025 2026 2027
0 1 ordered 2021 0 1 1 1 1 1 1 1
1 2 unconfirmed 2022 0 0 2 2 2 2 2 2
2 3 ordered 2023 0 0 0 1 1 1 1 1
3 4 unknwon 2024 0 0 0 0 3 3 3 3
4 5 ordered 2025 0 0 0 0 0 1 1 1
5 6 unconfirmed 2026 0 0 0 0 0 0 2 2
6 7 ordered 2027 0 0 0 0 0 0 0 1
7 8 back 1990 0 0 0 0 0 0 0 0
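A variant that derives the per-row multiplier from stat instead of hard-coding the list (a sketch; the status-to-code mapping is an assumption read off the hard-coded list above):

import pandas as pd

data = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
        'stat': ['ordered', 'unconfirmed', 'ordered', 'unknwon',
                 'ordered', 'unconfirmed', 'ordered', 'back'],
        'date': ['2021', '2022', '2023', '2024', '2025', '2026', '2027', '1990']}
df = pd.DataFrame(data)

# Assumed mapping, inferred from the [1, 2, 1, 3, 1, 2, 1, 0] list above
codes = df['stat'].map({'ordered': 1, 'unconfirmed': 2, 'unknwon': 3, 'back': 0})

out = df.join(
    pd.get_dummies(df['date'], dtype=int)     # one indicator column per year
      .cumsum(axis=1)                         # 1 from each row's date onward
      .mul(codes, axis=0)                     # scale by the per-row status code
)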

What am I doing wrong with series.replace()?

I am trying to replace integer values in a pd.Series with other integer values as follows, using a dict-like replace:
ser_list = [pd.Series([65, 1, 0, 0, 1]), pd.Series([0, 62, 1, 1, 0])]
for ser in ser_list:
    ser.replace({65: 10, 62: 20})
I am expecting the result:
[10, 1, 0, 0, 1] # first series in the list
[0, 20, 1, 1, 0] # second series in the list
where 65 should be replaced with 10 in the first series, and 62 should be replaced with 20 in the second.
However, with this code I get back the original series without any replacement. Any clue why?
By default Series.replace returns a new Series, and the loop above discards that result, so the originals never change. It is possible with inplace=True:
for ser in ser_list:
    ser.replace({65: 10, 62: 20}, inplace=True)
print(ser_list)
[0 10
1 1
2 0
3 0
4 1
dtype: int64, 0 0
1 20
2 1
3 1
4 0
dtype: int64]
But this is not recommended, as mentioned by @Dan in the comments:
The pandas core team discourages the use of the inplace parameter, and eventually it will be deprecated (which means "scheduled for removal from the library"). Here's why:
inplace won't work within a method chain.
The use of inplace often doesn't prevent copies from being created, contrary to what the name implies.
Removing the inplace option would reduce the complexity of the pandas codebase.
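A quick illustration of the first point, that inplace breaks method chains (a minimal sketch):

import pandas as pd

ser = pd.Series([65, 1, 0, 0, 1])
result = ser.replace({65: 10}, inplace=True)
print(result)   # None - there is nothing to keep chaining on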
Or assign the result back to the same variable with a list comprehension:
ser_list = [ser.replace({65: 10, 62: 20}) for ser in ser_list]
A loop solution is also possible, appending to a new list and assigning back:
out = []
for ser in ser_list:
    ser = ser.replace({65: 10, 62: 20})
    out.append(ser)
print(out)
[0 10
1 1
2 0
3 0
4 1
dtype: int64, 0 0
1 20
2 1
3 1
4 0
dtype: int64]
We can also use Series.map with fillna in a list comprehension; note that map returns NaN for unmapped values, so after fillna the result dtype is float64:
new = [ser.map({65: 10, 62: 20}).fillna(ser) for ser in ser_list]
print(new)
[0 10.0
1 1.0
2 0.0
3 0.0
4 1.0
dtype: float64, 0 0.0
1 20.0
2 1.0
3 1.0
4 0.0
dtype: float64]
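If the integer dtype matters, one option is to cast back after fillna (a sketch, assuming every remaining value is integral):

import pandas as pd

ser_list = [pd.Series([65, 1, 0, 0, 1]), pd.Series([0, 62, 1, 1, 0])]
# map leaves unmapped values as NaN; fillna restores them, astype undoes the float upcast
new = [ser.map({65: 10, 62: 20}).fillna(ser).astype(int) for ser in ser_list]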

How can I merge data-frame rows by different columns

I have a DataFrame with 200k rows and some 50 columns, with the same id spread across different columns, looking like the one below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'pic': [1, 0, 0, 0, 2, 0, 3, 0, 0],
                   'story': [0, 1, 0, 2, 0, 0, 0, 0, 3],
                   'des': [0, 0, 1, 0, 0, 2, 0, 3, 0],
                   'some_another_value': [2, np.nan, np.nan, np.nan, 4, np.nan, 1, np.nan, np.nan],
                   'some_value': [np.nan, 2, 3, 4, np.nan, 6, np.nan, 8, 9]})
pic story des some_another_value some_value
0 1 0 0 2 nan
1 0 1 0 nan 2
2 0 0 1 nan 3
3 0 2 0 nan 4
4 2 0 0 4 nan
5 0 0 2 nan 6
6 3 0 0 1 nan
7 0 0 3 nan 8
8 0 3 0 nan 9
I would like to merge the rows which have the same value in 'pic', 'story' and 'des':
pic story des some_another_value some_value
0 1 1 1 2 5
3 2 2 2 4 10
6 3 3 3 1 17
How can this be achieved?
* I am looking for a solution which does not contain a for loop
* I would prefer not to use a sum method
I'm not sure why you say you would prefer not to use a sum method when your expected output clearly indicates a sum. For your sample data, in each row exactly one of pic, story, des is nonzero, so the row-wise sum of those three columns recovers the shared id:
df.groupby(df[['pic', 'story', 'des']].sum(axis=1)).sum()
gives
pic story des some_another_value some_value
1 1 1 1 2.0 5.0
2 2 2 2 4.0 10.0
3 3 3 3 1.0 17.0
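If summing the id columns themselves is the objection, a sketch of an agg-based variant using the df from the question: the row-wise sum still builds the grouping key, but pic/story/des are kept via max, so only the value columns are actually summed:

key = df[['pic', 'story', 'des']].sum(axis=1)   # recovers the shared id per row
out = df.groupby(key).agg({'pic': 'max', 'story': 'max', 'des': 'max',
                           'some_another_value': 'sum', 'some_value': 'sum'})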

vectorize groupby pandas

I have a dataframe like this:
day time category count
1 1 a 13
1 2 a 47
1 3 a 1
1 5 a 2
1 6 a 4
2 7 a 14
2 2 a 10
2 1 a 9
2 4 a 2
2 6 a 1
I want to group by day and category and get a vector of the counts per time, where time can be between 1 and 10. The max and min of time are stored in two variables called max and min.
This is how I want the resulting dataframe to look:
day category count
1 a [13,47,1,0,2,4,0,0,0,0]
2 a [9,10,0,2,0,1,14,0,0,0]
Does anyone know how to turn this aggregation into a vector?
Use reindex with MultiIndex.from_product to add the missing time values, then groupby with list:
df = df.set_index(['day', 'time', 'category'])
a = df.index.levels[0]
b = range(1, 11)
c = df.index.levels[2]
df = df.reindex(pd.MultiIndex.from_product([a, b, c], names=df.index.names), fill_value=0)
df = df.groupby(['day', 'category'])['count'].apply(list).reset_index()
print(df)
day category count
0 1 a [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 a [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
EDIT:
df = (df.set_index(['day', 'time', 'category'])['count']
        .unstack(1, fill_value=0)
        .reindex(columns=range(1, 11), fill_value=0))
print(df)
time 1 2 3 4 5 6 7 8 9 10
day category
1 a 13 47 1 0 2 4 0 0 0 0
2 a 9 10 0 2 0 1 14 0 0 0
df = df.apply(list, axis=1).reset_index(name='count')
print(df)
day ... count
0 1 ... [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 ... [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
[2 rows x 3 columns]
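To parametrize the time range with the question's variables, a sketch starting again from the original df (renamed to t_min/t_max here, since min and max would shadow the Python builtins):

t_min, t_max = 1, 10
out = (df.set_index(['day', 'time', 'category'])['count']
         .unstack('time', fill_value=0)
         .reindex(columns=range(t_min, t_max + 1), fill_value=0)
         .apply(list, axis=1)
         .reset_index(name='count'))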

How to sum all the arrays inside a list of arrays?

I am working with confusion matrices: each loop iteration produces one array (a confusion matrix), and after 10 iterations I end up with 10 arrays. I want to sum all of them.
So I decided to store the arrays in a list on each iteration (I do not know whether it would be better to store them in an array).
Now I want to add up all the arrays inside the list.
So if I have these two 3x3 matrices (shown side by side):
5 0 0    1 1 0
0 5 0    2 4 0
0 0 5    2 0 5
the sum will be:
6 1 0
2 9 0
2 0 10
This is my attempt, which fails because a plain Python list has no .sum method:
list_cm.sum(axis=0)
Just sum the list; the built-in sum starts from 0, and 0 + array broadcasts, so the arrays are added element-wise:
>>> sum([np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5]]), np.array([[1, 1, 0], [2, 4, 0], [2, 0, 5]])])
array([[ 6,  1,  0],
       [ 2,  9,  0],
       [ 2,  0, 10]])
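np.sum over the list works as well, stacking the arrays and reducing along the new leading axis (a sketch, assuming list_cm holds same-shaped arrays):

import numpy as np

list_cm = [np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5]]),
           np.array([[1, 1, 0], [2, 4, 0], [2, 0, 5]])]
total = np.sum(list_cm, axis=0)   # same result as the built-in sum above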
