First post, and I'm a total Python novice, so please be patient with my slow understanding!
I have a dataframe containing a list of transactions, ordered by transaction date.
I've appended a new column called ["DB/CR"] that, depending on the presence of "-" in the ["Amount"] field, is populated with 'Debit', or with 'Credit' if there is no "-".
Since the transactions are in date order, I've added another new column called ["Top x"]. I want it to hold an incremental number (starting at 1), counted independently for debits and for credits.
To do this, I created a simple loop with an associated 'if' / 'elif' statement (I could probably use else, as it's binary) that goes through the dataset from row 0 to the last row in the df and increments a separate counter for each type: integer 'i' for "Debit" and integer 'ii' for "Credit".
The code works as expected in terms of output of the 'Top x'; however, I always receive a warning "A value is trying to be set on a copy of a slice from a DataFrame".
I'm trying to perfect my script so that it runs without any warnings, and I've been trying to understand what I'm doing incorrectly, but I'm not getting it in terms of my use case.
I'd appreciate it if someone could shed light on how the code needs to be refactored to avoid this warning.
Code (the df source data is an imported csv):
# top x debits/credits
i = 0
ii = 0
for ind in df.index:
    if df["DB/CR"][ind] == "Debit":
        i = i+1
        df["Top x"][ind] = i
    elif df["DB/CR"][ind] == "Credit":
        ii = ii+1
        df["Top x"][ind] = ii
Interpreter
df["Top x"][ind] = i
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Top x"][ind] = ii
Many thanks :)
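You should use df.loc[ind, "Top x"] = i (and likewise with ii for credits) instead of the chained assignment df["Top x"][ind] = i. With .loc the row label comes first and the column name second, and the value is written in a single indexing step.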
Use iterrows() to iterate over the DF. However, updating the DF while iterating over it is not recommended.
Refer to the documentation for iterrows():
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.
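One way to avoid both the chained assignment and the row loop is a grouped running count. The following is only a minimal sketch: the column names come from the question, while the sample amounts are made up to stand in for the imported CSV.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported CSV.
df = pd.DataFrame({"Amount": ["-10.00", "25.00", "-3.50", "-7.25", "40.00"]})

# Same rule as in the question: a "-" in Amount marks a debit.
is_debit = df["Amount"].str.contains("-", regex=False)
df["DB/CR"] = is_debit.map({True: "Debit", False: "Credit"})

# cumcount() numbers the rows inside each group from 0, in their existing
# order, so adding 1 gives independent counters for debits and credits.
df["Top x"] = df.groupby("DB/CR").cumcount() + 1

print(df)
```

Because the whole column is assigned in one step, no SettingWithCopyWarning should be raised, provided df itself is not a slice of another dataframe.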
I have a FLATTEN LAMBDA function that flattens data in an array. This works well, but I want to integrate another array argument so I can use non-contiguous ranges.
In my example, the range A1:B6 is passed in as array, and the function returns the flattened data.
How can I include an array2 argument that accepts D1:D6 as an additional range?
Formula:
FLATTEN =
LAMBDA(array,
    LET(
        rows,ROWS(array),
        columns,COLUMNS(array),
        sequence,SEQUENCE(rows*columns),
        quotient,QUOTIENT(sequence-1,columns)+1,
        mod,MOD(sequence-1,columns)+1,
        INDEX(IF(array="","",array),quotient,mod)
    )
)
Edit 7/4/22:
ms365 has now introduced the functions VSTACK() and TOCOL(), which provide the functionality that we were missing from GS's FLATTEN() (and work even more smoothly).
In your case the formula could become:
=TOCOL(A1:D6,1)
And that small formula (where the 2nd parameter tells the function to ignore empty cells) would replace everything else below. If C1:C6 holds values you don't want to incorporate, you can try things like:
=VSTACK(TOCOL(A1:B6),D1:D6)
Previous Answer:
You can't really create a LAMBDA() that takes a number of arrays that is unknown beforehand and flattens them all. The fact that you have arrays of multiple columns contributes to the trickiness. One way to 'flatten' multiple columns in this specific way would be:
Formula in G1:
=LET(X,CHOOSE({1,2,3},A1:A6,B1:B6,D1:D6),Y,COLUMNS(X),Z,SEQUENCE(COUNTA(X)),INDEX(X,CEILING(Z/Y,1),MOD(Z-1,Y)+1))
EDIT: As per your comment, you can extend this as such:
=LET(X,CHOOSE({1,2,3},IF(A1:A6="","",A1:A6),IF(B1:B6="","",B1:B6),IF(D1:D6="","",D1:D6)),Y,COLUMNS(X),Z,SEQUENCE(ROWS(X)*Y),FLAT,INDEX(X,CEILING(Z/Y,1),MOD(Z-1,Y)+1),FILTER(FLAT,FLAT<>""))
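For readers who want to check the index arithmetic outside Excel, here is a rough Python sketch of the same idea: put the ranges side by side, walk a 1-based sequence, and turn each position into a row and column with the CEILING and MOD expressions. The function name flatten, the skip_blank flag, and the sample values are illustrative assumptions and are not part of the formulas above.

```python
def flatten(*ranges, skip_blank=True):
    """Flatten equally tall 2-D ranges row by row, left to right."""
    # Put the ranges side by side, as CHOOSE({1,2,3}, ...) does with columns.
    rows = [sum((list(r[i]) for r in ranges), []) for i in range(len(ranges[0]))]
    n_cols = len(rows[0])
    out = []
    for z in range(1, len(rows) * n_cols + 1):  # Z = SEQUENCE(rows*cols), 1-based
        row = -(-z // n_cols)                   # CEILING(Z/Y, 1)
        col = (z - 1) % n_cols + 1              # MOD(Z-1, Y) + 1
        value = rows[row - 1][col - 1]
        if not (skip_blank and value in ("", None)):
            out.append(value)
    return out


ab = [[1, 2], [3, ""], [5, 6]]  # stands in for a two-column range such as A1:B3
d = [["x"], ["y"], [""]]        # stands in for a one-column range such as D1:D3
flat = flatten(ab, d)
print(flat)
```

Python accepts any number of ranges through *ranges, which is the part the LAMBDA() version cannot express directly.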
It's a cheat, but:
FLATTEN =
LAMBDA(array,
    LET(
        rows,ROWS(array),
        columns,COLUMNS(array),
        sequence,SEQUENCE(rows*columns),
        quotient,QUOTIENT(sequence-1,columns)+1,
        mod,MOD(sequence-1,columns)+1,
        unpiv, INDEX(array,quotient,mod),
        FILTER(unpiv, unpiv<>"")
    )
)
Where your array has been extended to A1:D6 as the input.
I think JvdV's answer will be the best depending on the input format
you want, but I had already written this out, so here goes...
You could do:
=LET( array1, A1:B6, array2, D1:D6,
rows1,ROWS(array1), rows2,ROWS(array2),
columns1,COLUMNS(array1), columns2,COLUMNS(array2),
rows, MIN(rows1, rows2),
columns, columns1 + columns2,
sequence,SEQUENCE(rows*columns),
quotient,QUOTIENT(sequence-1,columns)+1,
mod,MOD(sequence-1,columns)+1,
IFERROR(INDEX( IF( ISBLANK(array1),"",array1),quotient,mod),
INDEX(IF( ISBLANK(array2),"",array2),quotient,MOD(sequence-1,columns2)+1) )
)
It will take multi-column/row inputs to both arrays.
Starting from the article here and updating based upon observations about empty values in the arrays and allowing varying sized arrays we can get two formulae which you should be able to translate to Named LAMBDA functions for 'stacking' and 'shelving' arrays.
Stack Arrays
=LET(rngA, A1:C5, rngB, A9:D11,
rowsA, ROWS(rngA), rowsB, ROWS(rngB),
NumCols, MAX(COLUMNS(rngA), COLUMNS(rngB)),
SeqRow, SEQUENCE(rowsA + rowsB), SeqCol, SEQUENCE(1, NumCols),
Result, IF(SeqRow <= rowsA, INDEX(IF(rngA="","",rngA), SeqRow, SeqCol),
INDEX(IF(rngB="","",rngB), SeqRow-rowsA, SeqCol)),
arr, IFERROR(Result,""), arr)
Shelve Arrays
=LET(rngA, A1:C5, rngB, B8:D12,
colsA, COLUMNS(rngA), colsB, COLUMNS(rngB),
NumRows, MAX(ROWS(rngA), ROWS(rngB)),
SeqRow, SEQUENCE(NumRows), SeqCol, SEQUENCE(1, colsA + colsB),
Result, IF(SeqCol <= colsA, INDEX(IF(rngA="","",rngA), SeqRow, SeqCol),
INDEX(IF(rngB="","",rngB), SeqRow, SeqCol-colsA ) ),
arr, IFERROR(Result,""), arr)
Once you have a contiguous array, you can apply the formula you already have:
Updated to use a spill range for ease of testing...
=LET(data, A1#,
rows, ROWS(data), cols, COLUMNS(data),
seq, SEQUENCE(rows*cols,,0),
list, INDEX(IF(data="", "", data), QUOTIENT(seq, cols)+1, MOD(seq, cols)+1),
FILTER(list, LEN(list)>0))
This approach is really geared towards the named LAMBDA functions because otherwise you will end up with monstrous formulae and the other approaches may well be better in that case.
I previously posted this question as "using “.between” for string values not working in python", but I was not clear enough and could not edit it, so I am reposting it here with more detail.
I have a DataFrame. In [0,61] I have a string, and in [0,69] I have another string. I want to slice all the data in the cells [0,62:68] between these two, merge it, and paste the result into [1,61]. Afterwards, [0,62:68] will be blank, but that is not important.
However, I have several hundred documents, and I want to write a script that runs on all of them. The strings in [0,61] and [0,69] are always present in all the documents, but at different locations in that column. So I tried using:
For_Paste = df[0][df[0].between('DESCRIPTION OF WORK / STATEMENT OF WORK', 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION', inclusive = False)]
But the output I get is: Series([], Name: 0, dtype: object)
I was expecting a list or array with the desired data that I could merge and paste. Thanks.
If you want to select the rows between two indices (say idx_start and idx_end, excluding these two rows) on column col of the dataframe df, you will want to use
df.loc[idx_start + 1 : idx_end - 1, col]
Note that .loc slices by label and includes both endpoints, hence the - 1 on idx_end. This assumes the default integer index.
To find the first index matching a string s, use
idx = df.index[df[col] == s][0]
So for your case, to return a Series of the rows between these two indices, try the following:
start_string = 'DESCRIPTION OF WORK / STATEMENT OF WORK'
end_string = 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION'
idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]
For_Paste = df.loc[idx_start + 1 : idx_end - 1, 0]
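As background, Series.between compares the values themselves, which for strings means an alphabetical comparison rather than a test of row position, and that is why the original attempt came back empty. The sketch below works on positions instead and also merges the selected cells into one string. The helper name text_between, the space used as a separator, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

START = "DESCRIPTION OF WORK / STATEMENT OF WORK"
END = "ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION"


def text_between(df, col, start, end):
    """Join the cells of `col` strictly between the first `start` and the next `end`."""
    values = df[col].tolist()
    start_pos = values.index(start)             # position of the start marker
    end_pos = values.index(end, start_pos + 1)  # first end marker after it
    # iloc slices by position and excludes the end, so both markers are left out.
    between = df[col].iloc[start_pos + 1 : end_pos]
    return " ".join(between.dropna().astype(str))


# Hypothetical stand-in for one of the documents.
df = pd.DataFrame({0: ["header", START, "line one", "line two", END, "footer"]})
merged = text_between(df, 0, START, END)
print(merged)
```

Since the markers are located by searching the column, the same function can be applied to each document even when the marker rows move around.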
I have imported a matrix X filled with data, along with the corresponding header for each column, into MATLAB. Now the problem is how to name each column of X after the corresponding entry in the header cell array. I would like to do this in a loop.
Could anyone tell me how to write such a renaming loop in this situation?
I suggest creating a structure out of the data, rather than individual variables. Even with a large number of columns, this will not clutter the workspace, nor will it overwrite variables already in the workspace in the case of a name collision. It keeps all the data from the spreadsheet together while still allowing access to it by column name. To easily create a structure from a cell array of column names and a matrix of data, use cell2struct:
>> colnames = {'odds','evens'};
>> data = [1 2;3 4;5 6];
>> spreadsheet_structure = cell2struct(num2cell(data,1), colnames, 2)
spreadsheet_structure =
odds: [3x1 double]
evens: [3x1 double]
(num2cell(M,1) creates a cell array in which each cell is a column from matrix M)
Loop through the header columns and use eval to create variables whose names are the strings stored in your cell array "header":
[X,header,~] = xlsread('eaef21.xls',1,'A1:AY541');
for H = 1:size(header,2)
    % header{1,H} extracts the name as text; num2str turns the column number into text
    eval([header{1,H}, ' = X(:,', num2str(H), ');']);
end
Also it is often very useful to replace the eval above with disp until you are satisfied that it is working as you want it to. Using disp will help you understand what is going on as well.