Converting Excel functions into R

I have two excel functions that I am trying to convert into R:
numberShares
=IF(AND(N213="BOH",N212="BOH")=TRUE,P212,IF(AND(N213="BOH",N212="Sell")=TRUE,ROUNDDOWN(Q212/C213,0),0))
marketValue
=IF(AND(N212="BOH",N213="BOH")=TRUE,C213*P212,IF(AND(N212="Sell",N213="Sell")=TRUE,Q212,IF(AND(N212="BOH",N213="Sell")=TRUE,P212*C213,IF(AND(N212="Sell",N213="BOH")=TRUE,Q212))))
The cells that are referenced include:
c = closing price of a stock
n = position values of either "buy or hold" or "sell"
p = number of Shares
q = market value, assuming $10,000 initial equity (number of shares*closing price)
and the tops of the two output columns that I am trying to recreate look like this:
[screenshot: output columns]
So far, in R I have constructed a data frame with the necessary four columns:
[screenshot: data.frame]
I just don't know how to write the functions that will populate the number of shares and market value columns. For loops? ifelse?
Again, thank you!!

Convert the AND()s to the infix "&", the "="s to "==", and the IFs to ifelse(), and you are halfway there. The problem will be in converting your cell references to array or matrix references, and for that task we would need a better description of the data layout:
numberShares <-
    ifelse( N213 == "BOH" & N212 == "BOH",
            # Perhaps PosVal[213] == "BOH" & PosVal[212] == "BOH"
            # ... and very possibly the 213 should be 213:240 and the 212 should be 212:239
            P212,
            ifelse( N213 == "BOH" & N212 == "Sell",
                    trunc(Q212 / C213),  # ROUNDDOWN(x, 0) truncates toward zero
                    0))
(You seem to be returning incommensurate values, which seems pretty questionable.) Assuming this is correct despite my misgivings, the next step is to apply the same substitutions to the marketValue structure (although you seem to be missing an else-consequent in the last IF function, so an explicit NA is supplied here):
marketValue <-
    ifelse( N212 == "BOH" & N213 == "BOH",
            C213 * P212,
            ifelse( N212 == "Sell" & N213 == "Sell",
                    Q212,
                    ifelse( N212 == "BOH" & N213 == "Sell",
                            P212 * C213,
                            ifelse( N212 == "Sell" & N213 == "BOH",
                                    Q212,
                                    NA))))
(Your testing for AND(., .)=TRUE is, I believe, unnecessary in Excel and certainly unnecessary in R.)


Python warning with Pandas DataFrame "Simple Issue!" - "A value is trying to be set on a copy of a slice from a DataFrame"

first post / total Python novice so be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended an additional new field/column called ["DB/CR"] that, depending on the presence of "-" in the ["Amount"] field, is populated with 'Debit', or with 'Credit' in the absence of "-".
Noting that the transactions are in date order, I've included another new field/column called [Top x]. I want to populate it with an incremental, independent number (starting at 1) for debits and credits on a segregated basis.
As such, I have created a simple loop with an associated 'if' / 'elif' (probably could use 'else' as it's binary) that walks the DataFrame from row 0 to the last row and, depending on whether the row is 1) "Debit" or 2) "Credit", increments the corresponding counter independently: integer 'i' for "Debit" and integer 'ii' for "Credit".
The code works as expected in terms of the 'Top x' output; however, I always receive the warning "A value is trying to be set on a copy of a slice from a DataFrame".
Trying to perfect my script without any warnings, I've been trying to understand what I'm doing incorrectly, but I'm not getting it in terms of my use case.
I'd appreciate it if someone could kindly shed light on this / propose how the code needs to be refactored to avoid this warning.
Code (the df source data is an imported csv):
#top x debits/credits
i = 0
ii = 0
for ind in df.index:
    if df["DB/CR"][ind] == "Debit":
        i = i + 1
        df["Top x"][ind] = i
    elif df["DB/CR"][ind] == "Credit":
        ii = ii + 1
        df["Top x"][ind] = ii
Interpreter output:
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Top x"][ind] = i
  df["Top x"][ind] = ii
Many thanks :)
You should use label-based indexing in a single step, e.g. df.loc[ind, "Top x"] = i for the assignment (and df.loc[ind, "DB/CR"] == "Debit" for the comparison), rather than chained indexing like df["Top x"][ind] = i.
Use iterrows() to iterate over the DataFrame. However, updating a DataFrame while iterating is not preferable; refer to the iterrows() documentation here:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
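A minimal sketch of that refactor (toy data standing in for the imported CSV; column names as in the question): reading and writing through df.loc in a single step avoids the chained indexing that triggers the warning, and the trailing comment shows a vectorized groupby alternative that avoids row-by-row updates altogether.
import pandas as pd

# Small stand-in for the imported CSV; the column names follow the question.
df = pd.DataFrame({"Amount": ["-10.00", "25.00", "-5.00", "40.00"]})
df["DB/CR"] = ["Debit", "Credit", "Debit", "Credit"]
df["Top x"] = 0

# Same loop as in the question, but assigning through .loc in one step,
# which avoids SettingWithCopyWarning.
i = 0
ii = 0
for ind in df.index:
    if df.loc[ind, "DB/CR"] == "Debit":
        i = i + 1
        df.loc[ind, "Top x"] = i
    elif df.loc[ind, "DB/CR"] == "Credit":
        ii = ii + 1
        df.loc[ind, "Top x"] = ii

# A vectorized alternative producing the same running count per group:
# df["Top x"] = df.groupby("DB/CR").cumcount() + 1
print(df)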

Is it possible to stack values from 4 different columns from two different sheets into one column in a third sheet using a formula? [duplicate]

I have a FLATTEN LAMBDA function that flattens data in an array. This works well, but I want to integrate another array argument so I can use non-contiguous ranges.
In my example, the range A1:B6 is housed in array and returns the flattened data.
How can I include an array2 argument that accepts D1:D6 as an additional range?
Formula:
FLATTEN =
LAMBDA(array,
    LET(
        rows, ROWS(array),
        columns, COLUMNS(array),
        sequence, SEQUENCE(rows*columns),
        quotient, QUOTIENT(sequence-1, columns)+1,
        mod, MOD(sequence-1, columns)+1,
        INDEX(IF(array="", "", array), quotient, mod)
    )
)
Edit 7/4/22:
MS365 has now introduced the functions VSTACK() and TOCOL(), which allow for the functionality that we were missing from Google Sheets' FLATTEN() (and work even more smoothly).
In your case the formula could become:
=TOCOL(A1:D6,1)
And that small formula (where the 2nd parameter tells the function to ignore empty cells) would replace everything else from below here. If C1:C6 holds values you don't want to incorporate, you can try things like:
=VSTACK(TOCOL(A1:B6),D1:D6)
Previous Answer:
You can't really create a LAMBDA() with a beforehand-unknown number of arrays to include in the flatten. The fact that you have arrays of multiple columns adds to the trickiness. One way to 'flatten' multiple columns in this specific way would be:
Formula in G1:
=LET(X,CHOOSE({1,2,3},A1:A6,B1:B6,D1:D6),Y,COLUMNS(X),Z,SEQUENCE(COUNTA(X)),INDEX(X,CEILING(Z/Y,1),MOD(Z-1,Y)+1))
EDIT: As per your comment, you can extend this as such:
=LET(X,CHOOSE({1,2,3},IF(A1:A6="","",A1:A6),IF(B1:B6="","",B1:B6),IF(D1:D6="","",D1:D6)),Y,COLUMNS(X),Z,SEQUENCE(ROWS(X)*Y),FLAT,INDEX(X,CEILING(Z/Y,1),MOD(Z-1,Y)+1),FILTER(FLAT,FLAT<>""))
It's a cheat, but:
FLATTEN =
LAMBDA(array,
    LET(
        rows, ROWS(array),
        columns, COLUMNS(array),
        sequence, SEQUENCE(rows*columns),
        quotient, QUOTIENT(sequence-1, columns)+1,
        mod, MOD(sequence-1, columns)+1,
        unpiv, INDEX(array, quotient, mod),
        FILTER(unpiv, unpiv<>"")
    )
)
Where your array has been extended to A1:D6 as the input.
I think JvdV's answer will be the best depending on the input format you want, but I had already written this out, so here goes...
You could do:
=LET( array1, A1:B6, array2, D1:D6,
rows1,ROWS(array1), rows2,ROWS(array2),
columns1,COLUMNS(array1), columns2,COLUMNS(array2),
rows, MIN(rows1, rows2),
columns, columns1 + columns2,
sequence,SEQUENCE(rows*columns),
quotient,QUOTIENT(sequence-1,columns)+1,
mod,MOD(sequence-1,columns)+1,
IFERROR(INDEX( IF( ISBLANK(array1),"",array1),quotient,mod),
INDEX(IF( ISBLANK(array2),"",array2),quotient,MOD(sequence-1,columns2)+1) )
)
It will take multi-column/row inputs to both arrays.
Starting from the article here, and updating based upon observations about empty values in the arrays and allowing varying-sized arrays, we can get two formulae which you should be able to translate into named LAMBDA functions for 'stacking' and 'shelving' arrays.
Stack Arrays
=LET(rngA, A1:C5, rngB, A9:D11,
rowsA, ROWS(rngA), rowsB, ROWS(rngB),
NumCols, MAX(COLUMNS(rngA), COLUMNS(rngB)),
SeqRow, SEQUENCE(rowsA + rowsB), SeqCol, SEQUENCE(1, NumCols),
Result, IF(SeqRow <= rowsA, INDEX(IF(rngA="","",rngA), SeqRow, SeqCol),
INDEX(IF(rngB="","",rngB), SeqRow-rowsA, SeqCol)),
arr, IFERROR(Result,""), arr)
Shelve Arrays
=LET(rngA, A1:C5, rngB, B8:D12,
colsA, COLUMNS(rngA), colsB, COLUMNS(rngB),
NumRows, MAX(ROWS(rngA), ROWS(rngB)),
SeqRow, SEQUENCE(NumRows), SeqCol, SEQUENCE(1, colsA + colsB),
Result, IF(SeqCol <= colsA, INDEX(IF(rngA="","",rngA), SeqRow, SeqCol),
INDEX(IF(rngB="","",rngB), SeqRow, SeqCol-colsA ) ),
arr, IFERROR(Result,""), arr)
Once you have a contiguous array, you can apply the formula you already have:
Updated to use a spill range for ease of testing...
=LET(data, A1#,
rows, ROWS(data), cols, COLUMNS(data),
seq, SEQUENCE(rows*cols,,0),
list, INDEX(IF(data="", "", data), QUOTIENT(seq, cols)+1, MOD(seq, cols)+1),
FILTER(list, LEN(list)>0))
This approach is really geared towards the named LAMBDA functions because otherwise you will end up with monstrous formulae and the other approaches may well be better in that case.

Dynamically filtering a Pandas DataFrame based on user input

I would appreciate suggestions for a more computationally efficient way to dynamically filter a Pandas DataFrame.
The size of the DataFrame, len(df.index), is around 680,000.
This code from the callback function of a Plotly Dash dashboard is triggered when points on a scatter graph are selected. These points are passed to points as a list of dictionaries containing various properties with keys 'A' to 'C'. This allows the user to select a subset of the data in the pandas.DataFrame instance df for cross-filtering analysis.
rows_boolean = pandas.Series([False] * len(df.index))
for point in points:
    current_condition = ((df['A'] == point['a']) & (df['B'] == point['b'])
                         & (df['C'] >= point['c']) & (df['C'] < point['d']))
    rows_boolean = rows_boolean | current_condition
filtered = df.loc[rows_boolean, list_of_column_names]
The body of this for loop is very slow, as it operates over the whole data frame; it is manageable to run it once, but not inside a loop.
Note that these filters are not additive, as in this example; each successive iteration of the for loop increases, rather than decreases, the size of filtered (as the | rather than the & operator is used).
Note also that I am aware of the existence of the method df['C'].between(point['c'], point['d']) as an alternative to the last two comparison operators; however, I only want this comparison to be inclusive at the lower end.
Solutions I have considered
Searching the many frustratingly similar posts on SO reveals a few ideas which get some of the way:
Using pandas.DataFrame.query() will require building a (potentially very large) query string as follows:
query = ' | '.join([f'((A == {point["a"]}) & (B == {point["b"]}) '
                    f'& (C >= {point["c"]}) & (C < {point["d"]}))' for point in points])
filtered = df.query(query)
My main concern here is that I don’t know how efficient the query method becomes when the query passed has several dozen (or even several hundred) conditions strung together. This solution also currently does not allow the selection of columns using list_of_column_names.
Another possible solution could come from implementing something like this.
To reiterate, speed is key here, so I'm not just after something that works, but something that works a darn sight faster than my boolean implementation above:
There should be one-- and preferably only one --obvious way to do it. (PEP 20)
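For concreteness, here is a minimal self-contained sketch of the query-string idea above; the toy frame, the column names A/B/C and the point keys a-d mirror the question, and column selection is simply chained onto the query result. It does not settle the efficiency concern, it only makes the considered approach concrete.
import pandas as pd

# Toy stand-in for the real ~680,000-row frame.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [10, 20, 10, 20],
                   'C': [0.5, 1.5, 2.5, 3.5],
                   'D': ['w', 'x', 'y', 'z']})
list_of_column_names = ['A', 'C', 'D']

# Selected points as they would arrive from the Dash callback.
points = [{'a': 1, 'b': 10, 'c': 0.0, 'd': 1.0},
          {'a': 2, 'b': 20, 'c': 3.0, 'd': 4.0}]

# Build one OR-joined condition per selected point, then filter and
# select the wanted columns from the result.
query = ' | '.join([f'((A == {point["a"]}) & (B == {point["b"]}) '
                    f'& (C >= {point["c"]}) & (C < {point["d"]}))' for point in points])
filtered = df.query(query)[list_of_column_names]
print(filtered)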

selecting all cells between two strings in a column

I posted a question previously as "using '.between' for string values not working in python" and I was not clear enough, but I could not edit it, so I am reposting with clarity here.
I have a DataFrame. In [0,61] I have a string. In [0,69] I have a string. I want to slice all the data in cells [0,62:68] between these two and merge it, and paste the result into [1,61]. Subsequently, [0,62:68] will be blank, but that is not important.
However, I have several hundred documents, and I want to write a script that executes on all of them. The strings in [0,61] and [0,69] are always present in all the documents, but at different locations in that column. So I tried using:
For_Paste = df[0][df[0].between('DESCRIPTION OF WORK / STATEMENT OF WORK', 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION', inclusive = False)]
But the output I get is: Series([], Name: 0, dtype: object)
I was expecting a list or array with the desired data that I could merge and paste. Thanks.
If you want to select the rows between two indices (say idx_start and idx_end, excluding these two rows) in column col of the dataframe df, you will want to use
df.loc[idx_start + 1 : idx_end - 1, col]
(note that .loc slicing is inclusive of both endpoint labels).
To find the first index matching a string s, use
idx = df.index[df[col] == s][0]
So for your case, to return a Series of the rows between these two indices, try the following:
start_string = 'DESCRIPTION OF WORK / STATEMENT OF WORK'
end_string = 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION'
idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]
For_Paste = df.loc[idx_start + 1 : idx_end - 1, 0]
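A minimal end-to-end sketch, with toy data standing in for one document; the final line assumes "paste the result into [1,61]" means writing the joined text into column 1 on the row of the start string:
import pandas as pd

# Toy stand-in for one document: column 0 holds the text lines, column 1 is empty.
df = pd.DataFrame({0: ['header',
                       'DESCRIPTION OF WORK / STATEMENT OF WORK',
                       'line one of the description',
                       'line two of the description',
                       'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION',
                       'footer'],
                   1: ''})

start_string = 'DESCRIPTION OF WORK / STATEMENT OF WORK'
end_string = 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION'
idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]

# Rows strictly between the two marker strings (.loc is label-inclusive).
For_Paste = df.loc[idx_start + 1 : idx_end - 1, 0]

# Merge the slice into one string and paste it into column 1 at the start row.
df.loc[idx_start, 1] = ' '.join(For_Paste.astype(str))
print(df.loc[idx_start, 1])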

How can I rename a group of variables in a loop in MATLAB?

I have imported into MATLAB a matrix X filled with data, along with the corresponding headers for each column. Now the problem is how I can rename each column of X by its corresponding name in the header cell array. I would like to do this in a loop.
Would anyone tell me how I can write such a renaming loop in this situation?
I suggest creating a structure out of the data, rather than individual variables. Even with a large number of columns, this will not clutter the workspace, nor will it overwrite variables already in the workspace in the case of a name collision. It will keep all the data from the spreadsheet together, while still allowing access to it by column name. To easily create a structure from a cell array of column names and a matrix of data, use cell2struct:
>> colnames = {'odds','evens'};
>> data = [1 2;3 4;5 6];
>> spreadsheet_structure = cell2struct(num2cell(data,1), colnames, 2)
spreadsheet_structure =
odds: [3x1 double]
evens: [3x1 double]
(num2cell(M,1) creates a cell array in which each cell is a column from matrix M)
Loop through the header columns and use eval to create variables with names contained as strings in your matrix "header":
[X, header, ~] = xlsread('eaef21.xls', 1, 'A1:AY541');
for H = 1:size(header, 2)
    eval([header{1,H}, ' = X(:,', num2str(H), ');']);
end
Also, it is often very useful to replace the eval above with disp until you are satisfied that it is working as you want it to; seeing the generated strings will help you understand what is going on as well.
