Using boolean array indexing in NumPy causes ValueError

I was trying out indexing with boolean arrays:
import numpy as np

def boolean_array_indexing_each_dim1():
    a = np.array(
        [
            ['a', 'b', 'c', 'd'],
            ['e', 'f', 'g', 'h'],
            ['i', 'j', 'k', 'l'],
            ['m', 'n', 'o', 'p'],
        ]
    )
    print('a=\n', a.shape, '\n', a)
    b1 = np.array([True, True, True, False])     # gives error
    # b1 = np.array([True, False, True, False])  # works
    print('b1=\n', b1.shape, '\n', b1)
    b2 = np.array([True, False, True, False])
    print('b2=\n', b2.shape, '\n', b2)
    selected = a[b1, b2]
    print('selected=\n', selected.shape, '\n', selected)

boolean_array_indexing_each_dim1()
The array b1 = np.array([True,True,True,False]) causes 'ValueError: shape mismatch: objects cannot be broadcast to a single shape'.
The array b1 = np.array([True,False,True,False]), however, works and produces the result ['a' 'k'].
Why does this error happen? Can someone please explain?

The reason is that your first b1 array has 3 True values while b2 has 2. These masks are equivalent to indexing with the positions [0,1,2] and [0,2] respectively. When you index with two arrays, NumPy pairs their elements position by position to form (row, column) pairs, so the two index arrays must broadcast to a common shape. With [0,1,2] and [0,2] it can form (0,0) and (1,2), but there is no partner for the final 2 in b1: shapes (3,) and (2,) cannot be broadcast together, so the error is raised. Your alternate b1 works because it happens to have the same number of True values as your b2, giving the pairs (0,0) and (2,2), i.e. 'a' and 'k'.
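A minimal sketch of the pairing described above, reusing the question's array: np.nonzero converts each mask into the positions of its True values, and indexing with the two position arrays pairs them element by element. The helper name pair_boolean_indices is hypothetical. The sketch catches both IndexError and ValueError because, as far as I know, recent NumPy versions report this mismatch as an IndexError, while the question shows a ValueError.

```python
import numpy as np

a = np.array([list("abcd"), list("efgh"), list("ijkl"), list("mnop")])
b2 = np.array([True, False, True, False])


def pair_boolean_indices(a, b1, b2):
    """Mimic a[b1, b2]: turn each mask into positions, then pair them up."""
    rows = np.nonzero(b1)[0]  # positions of True values in b1
    cols = np.nonzero(b2)[0]  # positions of True values in b2
    return rows, cols, a[rows, cols]  # element-wise (row, col) pairing


# Same number of True values: pairs (0, 0) and (2, 2) select 'a' and 'k'
rows, cols, picked = pair_boolean_indices(a, np.array([True, False, True, False]), b2)
print(rows, cols, picked)

# Three True values vs. two: index shapes (3,) and (2,) cannot broadcast
try:
    pair_boolean_indices(a, np.array([True, True, True, False]), b2)
    mismatch_raised = False
except (IndexError, ValueError) as exc:  # exception type has varied by NumPy version
    mismatch_raised = True
    print(type(exc).__name__, exc)
```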
I suspect what you intended to accomplish is
selected = a[b1,:][:,b2]
This would consistently slice the array with b1 along axis 0, and then slice it with b2 along axis 1.
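A quick check of the two-step selection. np.ix_ is shown as an alternative that is not part of the original answer: it accepts boolean masks and builds an open mesh, so a single indexing step selects the same rows-by-columns block.

```python
import numpy as np

a = np.array([list("abcd"), list("efgh"), list("ijkl"), list("mnop")])
b1 = np.array([True, True, True, False])
b2 = np.array([True, False, True, False])

# Two-step selection from the answer: rows 0-2 first, then columns 0 and 2
two_step = a[b1, :][:, b2]

# np.ix_ turns the masks into an open mesh, so one indexing step
# selects the same 3 x 2 block
mesh = a[np.ix_(b1, b2)]

print(two_step)  # [['a' 'c'] ['e' 'g'] ['i' 'k']]
```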

Related

Conditional branches of a function raise false circular reference errors

I have a big function with many IFS branches; each branch references different parts of the worksheet:
MYFUN = LAMBDA(i,
IFS(
i = 1, // e.g. a formula that uses Row 1,
i = 2, // e.g. a formula that uses Row 2,
... ...
)
);
I then realized that if I enter =MYFUN(2) in cell C1, a circular reference error is raised, even though at run time MYFUN(2) does not use any values in Row 1.
I tried to reproduce the problem with a smaller example, defining a function as follows:
TRY = LAMBDA(i,
IFS(
i = 1, Sheet1!$B$2,
i = 2, Sheet1!$D$2,
TRUE, "haha"
)
);
Entering =TRY(2) in cell B2 correctly returns the value of D2, with no circular reference error, which is good.
Next, I add a SUM function as follows:
TRY = LAMBDA(i,
IFS(
i = 1, SUM(Sheet1!$B$2),
i = 2, Sheet1!$D$2,
TRUE, "haha"
)
);
Now, entering =TRY(2) in cell B2 raises a circular reference error, even though SUM(Sheet1!$B$2) does not need to be evaluated.
Does anyone know why Excel behaves like this?
How could I restructure code like MYFUN to avoid these false circular reference errors?
PS:
I also noticed that using ROWS(Sheet1!$B$2) in place of SUM(Sheet1!$B$2) does not raise a circular reference error. What is the difference in their semantics?
IFS resolves every argument, both the conditions and their values, and then returns the value associated with the first TRUE condition. It does not find the first TRUE condition and then resolve only the value associated with it.
Put 6 in B2 and 3 in D2, then put this in B4:
=IFS(2=1,SUM(B2),2=2,D2,TRUE,1=1)
Then, with that cell selected, step through the formula with Evaluate Formula (on the Formulas tab).
You can see that all 6 arguments are resolved, but 3 is returned.

Appending a value to a specific DataFrame cell

I have tried many different ways to append a value to a specific cell in a DataFrame, but I haven't figured out how to do it yet, nor found anything relevant in my research.
One column of my DataFrame starts with False in every position. It then receives values over time, one at a time. The problem starts when I have to add more than one value to the same cell. For example:
import pandas as pd
df = pd.DataFrame(data=[[1, 2, False], [4, 5, False], [7, 8, False]], columns=["A", "B", "C"])
which gives me:
   A  B      C
0  1  2  False
1  4  5  False
2  7  8  False
I've tried to turn the cell into a list in several ways, for example (just a few):
df.iloc[0,0] = df.iloc[0,0].tolist().append("A")
or:
df.iloc[0,0] = df.iloc[0,0].tolist()
df.iloc[0,0] = df.iloc[0,0].append("A")
But nothing has worked so far.
Is there any way to append a value (a string) to a specific cell that might start out as either a Boolean or a string?
If you need to concatenate a cell's value with a string, you can use:
df.iloc[1,0] = str(df.iloc[1,0]) + "A"
df.iloc[0,2] = str(df.iloc[0,2]) + "A"
Or use an f-string:
df.iloc[1,0] = f'{df.iloc[1,0]}' + "A"
df.iloc[0,2] = f'{df.iloc[0,2]}' + "A"
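A runnable sketch of the concatenation approach, limited to column C. One assumption not in the answer: the column is converted to object dtype first, because recent pandas versions warn about (and may eventually reject) writing a string into a bool or int64 column.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, False], [4, 5, False], [7, 8, False]],
                  columns=["A", "B", "C"])

# Assumption: make column C object dtype so it can hold strings
# alongside the original booleans
df["C"] = df["C"].astype(object)

# Concatenate the current cell value with a string, as in the answer
df.iloc[0, 2] = str(df.iloc[0, 2]) + "A"
# The f-string form does the same thing for another row
df.iloc[1, 2] = f"{df.iloc[1, 2]}A"

print(df)
```

Note that this turns the cell into a single string ("FalseA"), so the original Boolean is no longer stored separately.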
It is generally not advisable (see this article, for example) to have pandas DataFrame columns with mixed types, since you cannot guarantee how each "cell" will behave.
One solution, therefore, is to first make sure that the whole column you might change later holds lists. For example, if you know that column "C" will (or might) later have values appended to it as if it were a list, it's preferable for the False values you mentioned as a "starting point" to already be wrapped in a list. With the DataFrame you provided:
df["C"] = df["C"].apply(lambda x: [x])  # replace the whole column with one list per cell
df.iloc[0, 2].append("A")
df
This outputs:
   A  B           C
0  1  2  [False, A]
1  4  5     [False]
2  7  8     [False]
Now, if you want to go through column C and check whether the first value is False or True, you can, for example, use:
df["C"].apply(lambda x: x[0])
This lets you access that value without resorting to tricks such as checking each cell's type.
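A self-contained sketch of the list-per-cell approach. It uses df["C"] = ... so the column is replaced outright, and df.at for single-cell access; the extra "B" value is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, False], [4, 5, False], [7, 8, False]],
                  columns=["A", "B", "C"])

# Wrap every starting value in its own new list; assigning via df["C"]
# replaces the column, which becomes object dtype holding one list per cell
df["C"] = df["C"].apply(lambda x: [x])

# Each cell holds an independent list object, so append mutates it in place
df.at[0, "C"].append("A")
df.at[0, "C"].append("B")

# Recover the original flag from each cell without any type checks
first_flags = df["C"].apply(lambda x: x[0])
print(df)
print(first_flags.tolist())
```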

Aggregate function (small) returns zeros rather than the smallest values

I am using Excel's AGGREGATE function (with SMALL) to find the smallest value for each name that appears in a column. The issue is that the formula below simply returns 0 in every row that has a value in B.
The formula I am using is:
=IF($B2<>"", AGGREGATE(15,7, ($B:$B)*($A2=$A:$A)*($B2<>""), 1), "")
where B contains the data I want the smallest value from and A contains identifying strings.
I appreciate any help you can lend!
You want to divide by the criteria:
=IF($B2<>"", AGGREGATE(15,7, ($B:$B)/(($A2=$A:$A)*($B:$B<>"")), 1), "")
Whenever ($A2=$A:$A) or ($B2<>"") is FALSE, the product for that row is 0; anything multiplied by 0 is 0, so the smallest value returned is 0.
Dividing by the criteria instead produces a #DIV/0! error for every row where either condition is FALSE. The 7 in AGGREGATE's second argument tells it to ignore error values (and hidden rows), so only rows where both Boolean tests are TRUE (1*1 = 1, which leaves the value unchanged) take part, and you get the smallest of those.
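To make the divide-by-criteria trick concrete, here is a Python/NumPy analogue (not Excel code). The names and values arrays are hypothetical stand-ins for columns A and B; dividing by 0 yields inf (or nan) where Excel yields #DIV/0!, and filtering out non-finite results plays the role of AGGREGATE's option 7.

```python
import numpy as np

# Hypothetical sample data standing in for columns A (names) and B (values)
names = np.array(["x", "y", "x", "y", "x"])
values = np.array([5.0, 3.0, 2.0, 8.0, 7.0])


def smallest_for(name):
    """Analogue of AGGREGATE(15, 7, B/(A=name), 1)."""
    mask = (names == name).astype(float)  # 1.0 where the criterion is TRUE, else 0.0
    with np.errstate(divide="ignore", invalid="ignore"):
        quotients = values / mask  # x/1 keeps x; x/0 gives inf (Excel: #DIV/0!)
    # Like option 7 ignoring errors: drop the non-finite results
    return quotients[np.isfinite(quotients)].min()


def smallest_by_multiplying(name):
    """The original multiply version: non-matching rows become 0 and win."""
    return (values * (names == name)).min()


print(smallest_for("x"), smallest_by_multiplying("x"))
```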
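One more thing: AGGREGATE processes these arguments as arrays, so limiting the ranges to only the rows that contain data will speed it up. Here MATCH("zzz",$A:$A) returns the position of the last text entry in column A, which marks the end of each range: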
=IF($B2<>"", AGGREGATE(15,7, ($B$1:INDEX($B:$B,MATCH("zzz",$A:$A)))/(($A2=$A$1:INDEX($A:$A,MATCH("zzz",$A:$A)))*($B$1:INDEX($B:$B,MATCH("zzz",$A:$A))<>"")), 1), "")
As per your comment, to show the result only in the row whose B value equals that minimum:
=IF($B2 = AGGREGATE(15,7, ($B:$B)/(($A2=$A:$A)*($B:$B<>"")), 1),AGGREGATE(15,7, ($B:$B)/(($A2=$A:$A)*($B:$B<>"")), 1), "")

Counting the occurrences of substrings in MATLAB

I have a cell array, something like P = {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face, Scene, and Both values are in P. I don't care about the numbers after the string (i.e., Face1 and Face23 would be counted as two Face values). I've tried the following (for Face), but I get the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result is a cell array in which each cell contains 1 if there is a match, or [] otherwise. Then determine whether each cell is nonempty (~cellfun('isempty', ...) gives a logical 1 for nonempty cells and 0 for empty cells), and sum the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
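For comparison, the same prefix-counting logic as a Python analogue (not MATLAB): count the elements that start with the prefix, or strip the trailing digits and tally every category at once. count_prefix is a hypothetical helper name.

```python
import re
from collections import Counter

P = ["Face1", "Face6", "Scene6", "Both9", "Face9", "Scene11", "Both12", "Face15"]


def count_prefix(cells, prefix):
    """Like sum(~cellfun('isempty', regexp(P, '^Face'))): count prefix matches."""
    pattern = re.compile("^" + re.escape(prefix))
    return sum(1 for s in cells if pattern.match(s))


# Count every category at once by stripping the trailing digits
counts = Counter(re.sub(r"\d+$", "", s) for s in P)

print(count_prefix(P, "Face"), count_prefix(P, "Scene"), counts)
```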
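Your own approach should also work with a small tweak, provided every element of P is a string; it may give the error you got if there are any non-string values in the cell array. With a cell array input, strfind returns a cell array of match positions, so numel of that result is just the number of cells. Concatenating the positions with [n{:}] and counting them gives the number of matches instead: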
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)

Union of cell array of cells

I'm looking for a way to compute the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do this, hopefully in a simple way? I'd greatly appreciate it.
ALTERNATIVE: A way to store groups of separate strings other than as a cell array of cell arrays of strings would also be useful, but as far as I know, it's not possible.
Thank you!
C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
How the code works: it builds a list of all words, allWords, and then uses it to build a numeric matrix recording which words each row contains (1 = match for the first word, 2 = match for the second word). Finally, unique with the 'rows' option is applied to this numeric matrix to get the indices of the unique rows.
With my update, the code is hardcoded to 2 words per cell. Removing this limitation would require replacing the anonymous function (@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation, probably using cellfun again.
union doesn't seem to be compatible with cell arrays of cells, so we need a workaround.
One approach is to concatenate the data from A and B vertically, then, column by column, assign each distinct string a unique numeric ID. Combining those IDs into a double array makes it possible to use unique with the 'rows' option to get the desired output, which is what the following code does.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'
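The underlying idea (concatenate, then keep only the first occurrence of each row) can be sketched as a Python analogue (not MATLAB), with each group of strings represented as a tuple so whole rows can be hashed and compared. stable_union is a hypothetical helper; with the same input as above, its result should match out1.

```python
A = [("three", "four"), ("five", "six"), ("five", "seven"), ("one", "two")]
B = [("seven", "eight"), ("five", "six"), ("nine", "ten"), ("three", "six")]


def stable_union(*groups):
    """Concatenate the groups and drop repeated rows, keeping first
    occurrences, like unique(..., 'rows', 'stable') on the combined data."""
    seen = set()
    out = []
    for row in (r for g in groups for r in g):
        key = tuple(row)  # tuples are hashable, so whole rows can be compared
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


for row in stable_union(A, B):
    print(row)
```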
