MDX NON EMPTY on multidimensional axes - Excel

I'm writing MDX in VBA and I have to put the data in Excel. This is my query:
SELECT
{ [Measures].[VL PROD], [Measures].[Impostos], [Measures].[RecLiqProd], [Measures].[ValorMgProd], [Measures].[QTD ITENS], [Measures].[VL FRETE] } ON 0,
NON EMPTY ( { Descendants( [Produto].[Produto].[Departamento], 5 ) } ) ON 1,
NON EMPTY ( { [Data Pedido].[Data].[Ano].&[2014].&[2].&[6].&[1]:[Data Pedido].[Data].[Ano].&[2014].&[2].&[6].&[26] } ) ON 2,
NON EMPTY ( { [Unidade Negócio].[Unidade Negócio].&[Unidade 1], [Unidade Negócio].[Unidade Negócio].&[Unidade 2], [Unidade Negócio].[Unidade Negócio].&[Unidade 3] } ) ON 3
FROM [Rentabilidade]
WHERE ( - Extract( { [Livre de Debito] }, [Meio Pagamento].[Meio Pagamento]) )
And this is the VBA loop that writes the cellset to the sheet:
For i = 0 To cst.Axes(1).Positions.Count - 1
    For j = 0 To cst.Axes(2).Positions.Count - 1
        For k = 0 To cst.Axes(3).Positions.Count - 1
            'If cst(0, i, j, k) * cst(1, i, j, k) * cst(2, i, j, k) * cst(3, i, j, k) * cst(4, i, j, k) * cst(5, i, j, k) <> "" Then
            Cells(a, 1) = cst.Axes(1).Positions(i).Members(0).Caption
            Cells(a, 2) = cst.Axes(2).Positions(j).Members(0).Caption
            Cells(a, 3) = cst.Axes(3).Positions(k).Members(0).Caption
            Cells(a, 4) = cst(0, i, j, k)
            Cells(a, 5) = cst(1, i, j, k)
            Cells(a, 6) = cst(2, i, j, k)
            Cells(a, 7) = cst(3, i, j, k)
            Cells(a, 8) = cst(4, i, j, k)
            Cells(a, 9) = cst(5, i, j, k)
            a = a + 1
            'End If
        Next k
    Next j
Next i
The problem is that I get plenty of empty rows; I'd like to know how I can remove them.
For example, I'm getting the following:
Id | Data | Bandeira | impostos | recliq | ValorMrg | Qtd Item | Vl Frete
10 | 40230 | Unidade 1 | | | | |
10 | 40230 | Unidade 2 | | | | |
10 | 40230 | Unidade 3 | 0,2 | 2032 | 100 | 1000 | 323
32 | 40231 | Unidade 3 | | | | |
32 | 40232 | Unidade 3 | | | | |
32 | 40233 | Unidade 3 | 0,2 | 32 | 321 | 5045 | 323
I thought I had understood the difference between NON EMPTY and the NonEmpty() function (from http://beyondrelational.com/modules/2/blogs/65/posts/11569/mdx-non-empty-vs-nonempty.aspx), but maybe I'm missing something.
Can anyone help me?

If you want a two-dimensional report, why do you run a four-dimensional query?
I would think that the following MDX
SELECT
{ [Measures].[VL PROD], [Measures].[Impostos], [Measures].[RecLiqProd], [Measures].[ValorMgProd], [Measures].[QTD ITENS], [Measures].[VL FRETE] }
ON 0,
NON EMPTY
( { Descendants( [Produto].[Produto].[Departamento], 5 ) }
*
{ [Data Pedido].[Data].[Ano].&[2014].&[2].&[6].&[1]:[Data Pedido].[Data].[Ano].&[2014].&[2].&[6].&[26] }
*
{ [Unidade Negócio].[Unidade Negócio].&[Unidade 1], [Unidade Negócio].[Unidade Negócio].&[Unidade 2], [Unidade Negócio].[Unidade Negócio].&[Unidade 3] } )
ON 1
FROM [Rentabilidade]
WHERE ( - Extract( { [Livre de Debito] }, [Meio Pagamento].[Meio Pagamento]) )
would deliver what you want. In this case, the NON EMPTY on axis 1 is evaluated for each tuple of the cross join of the three hierarchies against the columns axis.
Of course, you would then have to change your VBA code accordingly: there is now only one row axis, but each of its positions contains three members instead of one (cst.Axes(1).Positions(i).Members(0).Caption, .Members(1).Caption, and .Members(2).Caption), and the cell values are addressed with just two coordinates, e.g. cst(0, i).


Azure Application Insights - How to display a row with default values if the kusto query returned no results?

I am using the below Kusto query in the Azure Application Insights workbook to get the count of satisfied users, tolerating users, and frustrated users.
let apdexThreshhold = toint(1000);
let apdexData = pageViews
| where timestamp > ago(7d)
| where name in ('*') or '*' in ('*')
| extend success = columnifexists('success', true)
| extend Failure = iff('ConsiderFailures' == 'ConsiderFailures' and success == false, 1, 0)
| extend InterestingDimension = iff(isempty(name) == true, 'Unknown', name)
| where InterestingDimension in ('*') or '*' in ('*')
| summarize AverageDuration = avg(duration), Failures = sum(Failure) by user_Id, InterestingDimension
| extend UserExperience = case(AverageDuration <= apdexThreshhold, 'Satisfied', AverageDuration <= 4 * apdexThreshhold, 'Tolerating', 'Frustrated')
| extend UserExperience = case(Failures > 0, "Frustrated", UserExperience)
| summarize
    Satisfied = countif(UserExperience == 'Satisfied'),
    Tolerating = countif(UserExperience == 'Tolerating'),
    Frustrated = countif(UserExperience == 'Frustrated'),
    Total = count()
    by InterestingDimension
| project
    InterestingDimension,
    ["Satisfied Users"] = Satisfied,
    ["Tolerating Users"] = Tolerating,
    ["Frustrated Users"] = Frustrated,
    ["Apdex Score"] = round((Satisfied + (Tolerating / 2.0)) / Total, 2),
    Total
| extend Relevance = iff(["Apdex Score"] == 0, pow(Total, 1.6), Total / ["Apdex Score"])
| project-rename Users = Total
| order by Relevance desc
| project-away Users, Relevance;
apdexData
| extend ["Apdex Interpretation"] = case(["Apdex Score"] <= 0.5, '⛔ Unacceptable', ["Apdex Score"] <= 0.7, '⚠️ Poor', ["Apdex Score"] <= 0.85, '⚠️ Fair', ["Apdex Score"] <= 0.94, '✔️ Good', '✔️ Excellent')
| project
    Values = InterestingDimension,
    ["Apdex Score"],
    ["Apdex Interpretation"],
    ["Satisfied Users"],
    ["Tolerating Users"],
    ["Frustrated Users"]
The above query returns results without any issues, but whenever there is no data it shows a text message that says "no results" instead. I want to display a single row with the default value "0" in each column.
Updated:
let emptyRow = datatable( Values: string, ["Apdex Score"]: double, ["Apdex Interpretation"]: string, ["Satisfied Users"]:long, ["Tolerating Users"]: long, ["Frustrated Users"]: long) [ "0", 0, "0", 0, 0, 0] ;
<Above Query>
// add empty row
| union (emptyRow)
| order by ["Apdex Interpretation"] desc
The above query adds the empty row even when there are results. I tried to update the query with the lines below so that the empty row is added only when there are no results, but it is still not working as expected.
let T = apdexData
| where Values != null
| project
    Values = InterestingDimension,
    ["Apdex Score"],
    ["Apdex Interpretation"],
    ["Satisfied Users"],
    ["Tolerating Users"],
    ["Frustrated Users"];
let T_has_records = toscalar(T | summarize count() > 0);
union
    (T | where T_has_records == true),
    (emptyRow | where T_has_records == false)
You could do it in various ways, like unioning with an empty row:
let emptyRow = datatable( Values: string, ["Apdex Score"]: double, ["Apdex Interpretation"]: string, ["Satisfied Users"]:long, ["Tolerating Users"]: long, ["Frustrated Users"]: long) [ "0", 0, "0", 0, 0, 0] ;
...
your existing query above
...
// add empty row
| union (emptyRow)
| order by ["Apdex Interpretation"] desc
but that will ALWAYS add the empty row. you could then possibly use the scan operator (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/scan-operator) to filter out the empty row when there are real results?
might be more work than just the "no results" message, though (you can also customize the no-results message in the advanced settings tab)
Edit: it looks like your edits created a bunch of things that are just invalid syntax. StackOverflow's goal isn't to do all your work for you...
but if I copy and paste your stuff from above and just fix the syntax issues, it seems like it works:
let emptyRow = datatable (
    Values: string,
    ["Apdex Score"]: double,
    ["Apdex Interpretation"]: string,
    ["Satisfied Users"]: long,
    ["Tolerating Users"]: long,
    ["Frustrated Users"]: long
) [
    "0", 0, "0", 0, 0, 0
];
let apdexThreshhold = toint(1000);
let apdexData = pageViews
| where timestamp > ago(7d)
| where name in ('*') or '*' in ('*')
| extend success = columnifexists('success', true)
| extend Failure = iff('ConsiderFailures' == 'ConsiderFailures' and success == false, 1, 0)
| extend InterestingDimension = iff(isempty(name) == true, 'Unknown', name)
| where InterestingDimension in ('*') or '*' in ('*')
| summarize AverageDuration = avg(duration), Failures = sum(Failure) by user_Id, InterestingDimension
| extend UserExperience = case(AverageDuration <= apdexThreshhold, 'Satisfied', AverageDuration <= 4 * apdexThreshhold, 'Tolerating', 'Frustrated')
| extend UserExperience = case(Failures > 0, "Frustrated", UserExperience)
| summarize
    Satisfied = countif(UserExperience == 'Satisfied'),
    Tolerating = countif(UserExperience == 'Tolerating'),
    Frustrated = countif(UserExperience == 'Frustrated'),
    Total = count()
    by InterestingDimension
| project
    InterestingDimension,
    ["Satisfied Users"] = Satisfied,
    ["Tolerating Users"] = Tolerating,
    ["Frustrated Users"] = Frustrated,
    ["Apdex Score"] = round((Satisfied + (Tolerating / 2.0)) / Total, 2),
    Total
| extend Relevance = iff(["Apdex Score"] == 0, pow(Total, 1.6), Total / ["Apdex Score"])
| project-rename Users = Total
| order by Relevance desc
| project-away Users, Relevance;
let T = apdexData
| extend ["Apdex Interpretation"] = case(["Apdex Score"] <= 0.5, '⛔ Unacceptable', ["Apdex Score"] <= 0.7, '⚠️ Poor', ["Apdex Score"] <= 0.85, '⚠️ Fair', ["Apdex Score"] <= 0.94, '✔️ Good', '✔️ Excellent')
| project
    Values = InterestingDimension,
    ["Apdex Score"],
    ["Apdex Interpretation"],
    ["Satisfied Users"],
    ["Tolerating Users"],
    ["Frustrated Users"]
| where isnotempty(Values);
let T_has_records = toscalar(T | summarize count() > 0);
union
    (T | where T_has_records == true),
    (emptyRow | where T_has_records == false)

Maze solver won't backtrack in Python

There are a couple of maze questions similar to this one, but none of them ever really goes into why it won't work.
I don't need precise answers. I just need to know why this particular thing doesn't work.
This is the bit of my class Maze that I need help with.
ysize in my example is 10
xsize is 10
xend is 20 (changing it to 19 messes with the results and doesn't draw anything)
yend is 10 (changing it to 9 does this too)
class Maze:
    def __init__(self):
        self.maze = []
        self.xstart = None
        self.ystart = None
        self.xend = None
        self.yend = None
        self.xsize = None
        self.ysize = None

    def read_maze(self, filename):
        maze_list = []
        f_maze = open(filename)
        size = f_maze.readline().split()   # line 1: rows, columns
        start = f_maze.readline().split()  # line 2: start row, column
        end = f_maze.readline().split()    # line 3: end row, column
        self.xstart = int(start[1])
        self.ystart = int(start[0])
        self.xend = (int(end[1])*2)
        self.yend = (int(end[0])*2)
        self.xsize = (int(size[1])*2)
        self.ysize = (int(size[0])*2)
        lines = f_maze.readlines()
        for line in lines:
            maze_list.append(list(line[:len(line)]))
        self.maze = maze_list  # Assigns to class

    def __str__(self):
        return ("".join(''.join(line) for line in self.maze))

    def solve(self, x, y):
        if y > (self.ysize) or x > (self.xsize):
            print("1")
            return False
        if self.maze[y][x] == self.maze[self.yend][self.xend]:
            print("2")
            return True
        if self.maze[y][x] != " ":
            print("3")
            return False
        self.maze[y][x] = "o"  # MARKING WITH o for path already taken.
        if self.solve(x+1, y) == True:
            return True
        elif self.solve(x, y+1) == True:
            return True
        elif self.solve(x-1, y) == True:
            return True
        elif self.solve(x, y-1) == True:
            return True
        self.maze[y][x] = " "  # ELSE I want it to be replaced with space
        return False
This is the current result.
---------------------
|ooooooooooooo| | |
|-+-+-+ +-+-+o+ + +-|
| | | |o| |
| +-+-+ + +-+-+-+ + |
| | | | | |
|-+-+ + + + +-+ +-+-|
| | |
|-+ +-+-+-+-+-+ +-+ |
| | | |
---------------------
I want it like this:
---------------------
|ooooooo | | |
|-+-+-+o+-+-+ + + +-|
| | o| | | |
| +-+-+o+ +-+-+-+ + |
| o| | | | |
|-+-+ +o+ + +-+ +-+-|
| |ooooooooooooo|
|-+ +-+-+-+-+-+ +-+o|
| | | o|
---------------------
That is my whole code. These are my test statements:
maze = Maze()
maze.read_maze(filename)
maze.solve(maze.xstart, maze.ystart)
print(maze)
The input files follow this format (saved as .txt files):
5 10
1 1
5 10
---------------------
| | | |
|-+-+-+ +-+-+ + + +-|
| | | | | |
| +-+-+ + +-+-+-+ + |
| | | | | |
|-+-+ + + + +-+ +-+-|
| | |
|-+ +-+-+-+-+-+ +-+ |
| | | |
---------------------
The problem is that, as your file stands, (xend, yend) is (20, 10). To debug why it's not working, you can print(self.maze[self.yend][self.xend]), which returns a dash "-". Now, when your recursive call's (x, y) pair reaches its first dash, it tests True for the line
if self.maze[y][x] == self.maze[self.yend][self.xend]:
and thinks it has solved the maze. Rather, we want to test
if (y, x) == (self.yend, self.xend):
That is, test the coordinates, not the value of the square.
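Here is a minimal sketch of the patched method (only the goal test changes; I've also dropped the debug prints and collapsed the four if/elif branches into one or-chain, which behaves the same):

def solve(self, x, y):
    # sketch of the fix described above, not the poster's exact code
    if y > self.ysize or x > self.xsize:
        return False
    if (y, x) == (self.yend, self.xend):  # compare coordinates, not cell contents
        return True
    if self.maze[y][x] != " ":
        return False
    self.maze[y][x] = "o"  # mark the path taken so far
    if (self.solve(x+1, y) or self.solve(x, y+1)
            or self.solve(x-1, y) or self.solve(x, y-1)):
        return True
    self.maze[y][x] = " "  # dead end: unmark it and backtrack
    return False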
Another point: examining the actual location of the goal, we see that it's here:
+-+ +-+ |
| |
--------- <= this corner is the goal!
Which is unreachable if moving in strictly cardinal directions. Moving the goal a square up or to the left would put it within bounds of the solver algorithm.
This was sufficient to get the code working for me and hopefully is enough to get you moving again.

Array partition using dynamic programming

What modification should I apply to the dynamic programming implementation of the two-partition problem to solve the following task:
You are given an array of positive integers as input; denote it C. The program should decide if it is possible to partition the array into two equal-sum subsequences. You are allowed to remove some elements from the array, but not all, in order to make such a partition feasible.
Example:
Suppose the input is 4 5 11 17 9. A two-partition is possible if we remove 11 and 17. My question is what adjustments I should make to my two-partition implementation so that it determines whether a two-partition is possible (which may or may not require removing some elements), or reports that it is impossible even if some elements are removed. The program should run in O(sum^2 * C) time.
Here is my two partition implementation in Python:
def two_partition(C):
    n = len(C)
    s = sum(C)
    if s % 2 != 0: return False
    T = [[False for _ in range(n + 1)] for _ in range(s//2 + 1)]
    for i in range(n + 1): T[0][i] = True
    for i in range(1, s//2 + 1):
        for j in range(1, n + 1):
            T[i][j] = T[i][j-1]
            if i >= C[j-1]:
                T[i][j] = T[i][j] or T[i-C[j-1]][j-1]
    return T[s // 2][n]
For example, with input [2, 3, 1] the expected output is {2, 1} and {3}, so it is possible to partition the array into two equal-sum subsets without removing anything. In the 4 5 11 17 9 example above, the two subsets become possible once 11 and 17 are removed, leaving {4, 5} and {9}.
Create a 3-dimensional array indexed by the sum of the 1st partition, the sum of the 2nd partition, and the number of elements.
T[i][j][k] is true exactly when it's possible to have two disjoint subsets, with sums i and j respectively, within the first k elements.
To calculate it, you need to consider three possibilities for each element: either it's in the first set, or in the second set, or it's removed entirely.
Doing this in a loop for each possible combination of sums generates the required array in O(sum ^ 2 * C).
To answer your question, all you need to check is whether there is some sum i such that T[i][i][n] is true. This implies that there are two disjoint subsets that both sum to i, as required by the question.
If you need to find the actual subsets, doing so is easy with a simple backtracking function: just check which of the three possibilities holds in back_track and recurse.
Here's a sample implementation:
def back_track(T, C, s1, s2, i):
    if s1 == 0 and s2 == 0: return [], []
    if T[s1][s2][i-1]:
        return back_track(T, C, s1, s2, i-1)
    elif s1 >= C[i-1] and T[s1 - C[i-1]][s2][i-1]:
        a, b = back_track(T, C, s1 - C[i-1], s2, i-1)
        return ([C[i-1]] + a, b)
    else:
        a, b = back_track(T, C, s1, s2 - C[i-1], i-1)
        return (a, [C[i-1]] + b)

def two_partition(C):
    n = len(C)
    s = sum(C)
    T = [[[False for _ in range(n + 1)] for _ in range(s//2 + 1)] for _ in range(s // 2 + 1)]
    for i in range(n + 1): T[0][0][i] = True
    for s1 in range(0, s//2 + 1):
        for s2 in range(0, s//2 + 1):
            for j in range(1, n + 1):
                T[s1][s2][j] = T[s1][s2][j-1]
                if s1 >= C[j-1]:
                    T[s1][s2][j] = T[s1][s2][j] or T[s1-C[j-1]][s2][j-1]
                if s2 >= C[j-1]:
                    T[s1][s2][j] = T[s1][s2][j] or T[s1][s2-C[j-1]][j-1]
    for i in range(1, s//2 + 1):
        if T[i][i][n]:
            return back_track(T, C, i, i, n)
    return False
print(two_partition([4, 5, 11, 9]))
print(two_partition([2, 3, 1]))
print(two_partition([2, 3, 7]))
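Tracing these calls by hand, I would expect output along the lines of (the order of elements inside each returned subset depends on the backtracking order):

([9], [5, 4])
([1, 2], [3])
False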
To determine if it's possible, keep a set of unique differences between the two parts. For each element, iterate over the differences seen so far; subtract and add the element. We're looking for the difference 0.
4 5 11 17 9
0 (empty parts)
|0 ± 4| = 4
set now has 4 and empty-parts-0
|0 ± 5| = 5
|4 - 5| = 1
|4 + 5| = 9
set now has 4,5,1,9 and empty-parts-0
|0 ± 11| = 11
|4 - 11| = 7
|4 + 11| = 15
|5 - 11| = 6
|5 + 11| = 16
|1 - 11| = 10
|1 + 11| = 12
|9 - 11| = 2
|9 + 11| = 20
... (iteration with 17)
|0 ± 9| = 9
|4 - 9| = 5
|4 + 9| = 13
|5 - 9| = 4
|5 + 9| = 14
|1 - 9| = 8
|1 + 9| = 10
|9 - 9| = 0
Bingo!
Python code:
def f(C):
diffs = set()
for n in C:
new_diffs = [n]
for d in diffs:
if d - n == 0:
return True
new_diffs.extend([abs(d - n), abs(d + n)])
diffs = diffs.union(new_diffs)
return False
Output:
> f([2, 3, 7, 2])
=> True
> f([2, 3, 7])
=> False
> f([7, 1000007, 1000000])
=> True
I quickly adapted code that searches for three equal-sum subsets to the given problem.
The algorithm tries to put every item A[idx] into the first bag, or into the second bag (both are real bags), or into a third (fake) bag for ignored items. The initial value (available space) of each real bag is half of the overall sum. As-is, this approach has exponential complexity (a decision tree with 3^N leaves).
But there are a lot of repeating distributions, so we can remember some state and skip branches with no chance, i.e. a kind of DP with memoization. The state mentioned here is the pair of available spaces in the real bags after using the items from the last index down to idx inclusive.
The possible size of the state storage might reach N * sum/2 * sum/2.
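For readers of the Python answers above, here is a minimal Python sketch of the same memoized search (solve2 and check are names of mine; it only reports feasibility and does not reconstruct the two bags):

from functools import lru_cache

def solve2(A):
    # sketch of the memoized two-bag search described above
    s2 = sum(A) // 2  # initial available space in each real bag

    @lru_cache(maxsize=None)
    def check(rest1, rest2, idx):
        # equal space used in both bags, and at least one item placed
        if rest1 == rest2 and rest1 != s2:
            return True
        if idx < 0:
            return False
        # put A[idx] into the first bag, the second bag, or ignore it
        return ((rest1 >= A[idx] and check(rest1 - A[idx], rest2, idx - 1))
                or (rest2 >= A[idx] and check(rest1, rest2 - A[idx], idx - 1))
                or check(rest1, rest2, idx - 1))

    return check(s2, s2, len(A) - 1)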
Working Delphi code (not thoroughly tested; it seems to have a bug with the ignored-items output):
function Solve2(A: TArray<Integer>): string;
var
  Map: TDictionary<string, boolean>;
  Lists: array of TStringList;
  found: Boolean;
  s2: integer;

  function CheckSubsetsWithItem(Subs: TArray<Word>; idx: Int16): boolean;
  var
    key: string;
    i: Integer;
  begin
    if (Subs[0] = Subs[1]) and (Subs[0] <> s2) then begin
      found := True;
      Exit(True);
    end;
    if idx < 0 then
      Exit(False);
    // the key encodes the current rests of the sums explicitly
    key := Format('%d_%d_%d', [subs[0], subs[1], idx]);
    if Map.ContainsKey(key) then
      // memoization
      Result := Map.Items[key]
    else begin
      Result := false;
      // try to put A[idx] into the first bag, the second bag, or ignore it
      for i := 0 to 2 do begin
        if Subs[i] >= A[idx] then begin
          Subs[i] := Subs[i] - A[idx];
          Result := CheckSubsetsWithItem(Subs, idx - 1);
          if Result then begin
            // retrieve the subsets themselves while the recursion unwinds
            if found then
              Lists[i].Add(A[idx].ToString);
            break;
          end
          else
            // reset the sum before the next try
            Subs[i] := Subs[i] + A[idx];
        end;
      end;
      // remember the result - memoization
      Map.add(key, Result);
    end;
  end;

var
  n, sum: Integer;
  Subs: TArray<Word>;
begin
  n := Length(A);
  sum := SumInt(A);
  s2 := sum div 2;
  found := False;
  Map := TDictionary<string, boolean>.Create;
  SetLength(Lists, 3);
  Lists[0] := TStringList.Create;
  Lists[1] := TStringList.Create;
  Lists[2] := TStringList.Create;
  if CheckSubsetsWithItem([s2, s2, sum], n - 1) then begin
    Result := '[' + Lists[0].CommaText + '], ' +
              '[' + Lists[1].CommaText + '], ' +
              ' ignored: [' + Lists[2].CommaText + ']';
  end else
    Result := 'No luck :(';
end;

begin
  Memo1.Lines.Add(Solve2([1, 5, 4, 3, 2, 16,21,44, 19]));
  Memo1.Lines.Add(Solve2([1, 3, 9, 27, 81, 243, 729, 6561]));
end;
Output:
[16,21,19], [1,5,4,2,44], ignored: [3]
No luck :(

Python - scope variables returning None

I'm trying to build a grid based on a function I'm creating. I have assigned inner variables, one for the horizontal lines and one for the vertical lines:
def grid1(number):
    D = ('-' * number)
    S = (' ' * number)
    H = print('+'+ D +'+' + D + '+')
    V = print('|'+ S + '|' + S + '|' '\n')
    print( H '\n' (V * number) + H '\n'+ (V * number) + H)
Basically I'm trying to create a grid that is 2 x 2. When it prints the V variable a second time, or even the next H, the variable comes back as None. To my knowledge I did not create an iterator, so the variable should still be assigned.
Am I missing something with variable scope?
The problem isn't variable scope: print() writes its argument to the screen and returns None, so H and V end up assigned None rather than the grid strings. I assume you wanted to assign the strings to H and V, then use those again in the next print statement. In that case you need to remove the first two print calls and only do the assignments.
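You can see the root cause in a REPL:

>>> H = print('+---+---+')   # print() returns None
+---+---+
>>> print(H)
None

With that and some other fixes, this results in: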
def grid1(number):
    D = ('-' * number)
    S = (' ' * number)
    H = '+'+ D +'+' + D + '+'
    V = '|'+ S + '|' + S + '|' '\n'
    print( H + '\n' + (V * number) + H + '\n' + (V * number) + H)
Which leads to for example:
>>> grid1(3)
+---+---+
|   |   |
|   |   |
|   |   |
+---+---+
|   |   |
|   |   |
|   |   |
+---+---+

Pythonic way to generate random uniformly distributed points within hollow square lamina

Suppose we have a hollow square lamina of size n. That is, we have an n x n square from which a k x l rectangle has been removed (1 <= k, l <= n-2). I want to calculate the average of the distances between 2 random, uniformly distributed points within such a hollow square lamina.
For the sake of simplicity, let's consider n=3, k=l=1, i.e. a 3x3 square from whose center a unit square has been removed.
I wrote this code using numpy, but it has at least 2 problems: I have to throw away approximately 1/9 of all generated points, and removing the unfit numpy.array elements requires lots of RAM:
x, y = 3*np.random.random((2, size, 2))
x = x[
    np.logical_not(np.logical_and(
        np.logical_and(x[:,0] > 1, x[:,0] < 2),
        np.logical_and(x[:,1] > 1, x[:,1] < 2)
    ))
]
y = y[
    np.logical_not(np.logical_and(
        np.logical_and(y[:,0] > 1, y[:,0] < 2),
        np.logical_and(y[:,1] > 1, y[:,1] < 2)
    ))
]
n = min(x.shape[0], y.shape[0])
UPD: here size is the sample size over which I'm going to calculate the average.
Is there an elegant way to generate those points right away, without removing the unfit ones?
UPD: Here is the full code just for reference:
def calc_avg_dist(size):
    x, y = 3*np.random.random((2, size, 2))
    x = x[
        np.logical_not(np.logical_and(
            np.logical_and(x[:,0] > 1, x[:,0] < 2),
            np.logical_and(x[:,1] > 1, x[:,1] < 2)
        ))
    ]
    y = y[
        np.logical_not(np.logical_and(
            np.logical_and(y[:,0] > 1, y[:,0] < 2),
            np.logical_and(y[:,1] > 1, y[:,1] < 2)
        ))
    ]
    n = min(x.shape[0], y.shape[0])
    diffs = x[:n,:] - y[:n,:]
    return np.sum(np.sqrt(np.einsum('ij,ij->i', diffs, diffs)))/n
With the center removed, there are 8 regions that should contain points. These are their lower-left corners:
In [350]: llcorners = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [2, 1], [0, 2], [1, 2], [2, 2]])
The regions are 1x1, so they have the same area and are equally likely to contain a given random point. The following chooses size lower-left corners:
In [351]: corner_indices = np.random.choice(len(llcorners), size=size)
Now generate size (x,y) coordinates in the unit square:
In [352]: unit_coords = np.random.random(size=(size, 2))
Add those to the lower-left corners chosen previously:
In [353]: pts = unit_coords + llcorners[corner_indices]
pts has shape (size, 2). Here's a plot, with size = 2000:
In [363]: plot(pts[:,0], pts[:,1], 'o')
Out[363]: [<matplotlib.lines.Line2D at 0x11000f950>]
Update to address the updated question...
The following function generalizes the above idea to a rectangular shape containing a rectangular hollow. The rectangle is still considered to be nine regions, with the middle region being the hollow. The probability of a random point being in a region is determined by the area of the region; numpy.random.multinomial is used to select the number of points in each region.
(I'm sure there is room for optimization of this code.)
from __future__ import division

import numpy as np

def sample_hollow_lamina(size, outer_width, outer_height, a, b, inner_width, inner_height):
    """
    (a, b) is the lower-left corner of the "hollow".
    """
    llcorners = np.array([[0, 0], [a, 0], [a+inner_width, 0],
                          [0, b], [a+inner_width, b],
                          [0, b+inner_height], [a, b+inner_height], [a+inner_width, b+inner_height]])
    top_height = outer_height - (b + inner_height)
    right_width = outer_width - (a + inner_width)
    widths = np.array([a, inner_width, right_width, a, right_width, a, inner_width, right_width])
    heights = np.array([b, b, b, inner_height, inner_height, top_height, top_height, top_height])
    areas = widths * heights
    shapes = np.column_stack((widths, heights))
    regions = np.random.multinomial(size, areas/areas.sum())
    indices = np.repeat(range(8), regions)
    unit_coords = np.random.random(size=(size, 2))
    pts = unit_coords * shapes[indices] + llcorners[indices]
    return pts
For example,
In [455]: pts = sample_hollow_lamina(2000, 5, 5, 1, 1, 2, 3)
In [456]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[456]: [<matplotlib.lines.Line2D at 0x116da0a50>]
In [457]: grid()
Note that the arguments do not have to be integers:
In [465]: pts = sample_hollow_lamina(2000, 3, 3, 0.5, 1.0, 1.5, 0.5)
In [466]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[466]: [<matplotlib.lines.Line2D at 0x116e60390>]
In [467]: grid()
I've already posted a shorter and somewhat unclear answer; here I've taken the time to produce what I think is a better one.
Generalizing the OP's problem, we have a "surface" composed of nsc * nsr squares arranged in nsc columns and nsr rows, and a "hole" composed of nhc * nhr squares (aligned with the surface squares) arranged in nhr rows and nhc columns, with its origin placed at ohc, ohr:
   +------+------+------+------+------+------+------+------+------+ nsr
   |      |      |      |      |      |      |      |      |      |
   |      |      |      |      |      |      |      |      |      |
...+------+------+------+------+------+------+------+------+------+ nsr-1
   |      |      | oooo | oooo |      |      |      |      |      |
   |      |      | oooo | oooo |      |      |      |      |      |
   +------+------+------+------+------+------+------+------+------+ ...
   |      |      | oooo | oooo |      |      |      |      |      |
   |      |      | oooo | oooo |      |      |      |      |      |
   +------+------+------+------+------+------+------+------+------+ ...
   |      |      | oooo | oooo |      |      |      |      |      |
   |      |      | oooo | oooo |      |      |      |      |      |
ohr+------+------+------+------+------+------+------+------+------+ 1
   |      |      |      |      |      |      |      |      |      |
   |      |      |      |      |      |      |      |      |      |
   +------+------+------+------+------+------+------+------+------+ 0
   0      1      2    ...           ...                    nsc-1  nsc
                 |             |
                 ohc=2         ohc+nhc
Our aim is to draw n random points from the surface without the hole (the admissible surface), with a uniform distribution over the admissible surface.
We observe that the admissible area is composed of nsq = nsc*nsr - nhc*nhr equal squares: if we place a random point in an abstract unit square and then assign, with equal probability, that point to one of the squares of the admissible area, we have done our job.
In pseudocode, if random() samples from a uniformly distributed random variable over [0, 1):
(x, y) = (random(), random())
square = integer(random()*nsq)
(x, y) = (x, y) + (column_of_square_origin(square), row_of_square_origin(square))
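Read literally, that pseudocode becomes the following plain-Python sketch (sample_point and origins are names of mine; origins is a list of the lower-left corners of the admissible squares, which is computed below):

import random

def sample_point(origins):
    # a point uniformly distributed in the abstract unit square...
    x, y = random.random(), random.random()
    # ...shifted into one of the admissible squares, chosen uniformly
    ox, oy = random.choice(origins)
    return x + ox, y + oy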
To speed up the process, we use numpy and try to avoid, as far as possible, explicit for loops.
With the names previously used, we need a list of the origins of the admissible squares to implement the last line of the pseudocode.
def origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr):
    # a set of tuples, each one the origin of a square, for ALL the squares
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}
    # a set of tuples with the origins of the hole squares
    s_hole = {(x, y) for x in range(ohc, ohc+nhc) for y in range(ohr, ohr+nhr)}
    # the set of origins of admissible squares is the difference
    s_adm = s_all - s_hole
    # return an array with all the origins --- the order is not important!
    # np.array doesn't like sets
    return np.array(list(s_adm))
We need to generate n random points in the unit square, organized in an array of shape (n,2)
rand_points = np.random.random((n, 2))
We need an array of n admissible squares
placements = np.random.randint(0, nsq, n)
We translate each point in rand_points into one of the admissible squares, as specified by the elements of placements.
rand_points += origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr)[placements]
taking advantage of the extended addressing that is possible with numpy arrays, and it's done...
In an overly compact function:
import numpy as np

def samples_wo_hole(n, nsc, nsr, ohc, ohr, nhc, nhr):
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}
    s_hole = {(x, y) for x in range(ohc, ohc+nhc) for y in range(ohr, ohr+nhr)}
    rand_points = np.random.random((n, 2))
    placements = np.random.randint(0, nsc*nsr - nhc*nhr, n)
    return rand_points + np.array(list(s_all - s_hole))[placements]
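For instance, to sample 2000 points from a 9 x 7 surface with a hole 2 squares wide and 3 squares tall whose origin is at column 2, row 1 (the values are just an example):

pts = samples_wo_hole(2000, 9, 7, 2, 1, 2, 3)
plot(pts[:,0], pts[:,1], 'o', alpha=0.75)  # plot as in the sessions above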
You have 8 equal unit squares in which it is admissible to place points, so draw as many points in a unit square as you want, as in
x = np.random.random((n, 2))
now it suffices to choose at random in which of the 8 admissible squares each point is placed
sq = np.random.randint(0, 8, n)
you also need an array of origins
delta = np.array([[0, 0],
                  [1, 0],
                  [2, 0],
                  [0, 1],
                  # no central square
                  [2, 1],
                  [0, 2],
                  [1, 2],
                  [2, 2]])
and finally
x = x + delta[sq]
To generalize the solution, write a function to compute an array of the origins of the admissible squares; a possible implementation using sets is
def origins(n, hole_xor, hole_wd, hole_yor, hole_hg):
    all_origins = {(x, y) for x in range(n) for y in range(n)}
    hole_origins = {(x, y) for x in range(hole_xor, hole_xor+hole_wd)
                           for y in range(hole_yor, hole_yor+hole_hg)}
    return np.array(list(all_origins - hole_origins))
and use it like this
delta = origins(12, 4, 5, 6, 2)
n_squares = len(delta)  # or n*n - width*height
target_square = np.random.randint(0, n_squares, size)
x = x + delta[target_square]
