What does this square bracket and parenthesis bracket notation mean [first1,last1)? - mathematical-notation

I have seen number ranges represented as [first1,last1) and [first2,last2).
I would like to know what such a notation means.

A bracket - [ or ] - means that end of the range is inclusive -- it includes the element listed. A parenthesis - ( or ) - means that end is exclusive and doesn't contain the listed element. So for [first1, last1), the range starts with first1 (and includes it), but ends just before last1.
Assuming integers:
(0, 5) = 1, 2, 3, 4
(0, 5] = 1, 2, 3, 4, 5
[0, 5) = 0, 1, 2, 3, 4
[0, 5] = 0, 1, 2, 3, 4, 5

That's a half-open interval.
A closed interval [a,b] includes the end points.
An open interval (a,b) excludes them.
In your case the end-point at the start of the interval is included, but the end is excluded. So it means the interval "first1 <= x < last1".
Half-open intervals are useful in programming because they correspond to the common idiom for looping:
for (int i = 0; i < n; ++i) { ... }
Here i is in the range [0, n).

The concept of interval notation comes up in both Mathematics and Computer Science. The mathematical notation [, ], (, ) denotes whether each endpoint of an interval is included.
The brackets [ and ] mean:
The number is included,
This side of the interval is closed.
The parentheses ( and ) mean:
The number is excluded,
This side of the interval is open.
An interval with mixed states is called "half-open".
For example, the range of consecutive integers from 1 .. 10 (inclusive) would be notated as such:
[1,10]
Notice how the word inclusive was used. If we want to exclude the end point but "cover" the same range we need to move the end-point:
[1,11)
For both left and right edges of the interval there are actually 4 permutations:
(1,10) = 2,3,4,5,6,7,8,9 Set has 8 elements
(1,10] = 2,3,4,5,6,7,8,9,10 Set has 9 elements
[1,10) = 1,2,3,4,5,6,7,8,9 Set has 9 elements
[1,10] = 1,2,3,4,5,6,7,8,9,10 Set has 10 elements
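These four cases can be checked with a short Python sketch (Python's own range(a, b) is already the half-open interval [a, b)); the helper name `interval` is made up for illustration:

```python
# Enumerate an integer interval with configurable open/closed endpoints.
# range(lo, hi) is half-open, so we shift the bounds to cover the other cases.
def interval(a, b, left_closed, right_closed):
    lo = a if left_closed else a + 1
    hi = b + 1 if right_closed else b
    return list(range(lo, hi))

print(interval(1, 10, False, False))  # (1,10): 8 elements
print(interval(1, 10, False, True))   # (1,10]: 9 elements
print(interval(1, 10, True, False))   # [1,10): 9 elements
print(interval(1, 10, True, True))    # [1,10]: 10 elements
```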
How does this relate to Mathematics and Computer Science?
Array indexes tend to use a different offset depending on which field you are in:
Mathematics tends to be one-based.
Certain programming languages tend to be zero-based, such as C, C++, JavaScript, and Python, while other languages, such as Mathematica, Fortran, and Pascal, are one-based.
These differences can lead to subtle fencepost errors, a.k.a. off-by-one bugs, when implementing mathematical algorithms such as for-loops.
Integers
If we have a set or array, say of the first few primes [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ], mathematicians would refer to the first element as the 1st (absolute) element, using subscript notation to denote the index:
a1 = 2
a2 = 3
:
a10 = 29
Some programming languages, in contradistinction, refer to the first element as the zeroth (relative) element.
a[0] = 2
a[1] = 3
:
a[9] = 29
Since the array indexes are in the range [0,N-1], for clarity it would be "nice" to keep the same numerical values 0 and N for the range instead of adding textual noise such as a -1 bias.
For example, in C or JavaScript, to iterate over an array of N elements a programmer would write the common idiom of i = 0, i < N with the interval [0,N) instead of the slightly more verbose [0,N-1]:
function main() {
    var output = "";
    var a = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ];
    for( var i = 0; i < 10; i++ ) // [0,10)
        output += "[" + i + "]: " + a[i] + "\n";
    if (typeof window === 'undefined') // Node command line
        console.log( output );
    else
        document.getElementById('output1').innerHTML = output;
}
<html>
    <body onload="main();">
        <pre id="output1"></pre>
    </body>
</html>
Mathematicians, since they start counting at 1, would instead use the i = 1, i <= N nomenclature but now we need to correct the array offset in a zero-based language.
e.g.
function main() {
    var output = "";
    var a = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ];
    for( var i = 1; i <= 10; i++ ) // [1,10]
        output += "[" + i + "]: " + a[i-1] + "\n";
    if (typeof window === 'undefined') // Node command line
        console.log( output );
    else
        document.getElementById( "output2" ).innerHTML = output;
}
<html>
    <body onload="main();">
        <pre id="output2"></pre>
    </body>
</html>
Aside:
In programming languages that are 0-based you might need the kludge of a dummy zeroth element to use a mathematical 1-based algorithm. e.g. Python Index Start
Floating-Point
Interval notation is also important for floating-point numbers to avoid subtle bugs.
When dealing with floating-point numbers, especially in Computer Graphics (color conversion, computational geometry, animation easing/blending, etc.), normalized numbers are often used; that is, numbers between 0.0 and 1.0.
It is important to know, at the edge cases, whether the endpoints are inclusive or exclusive:
(0,1) = 1e-M .. 0.999...
(0,1] = 1e-M .. 1.0
[0,1) = 0.0 .. 0.999...
[0,1] = 0.0 .. 1.0
Where 1e-M denotes some tiny value on the order of the machine epsilon. This is why you might sometimes see the const float EPSILON = 1e-# idiom in C code (such as 1e-6) for a 32-bit floating-point number. The SO question Does EPSILON guarantee anything? has some preliminary details. For a more comprehensive answer see FLT_EPSILON and David Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Some implementations of a random number generator, random(), may produce values in the range 0.0 .. 0.999... instead of the more convenient 0.0 .. 1.0. Proper comments in the code will document this as [0.0,1.0) or [0.0,1.0] so there is no ambiguity about the usage.
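Python is a concrete example: its random.random() is documented to return a float in the half-open interval [0.0, 1.0), so 0.0 is a possible result but 1.0 is not:

```python
import random

# random.random() is documented as returning a float in [0.0, 1.0):
# 0.0 can occur, 1.0 cannot.
for _ in range(1000):
    x = random.random()
    assert 0.0 <= x < 1.0
```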
Example:
You want to generate random() colors. You convert three floating-point values to unsigned 8-bit values to generate a 24-bit pixel with red, green, and blue channels respectively. Depending on the interval output by random() you may end up with near-white (254,254,254) or white (255,255,255).
+--------+-----+
|random()|Byte |
|--------|-----|
|0.999...| 254 | <-- error introduced
|1.0 | 255 |
+--------+-----+
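A sketch of why the interval matters here: naively scaling by 255 can never produce 255 if random() is limited to [0.0, 1.0), while scaling by 256 and clamping covers the full byte range. The helper `to_byte` below is a hypothetical illustration, not code from any particular library:

```python
# Hypothetical helper: map a normalized float to an unsigned 8-bit value.
# Scaling by 256 and clamping lets values just below 1.0 still reach 255.
def to_byte(x):
    return min(int(x * 256.0), 255)

print(int(0.999999 * 255.0))  # 254 -- the near-white error from the table
print(to_byte(0.999999))      # 255
print(to_byte(1.0))           # 255 (clamped)
```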
For more details about floating-point precision and robustness with intervals see Christer Ericson's Real-Time Collision Detection, Chapter 11 Numerical Robustness, Section 11.3 Robust Floating-Point Usage.

It is a mathematical convention used in the definition of an interval: square brackets mean the endpoint is included ("closed") and round brackets mean it is excluded ("open").


Octave fplot abs looks very strange

f = @(x)(abs(x))
fplot(f, [-1, 1])
Freshly installed octave, with no configuration edited. It results in the following image, where it looks as if it is constant for a while around 0, looking more like a \_/ than a \/:
Why does it look so different from a usual plot of the absolute value near 0? How can this be fixed?
Since fplot is written in Octave it is relatively easy to read. Its location can be found using the which command. On my system this gives:
octave:1> which fplot
'fplot' is a function from the file /usr/share/octave/5.2.0/m/plot/draw/fplot.m
Examining fplot.m reveals that the function to be plotted, f(x), is evaluated at n equally spaced points between the given limits. The algorithm for determining n starts at line 192 and can be summarised as follows:
n is initially chosen to be 8 (unless specified differently by the user)
Construct a vector of arguments using a coarser grid of n/2 + 1 points:
x0 = linspace (limits(1), limits(2), n/2 + 1)'
(The linspace function will accept a non-integer value for the number of points, which it rounds down)
Calculate the corresponding values:
y0 = f(x0)
Construct a vector of arguments using a grid of n points:
x = linspace (limits(1), limits(2), n)'
Calculate the corresponding values:
y = f(x)
Construct a vector of values corresponding to the members of x but calculated from x0 and y0 by linear interpolation using the function interp1():
yi = interp1 (x0, y0, x, "linear")
Calculate an error metric using the following formula:
err = 0.5 * max (abs ((yi - y) ./ (yi + y + eps))(:))
That is, err is proportional to the maximum difference between the calculated and linearly interpolated values.
If err is greater than tol (2e-3 unless specified by the user) then put n = 2*(n-1) and repeat. Otherwise plot(x,y).
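The refinement loop above can be sketched in Python with NumPy (a simplified model of fplot.m, not the actual Octave source; np.interp plays the role of interp1):

```python
import numpy as np

def fplot_points(f, limits, n=8, tol=2e-3):
    # Simplified model of fplot.m's adaptive sampling loop.
    while True:
        x0 = np.linspace(limits[0], limits[1], n // 2 + 1)  # coarse grid
        y0 = f(x0)
        x = np.linspace(limits[0], limits[1], n)            # fine grid
        y = f(x)
        yi = np.interp(x, x0, y0)                           # linear interpolation
        err = 0.5 * np.max(np.abs((yi - y) / (yi + y + np.finfo(float).eps)))
        if err <= tol:
            return x, y
        n = 2 * (n - 1)

# abs has its cusp at 0, which lies on the coarse 5-point grid for [-1, 1],
# so the interpolation matches exactly, err ~ 0, and the loop stops at
# n = 8 points -- none of which is x = 0, producing the flat-bottomed plot.
x, y = fplot_points(np.abs, (-1.0, 1.0))
print(len(x), y.min())
```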
Because abs(x) is essentially a pair of straight lines, if x0 contains zero then the linearly interpolated values will always exactly match their corresponding calculated values and err will be exactly zero, so the above algorithm will terminate at the end of the first iteration. If x doesn't contain zero then plot(x,y) will be called on a set of points that doesn't include the 'cusp' of the function and the strange behaviour will occur.
This will happen if the limits are equally spaced either side of zero and floor(n/2 + 1) is odd, which is the case for the default values (limits = [-5, 5], n = 8).
The behaviour can be avoided by choosing a combination of n and limits so that either of the following is the case:
a) the set of m = floor(n/2 + 1) equally spaced points doesn't include zero or
b) the set of n equally spaced points does include zero.
For example, limits equally spaced either side of zero and odd n will plot correctly. This will not work for n=5, though, because, strangely, if the user inputs n=5, fplot.m substitutes 8 for it (I'm not sure why it does this; I think it may be a mistake). So fplot(@abs, [-1, 1], 3) and fplot(@abs, [-1, 1], 7) will plot correctly but fplot(@abs, [-1, 1], 5) won't.
(n/2 + 1) is odd, and therefore x0 contains zero for symmetrical limits, only for every 2nd even n. This is why it plots correctly with n=6: for that value n/2 + 1 = 4, so x0 doesn't contain zero. This is also the case for n=10, 14, 18 and so on.
Choosing slightly asymmetrical limits will also do the trick; try: fplot(@abs, [-1.1, 1.2])
The documentation says: "fplot works best with continuous functions. Functions with discontinuities are unlikely to plot well. This restriction may be removed in the future." so it is probably a bug/feature of the function itself that can't be fixed except by the developers. The ordinary plot() function works fine:
x = [-1 0 1];
y = abs(x);
plot(x, y);
The weird shape comes from the sampling rate, i.e. at how many points the function is evaluated. This is controlled by the parameter n of fplot. The default call seems to accidentally skip x=0, and with fplot(@abs, [-1, 1], 5) I get the same funny shape as you:
However, trying out different values of n can yield the correct shape; try e.g. fplot(@abs, [-1, 1], 6):
Although in general I would suggest using much higher numbers, like n=100.

Can anybody explain the print statement in this code?

I found this code on the internet but I am not able to understand how the print statement works.
I have already looked at many answers but none explains it fully.
def main():
    n = int(raw_input())
    for i in range(0, 1 << n):
        gray = i ^ (i >> 1)
        print "{0:0{1}b}".format(gray, n),

main()
for i in range(0, 1<<n):
Here, 1 << n shifts 1 by n bits to left. It means:
if n = 1, 1 << 1 would be 10 (binary) = 2,
if n = 2, 1 << 2 would be 100 (binary) = 4,
and so on.
In decimal the result is 2 to the power n.
In binary, n zeros are appended to the 1.
So the range is for i in range(0, 2 ** n).
gray=i^(i>>1)
Here i >> 1 shifts i by 1 bit to the right. It means:
if i = 1, 1 >> 1 would be 0,
if i = 2, 10 >> 1 would be 1 [2 = binary 10],
if i = 3, 11 >> 1 would be 1 [3 = binary 11],
and so on.
For decimal numbers it is equivalent to dividing by 2 (and ignoring the digits after the decimal point).
In binary, the last digit is dropped.
^ is exclusive OR operator. It is defined as:
0 ^ 0 = 0,
0 ^ 1 = 1 ^ 0 = 1,
1 ^ 1 = 0
print "{0:0{1}b}".format(gray,n)
Here the nested {1} refers to n and supplies the field width, and b means binary. So gray is converted to binary and zero-padded to n digits.
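The snippet is Python 2 (raw_input and the print statement). An equivalent Python 3 sketch of the same Gray-code loop:

```python
def gray_codes(n):
    # i ^ (i >> 1) converts the binary index i to its Gray code;
    # the nested format spec zero-pads it to n binary digits.
    return ["{0:0{1}b}".format(i ^ (i >> 1), n) for i in range(1 << n)]

print(" ".join(gray_codes(2)))  # → 00 01 11 10
print(" ".join(gray_codes(3)))  # → 000 001 011 010 110 111 101 100
```

Note that, as expected of a Gray code, each string differs from the next in exactly one bit.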
What you are looking at is known as advanced string formatting; specifically, PEP 3101 Advanced String Formatting.
You may refer to the official documentation for details.

Is there an algorithm that can be parallelized for the "unique" problem?

We have a long two-dimensional NumPy array (about 100,000 columns).
Like:
A_in =
[[1, 2, 3, 4, 3, 2, 1, …, 100000],
[2, 3, 3, 5, 4, 3, 1, …, 100000]] (edge_index_cpu in code)
You can treat one column as one group here. Every number represents a point, and each column represents the line (edge) between those two points.
We need to get output like:
A_out =
[[1, 3, 4],
[2, 3, 5]] (new_edge_indices in code)
and the indices of these output columns in the original array, like:
Idx_out =
[0, 2, 3]
Each output group must not intersect any of the previously kept groups. In addition, if a previous group has been removed (like [[2],[3]] above), then the removed group is not used to calculate the intersection (thus, [[3],[3]] is kept).
It can easily be implemented with a for loop, but because the data is too large for a for loop, we would like an algorithm for this problem that can be parallelized.
I have tried NumPy's unique operator on a flattened version of A_in
([1, 2, 2, 3, 3, 3, 4, 5, 3, 4, 2, 3, 1, 1, …]). But it cannot satisfy the requirement that "if the previous group has been removed (like [[2],[3]] above), then the removed group will not be used to calculate the intersection (thus, [[3], [3]] is kept)".
We want to handle a graph containing edges and points.
edge_index_cpu = edge_index.cpu()
for edge_idx in edge_argsort.tolist():
    source = edge_index_cpu[0, edge_idx].item()
    if source not in nodes_remaining:
        continue
    target = edge_index_cpu[1, edge_idx].item()
    if target not in nodes_remaining:
        continue
    new_edge_indices.append(edge_idx)
    cluster[source] = i
    nodes_remaining.remove(source)
    if source != target:
        cluster[target] = i
        nodes_remaining.remove(target)
    i += 1

# The remaining nodes are simply kept.
for node_idx in nodes_remaining:
    cluster[node_idx] = i
    i += 1

cluster = cluster.to(x.device)
I would not parallelize just yet, as your problem can be solved in O(n), which should be fast enough.
definitions
let's consider we have this:
const int pnts=1000000; // max points
const int lins=1000000; // number of lines
int lin[2][lins]; // lines
bool his[pnts]; // histogram of points (used edge?)
int out[pnts],outs=0; // result out[outs]
I am C++/GL oriented so I use indexes starting from zero!!! I used static arrays, not to confuse things with dynamic allocation or list templates, so it's easy to understand.
histogram
Create a histogram of the points used. It's simply a table holding one counter or value per possible point index. At the start, clear it. As we do not need to know how many times a point is used, I chose bool, so it's just a true/false value that tells us whether a point is already used or not.
So clear this table at the start with false:
for (i=0;i<pnts;i++) his[i]=false;
process lines data
Simply process all points/lines in their order and update the histogram for each point. So take the point indexes p0/p1 from lin[0/1][i] and test whether both points are already used:
p0=lin[0][i];
p1=lin[1][i];
if ((!his[p0])&&(!his[p1])){ his[p0]=true; his[p1]=true; add i to result }
If they are not, add i to the result and set p0,p1 as used in the histogram. As you can see, this test is O(1) per line. I assume you were using a linear for-loop search until now, making your version O(n^2).
Here is a small O(n) C++ example for this (sorry, not a Python coder):
#include <cstdlib>
#include <ctime>

void compute()
{
    const int pnts=1000000;          // max points
    const int lins=1000000;          // number of lines
    static int lin[2][lins];         // lines (static: too big for the stack)
    static bool his[pnts];           // histogram of points (point used?)
    static int out[pnts]; int outs=0;// result out[outs]
    int i,p0,p1;
    // generate random data
    srand(time(NULL));
    for (i=0;i<lins;i++)
    {
        lin[0][i]=rand()%pnts;
        lin[1][i]=rand()%pnts;
    }
    // clear histogram
    for (i=0;i<pnts;i++) his[i]=false;
    // compute result O(lins)
    for (i=0;i<lins;i++)             // process all lines
    {
        p0=lin[0][i];                // first point of line
        p1=lin[1][i];                // second point of line
        if ((!his[p0])&&(!his[p1]))  // both unused yet?
        {
            his[p0]=true;            // set them as used
            his[p1]=true;
            out[outs]=i;             // add new edge to result list
            outs++;
        }
    }
    // here out[0..outs-1] holds the result
}
The runtime is linear, and on my machine it took ~10 ms, so there is no need for parallelization.
In case bool is not a single bit, you can pack the histogram into unsigned integers using their bits (for example, pack 32 points into a single 32-bit int variable) to save memory. In that case 1M points results in a 125000-byte table, which is not a problem these days.
When I feed your data to the code:
int lin[2][lins]= // lines
{
{ 1, 2, 3, 4, 3, 2, 1 },
{ 2, 3, 3, 5, 4, 3, 1 },
};
I got this result:
{ 0, 2, 3 }
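The same histogram idea translates directly to Python; here is a sketch using NumPy for storage (the scan itself stays sequential because each keep/skip decision depends on the earlier ones):

```python
import numpy as np

def greedy_edges(edges):
    # One O(n) pass over the columns: keep a column only if neither
    # of its endpoints has been used by an earlier kept column.
    used = np.zeros(int(edges.max()) + 1, dtype=bool)  # histogram of points
    kept = []
    for idx in range(edges.shape[1]):
        p0, p1 = int(edges[0, idx]), int(edges[1, idx])
        if not used[p0] and not used[p1]:
            used[p0] = used[p1] = True
            kept.append(idx)
    return kept

A_in = np.array([[1, 2, 3, 4, 3, 2, 1],
                 [2, 3, 3, 5, 4, 3, 1]])
print(greedy_edges(A_in))  # → [0, 2, 3]
```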

Generate strings which have normal distribution length (Matlab)

I have an initial string : S= 'ABCDEFGH'
How can I generate 100 strings from S where there is no repeated character in each string and the characters in each string are in order from 'A' to 'H'? Each string has a different length, drawn from a normal distribution; here, mean = 4 and sd = 1.
The expected output (which may differ because the strings are randomly generated) should be 100 strings like below:
Output = { 'ABEGH'; 'ABE'; 'DH' ; 'BCGH' ..........; 'ABCDEGH'}
Thanks !
It's not clear what distribution you want. This is a generic answer for any length distribution.
S = 'ABCDEFGH'; %// input characters
distr = [.1 .2 .1 .2 .1 .1 .1 .1]; %// probability of getting lengths 1, 2, ..., numel(S)
n = randsample(numel(distr), 1, 1, distr); %// random length with the specified distribution
ind = sort(randperm(numel(S), n)); %// take n sorted values from 1, ..., numel(S);
result = S(ind);
Assuming all permutations produced by randperm are equally likely¹, the above code, conditioned on a given n, generates all possible n-character substrings with the same probability.
¹ In old Matlab versions randperm was an m-function. From its source code it was clear that it produced all permutations with the same probability. In recent versions it is not an m-function anymore, and its documentation doesn't specify that.
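For comparison, the same idea with the normally distributed length the question asks for, sketched in Python (rounding and clamping the drawn length to [1, len(S)] slightly distorts the distribution at the edges; `random_substring` is an illustrative name):

```python
import random

S = "ABCDEFGH"

def random_substring(s, mean=4.0, sd=1.0):
    # Draw a length from N(mean, sd), round it, clamp it to the valid range,
    # then keep a sorted random choice of that many positions.
    n = max(1, min(len(s), round(random.gauss(mean, sd))))
    idx = sorted(random.sample(range(len(s)), n))
    return "".join(s[i] for i in idx)

out = [random_substring(S) for _ in range(100)]
print(out[:5])
```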

Why is this algorithm for the integer knapsack incorrect?

This is what I think I need to do.
Given 'n' items of weight 'Wi' and value 'Vi', I need to maximize the value of the knapsack while staying under the weight limit WEIGHT_MAX.
So, what I thought of doing was, sorting the items according to their value ( High to low ), and then choosing items as long as the weight of the knapsack is less than WEIGHT_MAX.
i.e. something like this
while( temp_weight <= WEIGHT_MAX && i <= INDEX_MAX )
{
    if ( temp_weight + W[i] > WEIGHT_MAX ) { i++; continue; }
    temp_weight += W[i];
    value += V[i];
    i++;
}
Why is this algorithm wrong?
Consider these sorted elements:
Vi={10, 5, 5, 5, 5, 5, 5}
Wi={4, 1, 1, 1, 1, 1, 1}
With your algorithm, if WEIGHT_MAX is 4, you would choose just the V=10 element (total value 10). But the optimal solution is four of the W=1, V=5 elements (total value 20).
That's why your algorithm doesn't reach the optimum.
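For contrast, a standard dynamic-programming sketch in Python that does find the optimum for the counterexample above (0/1 knapsack over integer weights):

```python
def knapsack(values, weights, W):
    # dp[w] = best achievable value with total weight at most w.
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        for w in range(W, wt - 1, -1):  # downward: each item used at most once
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[W]

print(knapsack([10, 5, 5, 5, 5, 5, 5], [4, 1, 1, 1, 1, 1, 1], 4))  # → 20
```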
A few algorithms to solve it: http://en.wikipedia.org/wiki/Knapsack_problem
