Let's suppose I have a six-by-six grid of cubes, each with xyz coordinates.
Moving from the middle cube (0,0,0) to one of its sides (let's say (0,1,0)), I would like to find the other 4 components that are perpendicular to the direction of (0,1,0).
If we move along one dimension, this is easy (and my brain can grasp it): the components will be (-1,0,0), (+1,0,0), (0,0,+1), (0,0,-1).
Now, could somebody help me with moving to sides where two coordinates change (e.g. to (1,1,0)) or three coordinates change (e.g. (1,1,-1))?
Thanks,
Rodrigo
There are infinitely many perpendicular vectors in 3D space.
If you want to restrict their components to the values 0 and ±1, then consider the following approach:
Your vector's components are A = (ax, ay, az). The dot product of a perpendicular vector B = (bx, by, bz) with A must be zero:
ax * bx + ay * by + az * bz = 0
To form the components of B:
take the components of A
set an arbitrary component to zero (provided at least one of the other components is nonzero)
swap the two remaining components
negate one of them
example:
(bx, by, bz) = (0, -az, ay)
So for the vector A=(1,1,-1), one of the six perpendiculars is B1=(0, 1, 1).
For the vector A=(1,1,0) there are four variants with the given restrictions:
(-1, 1, 0)
(1, -1, 0)
(0, 0, 1)
(0, 0, -1)
If you want to fix a pair of components of the perpendicular vector, just substitute the needed values into the dot-product formula and solve for the unknown component of B.
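A minimal Python sketch of this swap-and-negate construction, with a dot-product check (illustrative only):
import numpy as np

def perpendicular(a):
    # One perpendicular to a 3-vector: zero one component, swap the other two, negate one.
    ax, ay, az = a
    if ay != 0 or az != 0:
        return np.array([0, -az, ay])     # (bx, by, bz) = (0, -az, ay)
    return np.array([-az, 0, ax])         # a lies along x, so zero the y component instead

a = np.array([1, 1, -1])
b = perpendicular(a)
print(b, np.dot(a, b))                    # [0 1 1] 0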
Thank you, this is exactly what I did.
Here is my solution:
(In MATLAB.) I created a matrix of all the possible unit offsets:
pos_vals=[ 0 0 0 ; -1 0 0 ; 1 0 0 ; 0 1 0 ; 0 -1 0 ; -1 -1 0 ; 1 1 0 ; -1 1 0 ; 1 -1 0; 0 0 1 ; -1 0 1 ; 1 0 1 ; 0 1 1 ; 0 -1 1 ; -1 -1 1 ; 1 1 1 ; -1 1 1 ; 1 -1 1 ; 0 0 -1 ; -1 0 -1 ; 1 0 -1 ; 0 1 -1 ; 0 -1 -1 ; -1 -1 -1 ; 1 1 -1 ; -1 1 -1 ; 1 -1 -1];
And then, based on my reference coordinate [e.g. vec_ofinterest = (1,1,0)],
I do the following:
for idx_posvals = 1:size(pos_vals,1)
    gg(idx_posvals) = dot(vec_ofinterest, pos_vals(idx_posvals,:));
    if gg(idx_posvals) == 0          % zero dot product => perpendicular
        pos_vals(idx_posvals,:)      % display the matching offset
    end
end
Which gives me 8 solutions (including the reciprocals you mentioned):
-1 1 0
1 -1 0
0 0 1
-1 1 1
1 -1 1
0 0 -1
-1 1 -1
1 -1 -1
It looks like this is solved. In case somebody finds an error, please let me know.
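For completeness, a rough NumPy equivalent of the same brute-force search (an illustrative sketch, not the MATLAB code above):
import numpy as np
from itertools import product

# the 26 unit offsets around the origin, i.e. pos_vals without (0,0,0)
offsets = np.array([p for p in product((-1, 0, 1), repeat=3) if any(p)])

vec_ofinterest = np.array([1, 1, 0])
perp = offsets[offsets @ vec_ofinterest == 0]   # keep offsets whose dot product is zero
print(perp)                                     # the same 8 solutions as above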
Rodrigo
I tried different ways, but it seems impossible for me to do this efficiently without looping through the array.
The input is an array y and a percentage x; I want to randomly set a fraction x of the non-zero elements of y to zero.
e.g. input is
y=np.random.binomial(1,1,[10,10])
x=0.5
output
[[0 0 0 0 1 1 1 1 0 1]
[1 0 1 0 0 1 0 1 0 1]
[1 0 1 1 1 1 0 0 0 1]
[0 1 0 1 1 0 1 0 1 1]
[0 1 1 0 0 1 1 1 0 0]
[0 0 1 1 1 0 1 1 0 1]
[0 1 0 0 0 0 1 0 1 1]
[0 0 0 1 1 1 1 1 0 0]
[0 1 1 1 1 0 0 1 0 0]
[1 0 1 0 1 0 0 0 0 0]]
Here's one based on masking -
def set_nonzeros_to_zeros(a, setz_ratio):
    nz_mask = a != 0                                     # mask of the non-zero elements
    nz_count = nz_mask.sum()
    z_set_count = int(np.round(setz_ratio*nz_count))     # how many of them to switch off
    idx = np.random.choice(nz_count, z_set_count, replace=False)
    mask0 = np.ones(nz_count, dtype=bool)
    mask0.flat[idx] = 0                                  # drop the randomly chosen ones
    nz_mask[nz_mask] = mask0                             # scatter back into the 2D mask
    a[~nz_mask] = 0
    return a
We skip generating all the indices with np.argwhere/np.nonzero in favor of a masking-based approach, to focus on performance.
Sample run -
In [154]: np.random.seed(0)
...: a = np.random.randint(0,3,(5000,5000))
# number of non-0s before using solution
In [155]: (a!=0).sum()
Out[155]: 16670017
In [156]: a_out = set_nonzeros_to_zeros(a, setz_ratio=0.2) #set 20% of non-0s to 0s
# number of non-0s after using solution
In [157]: (a_out!=0).sum()
Out[157]: 13336014
# Verify
In [158]: 16670017 - 0.2*16670017
Out[158]: 13336013.6
There are a few vectorized methods that might help you, depending on what you want to do:
# Flatten the 2D array and get the indices of the non-zero elements
c = y.flatten()
d = c.nonzero()[0]
# Shuffle the indices and set the first 100x % to zero
np.random.shuffle(d)
x = 0.5
c[d[:int(x*len(d))]] = 0
# reshape to the original 2D shape
y = c.reshape(y.shape)
No doubt there are some efficiency improvements to be made here.
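If it helps, here is the same flatten/shuffle idea wrapped in a small reusable function (the name is mine, purely a sketch):
import numpy as np

def zero_fraction_of_nonzeros(y, x):
    # Return a copy of y with a fraction x of its non-zero elements set to 0.
    c = y.flatten()                    # flatten() copies, so y itself is untouched
    d = c.nonzero()[0]
    np.random.shuffle(d)
    c[d[:int(x * len(d))]] = 0
    return c.reshape(y.shape)

y = np.random.binomial(1, 1, [10, 10])           # all ones
print(zero_fraction_of_nonzeros(y, 0.5).sum())   # 50 of the 100 ones remain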
I'm getting my feet wet with J and, to get the ball rolling, decided to write a function that:
takes an integer N;
spits out a table that follows this pattern:
(example for N = 4)
1
0 1
0 0 1
0 0 0 1
i.e. in each row the number of zeroes increases from 0 up to N - 1.
However, being a newbie, I'm stuck. My current labored (and incorrect) solution for the N = 4 case looks like:
(4 # ,: 0 1) #~/"1 1 (1 ,.~/ i.4)
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
And the problem with it is twofold:
it's not general enough and looks kind of ugly (the parentheses and the " usage);
trailing zeroes - as I understand it, all arrays in J are homogeneous (rectangular), so in my case every row should be boxed.
Like that:
┌───────┐
│1 │
├───────┤
│0 1 │
├───────┤
│0 0 1 │
├───────┤
│0 0 0 1│
└───────┘
Or I should use strings (e.g. '0 0 1'), which will be padded with spaces instead of zeroes.
So, what I'm kindly asking here is:
please provide an idiomatic J solution for this task with explanation;
criticize my attempt and point out how it could be finished.
Thanks in advance!
Like so many challenges in J, sometimes it is better to keep your focus on your result and find a different way to get there. In this case, what your initial approach is doing is creating an identity matrix. I would use
=/~@:i. 4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
You have correctly identified the issue with the trailing 0's and the fact that J will pad out with 0's to avoid ragged arrays. Boxing avoids this padding since each row is self contained.
So create your lists first. I would use overtake to get the extra 0's
4{.1
1 0 0 0
The next line uses 1: to return 1 as a verb and boxes the overtakes from 1 to 4
(>:@:i. <@:{."0 1:) 4
+-+---+-----+-------+
|1|1 0|1 0 0|1 0 0 0|
+-+---+-----+-------+
Since we want this reversed and then made into strings, we add ":@:|.@: to the process.
(>:@:i. <@:":@:|.@:{."0 1:) 4
+-+---+-----+-------+
|1|0 1|0 0 1|0 0 0 1|
+-+---+-----+-------+
Then we unbox
>@:(>:@:i. <@:":@:|.@:{."0 1:) 4
1
0 1
0 0 1
0 0 0 1
I am not sure this is the way everyone would solve the problem, but it works.
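If the J trains are hard to follow, here is a rough Python parallel of the same steps (overtake with zeros, reverse, format as a string); purely illustrative:
def lower_triangle_rows(n):
    rows = []
    for k in range(1, n + 1):
        row = ([1] + [0] * (k - 1))[::-1]           # like |. k {. 1 : zeros first, 1 last
        rows.append(' '.join(str(v) for v in row))  # strings, so no numeric padding needed
    return rows

print('\n'.join(lower_triangle_rows(4)))
# 1
# 0 1
# 0 0 1
# 0 0 0 1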
An alternative solution does not use boxing; it uses the dyadic j. (Complex) and the fact that
1j4 # 1
1 0 0 0 0
(1 j. 4) # 1
1 0 0 0 0
(1 #~ 1 j. ]) 4
1 0 0 0 0
So, I create a list for each integer in i. 4, then reverse them and make them into strings. Since they are now strings, the extra padding is done with blanks.
(1 ":@:|.@:#~ 1 j. ])"0@:i. 4
1
0 1
0 0 1
0 0 0 1
Taking this step by step as to hopefully explain a little better.
i.4
0 1 2 3
Which is then applied to (1 ":@:|.@:#~ 1 j. ]) one atom at a time, hence the use of "0.
Breaking down what is going on within the parentheses: I first take the right three verbs, which form a fork.
(1 j. ])"0@:i.4
1 1j1 1j2 1j3
Now, effectively that gives me
1 ":@:|.@:#~ 1 1j1 1j2 1j3
The middle tine of the fork becomes the verb acting on the two noun arguments. The ~ swaps the arguments, so it becomes equivalent to
1 1j1 1j2 1j3 ":@:|.@:# 1
which, because of the way @: works, is the same as
": |. 1 1j1 1j2 1j3 # 1
I haven't shown the results of these components because using the "0 on the fork changes how the arguments are sent to the middle tine and assembled afterwards. I'm hoping that there is enough here that, with some hand waving, the explanation will suffice.
The jump from tacit to explicit can be a big one, so it may be a better exercise to write the same verb explicitly to see if it makes more sense.
lowerTriangle =: 3 : 0
rightArg=. i. y
complexCopy=. 1 j. rightArg
1 (":@:|.@:#~)"0 complexCopy
)
lowerTriangle 4
1
0 1
0 0 1
0 0 0 1
lowerTriangle 5
1
0 1
0 0 1
0 0 0 1
0 0 0 0 1
See what happens when you 'get the ball rolling'? I guess the thing about J is that the ball goes down a pretty steep slope no matter where you begin. Exciting, eh?
I have a bag-of-words representation of a corpus stored in an D by W sparse matrix word_freqs. Each row is a document and each column is a word. A given element word_freqs[d,w] represents the number of occurrences of word w in document d.
I'm trying to obtain another D by W matrix not_word_occs where, for each element of word_freqs:
If word_freqs[d,w] is zero, not_word_occs[d,w] should be one.
Otherwise, not_word_occs[d,w] should be zero.
Eventually, this matrix will need to be multiplied with other matrices which might be dense or sparse.
I've tried a number of methods, including:
not_word_occs = (word_freqs == 0).astype(int)
This works for toy examples, but results in a MemoryError for my actual data (which is approx. 18,000 x 16,000).
I've also tried np.logical_not():
word_occs = sklearn.preprocessing.binarize(word_freqs)
not_word_occs = np.logical_not(word_freqs).astype(int)
This seemed promising, but np.logical_not() does not work on sparse matrices, giving the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Any ideas or guidance would be appreciated.
(By the way, word_freqs is generated by sklearn's preprocessing.CountVectorizer(). If there's a solution that involves converting this to another kind of matrix, I'm certainly open to that.)
The complement of the nonzero positions of a sparse matrix is dense. So if you want to achieve your stated goals with standard numpy arrays, you will need quite a bit of RAM. Here's a quick and totally unscientific hack to give you an idea of how many arrays of that sort your computer can handle:
>>> import numpy as np
>>> a = []
>>> for j in range(100):
... print(j)
... a.append(np.ones((16000, 18000), dtype=int))
My laptop chokes at j=1. So unless you have a really good computer, memory will be an issue even if you can get the complement, which you can do with
>>> compl = np.ones(S.shape,int)
>>> compl[S.nonzero()] = 0
One way out may be to not explicitly compute the complement; let's call it C = B1 - A, where B1 is the same-shape matrix completely filled with ones and A is the adjacency matrix of your original sparse matrix. For example, the matrix product XC can be written as XB1 - XA, so you have one multiplication with the sparse A and one with B1, which is actually cheap because it boils down to computing row sums. The point here is that you can compute this without computing C first.
A particularly simple example would be multiplication with a one-hot vector. Such a multiplication just selects a column (if multiplying from the right) or a row (if multiplying from the left) of the other matrix. That means you just need to find that column or row of the sparse matrix and take its complement (for a single slice, no problem), and if you do this for a one-hot matrix, as above, you needn't compute the complement explicitly.
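A rough scipy sketch of the XC = XB1 - XA idea (shapes and names here are just for illustration):
import numpy as np
from scipy import sparse

# Illustrative shapes; A plays the role of the sparse occurrence matrix.
A = (sparse.rand(1000, 800, 0.01, format='csr') != 0).astype(np.int64)
X = np.random.rand(50, 1000)             # some dense matrix we want to multiply by C

# C = B1 - A, with B1 the all-ones matrix of A's shape, so X @ C = X @ B1 - X @ A.
# X @ B1 is just the row sums of X broadcast across the columns; C is never built.
row_sums = X.sum(axis=1, keepdims=True)              # (50, 1)
X_times_A = np.asarray((A.T @ X.T).T)                # sparse-dense product, stays cheap
X_times_C = row_sums - X_times_A                     # same as X @ (B1 - A)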
Make a small sparse matrix:
In [743]: freq = sparse.random(10,10,.1)
In [744]: freq
Out[744]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in COOrdinate format>
the repr(freq) shows the shape, elements and format.
In [745]: freq==0
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:213: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
", try using != instead.", SparseEfficiencyWarning)
Out[745]:
<10x10 sparse matrix of type '<class 'numpy.bool_'>'
with 90 stored elements in Compressed Sparse Row format>
If I do your first action, I get a warning and a new array with 90 (out of 100) nonzero terms. That "not" is no longer sparse.
In general, numpy functions do not work when applied to sparse matrices. To work, they have to delegate the task to sparse methods. But even if logical_not worked, it wouldn't solve the memory issue.
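If the downstream computation can work a slice at a time, one memory-friendly option is to take the complement row by row; a minimal sketch (assuming a CSR matrix S):
import numpy as np
from scipy import sparse

S = sparse.rand(10, 10, 0.1, format='csr')
row = S.getrow(3).toarray().ravel()     # densifying a single row is cheap
not_row = (row == 0).astype(int)        # complement of that row only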
Here is an example of using Pandas.SparseDataFrame:
In [42]: X = (sparse.rand(10, 10, .1) != 0).astype(np.int64)
In [43]: X = (sparse.rand(10, 10, .1) != 0).astype(np.int64)
In [44]: d1 = pd.SparseDataFrame(X.toarray(), default_fill_value=0, dtype=np.int64)
In [45]: d2 = pd.SparseDataFrame(np.ones((10,10)), default_fill_value=1, dtype=np.int64)
In [46]: d1.memory_usage()
Out[46]:
Index 80
0 16
1 0
2 8
3 16
4 0
5 0
6 16
7 16
8 8
9 0
dtype: int64
In [47]: d2.memory_usage()
Out[47]:
Index 80
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
dtype: int64
math:
In [48]: d2 - d1
Out[48]:
0 1 2 3 4 5 6 7 8 9
0 1 1 0 0 1 1 0 1 1 1
1 1 1 1 1 1 1 1 1 0 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 0 1 1
4 1 1 1 1 1 1 1 1 1 1
5 0 1 1 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1 1 1
7 0 1 1 0 1 1 1 0 1 1
8 1 1 1 1 1 1 0 1 1 1
9 1 1 1 1 1 1 1 1 1 1
source sparse matrix:
In [49]: d1
Out[49]:
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 0 1 0 0 0
1 0 0 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0
5 1 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
7 1 0 0 1 0 0 0 1 0 0
8 0 0 0 0 0 0 1 0 0 0
9 0 0 0 0 0 0 0 0 0 0
memory usage:
In [50]: (d2 - d1).memory_usage()
Out[50]:
Index 80
0 16
1 0
2 8
3 16
4 0
5 0
6 16
7 16
8 8
9 0
dtype: int64
PS: if you can't build the whole SparseDataFrame at once (because of memory constraints), you can use an approach similar to the one used in this answer
I have a string:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'
How can I combine pairs, where the last character of one pair is the first character of the following pair, into strings? The new strings must contain all of the characters 'A','B','C','D','E','F','G' that appear in the sup_pairs string.
The expected output should be:
S1 = 'BAEFCGD' % because BA is followed by AE in the sup_pairs string, so we combine them into BAE, and so on... we continue the rule to generate S1
S2 = 'DFCEABG'
If I have AB, BC and BD, the generated strings should be both ABC and ABD.
If there is any repeated character in the pairs, like AB BC CA CE, we will skip the second A, and we get ABCE.
This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.
First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p = strsplit(sup_pairs,' ');      % split into individual pairs
m = cell2mat(p(:));               % one pair of characters per row
m = m - '?';                      % '?' is char 63, so 'A' -> 2, 'B' -> 3, ... 'G' -> 8
A = sparse(m(:,1), m(:,2), 1);    % adjacency matrix: A(from, to) = 1
The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:
>> full(A)
ans =
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
As you can see, the edge BA, which translates to the subscript (3,2), is equal to 1.
Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:
treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');
Output:
S1 = BAEFCGD
Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about hitting the default limit on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth limit will be the least of your worries.
function full_paths = dft_all(A, current_path)
% A - adjacency matrix of graph
% current_path - initially just the start node (root)
% full_paths - cell array containing all paths from initial root to a leaf
n = size(A, 1); % number of nodes in graph
full_paths = cell(1,0); % return cell array
unvisited_mask = ones(1, n);
unvisited_mask(current_path) = 0; % mask off already visited nodes (path)
% multiply mask by array of nodes accessible from last node in path
unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);
% add restriction on length of paths to keep (numel == n)
if isempty(unvisited_nodes) && (numel(current_path) == n)
full_paths = {current_path}; % we've found a leaf node
return;
end
% otherwise, still more nodes to search
for node = unvisited_nodes
new_path = dft_all(A, [current_path node]); % add new node and search
if ~isempty(new_path) % if this produces a new path...
full_paths = {full_paths{1,:}, new_path{1,:}}; % add it to output
end
end
end
This is a normal depth-first traversal except for the added condition on the length of the path:
if isempty(unvisited_nodes) && (numel(current_path) == n)
The first half of the if condition, isempty(unvisited_nodes), is standard. If you only use this part of the condition, you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n), enforces the length of the path.
I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. There are no edges into or out of node 1 because I was planning to use a trick that I never got around to telling you about: rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to each of the other, real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:
A =
0 1 1 1 1 1 1 1
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.
Here's the output of the function using the updated matrix:
>> dft_all(A, 1)
ans =
{
[1,1] =
1 2 3 8 5 7 4 6
[1,2] =
1 3 2 6 7 4 8 5
[1,3] =
1 3 8 5 2 6 7 4
[1,4] =
1 3 8 5 7 4 6 2
[1,5] =
1 4 6 2 3 8 5 7
[1,6] =
1 5 7 4 6 2 3 8
[1,7] =
1 6 2 3 8 5 7 4
[1,8] =
1 6 7 4 8 5 2 3
[1,9] =
1 7 4 6 2 3 8 5
[1,10] =
1 8 5 7 4 6 2 3
}
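For comparison, a rough Python sketch of the same idea (an adjacency list built from the pairs, with a loop over start nodes standing in for the dummy root); this is only an illustration, not a translation of the MATLAB code above:
def all_full_paths(pairs):
    # Build an adjacency list from two-character pairs, e.g. 'BA' means edge B -> A.
    nodes = sorted({c for p in pairs for c in p})
    adj = {c: [] for c in nodes}
    for a, b in pairs:
        adj[a].append(b)

    paths = []

    def dfs(node, path):
        unvisited = [b for b in adj[node] if b not in path]
        if not unvisited and len(path) == len(nodes):   # leaf reached with a full-length path
            paths.append(''.join(path))
        for b in unvisited:
            dfs(b, path + [b])

    for start in nodes:          # plays the role of the dummy root with edges to every node
        dfs(start, [start])
    return paths

print(all_full_paths('BA CE DF EF AE FC GD DA CG EA AB BG'.split()))
# the results include 'BAEFCGD' and 'DFCEABG'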
I was experimenting with generating truth tables in J:
nand =: *:
nand /~ 0 1
1 1
1 0
bxor =: 22 b. NB. Built-in bitwise XOR
bxor /~ 0 1
0 1
1 0
Now I want to define my own logical xor, which I did like so:
xor =: 3 : 0
]y NB. monadic case is just the identity
:
(x*.-.y)+.(y*.-.x) NB. dyadic case is (x AND NOT y) OR (y AND NOT x)
)
This works as I expect when I call it directly.
0 xor 0 1
0 1
1 xor 0 1
1 0
But it doesn't generate a truth table:
xor /~ 0 1
0 0
Why not?
I thought maybe the problem was that ]/~ 0 1 itself produced a 1 x 2 array, so I changed the monadic part to use nand (*:y) because it produces the 2x2 array:
*:/~ 0 1
1 1
1 0
xor =: 3 : 0
*:y NB. certainly wrong, but at least has 2x2 shape.
:
(x*.-.y)+.(y*.-.x)
)
But I still get the same behavior:
xor /~ 0 1
0 0
Can someone help me understand the flaw in my thinking?
Your xor has infinite rank, while *: and ~: have rank 0. You can verify that by using b. (that is, v b. 0), like so:
~: b. 0
_ 0 0
*: b. 0
0 0 0
xor b. 0
_ _ _
What this means is that xor operates on the whole list 0 1 rather than on each individual atom 0 and 1.
You will get the result you expect if you use xor with rank 0:
xor"0 /~ 0 1
0 1
1 0
Or if you define xor to be of rank 0.
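A loose NumPy analogy of the whole-list versus per-atom distinction, if it helps (this is not J, just an illustration):
import numpy as np

def xor(x, y):
    # (x AND NOT y) OR (y AND NOT x), element-wise on 0/1 integer arrays
    return (x & ~y) | (y & ~x)

vals = np.array([0, 1])
print(xor(vals, vals))                                      # whole-list call, like infinite rank: [0 0]
print(np.array([[xor(a, b) for b in vals] for a in vals]))  # per-atom table, like rank 0: [[0 1] [1 0]]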