I'm looking for a Python module or code to permute an asymmetric matrix into triangular form - python-3.x

I have an asymmetric matrix.
A B C D
A 0 0 1 0
B 1 0 0 1
C 0 0 0 0
D 1 1 1 0
I'm trying to switch rows and columns to make it into triangular form.
Like:
C A D B
C 0 1 1 0
A 0 0 1 1
D 0 0 0 1
B 0 0 1 0
Someone gave me some code written in VBA and used in Microsoft Excel. From a note in that code, I found a paper ("Algorithm 529: Permutations to Block Triangular Form"), published in 1978, which was implemented in Fortran. I also found a paper ("Implementation of Tarjan's Algorithm for the Block Triangularization of a Matrix") which might describe the concept.
I looked at NumPy but didn't find such a function. I'm wondering whether there is a ready-made module in some package that performs this process. Thanks a lot.

BTW, I don't see a triangular matrix in the example.
The problem is NP-complete, so you can just generate all permutations of rows and columns until a triangular matrix is reached (or try to implement the algorithm from the linked article); a brute-force sketch follows below.
Article name, for the record:
Obtaining a Triangular Matrix by Independent Row-Column Permutations
Guillaume Fertin, Irena Rusu, Stéphane Vialette
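A minimal brute-force sketch in Python (assuming NumPy; the 4 x 4 matrix from the question is hard-coded, and row and column permutations are tried independently, as in the paper above):

import itertools
import numpy as np

def find_triangular_permutation(matrix):
    # Try all row/column permutations until an upper-triangular form is found.
    # Complexity is (n!)^2, so this is only feasible for very small matrices.
    m = np.asarray(matrix)
    n = m.shape[0]
    for rows in itertools.permutations(range(n)):
        for cols in itertools.permutations(range(n)):
            candidate = m[np.ix_(rows, cols)]
            if np.all(np.tril(candidate, k=-1) == 0):  # strictly lower part all zero
                return rows, cols, candidate
    return None

# the matrix from the question, rows/columns in the order A, B, C, D
a = [[0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0],
     [1, 1, 1, 0]]
print(find_triangular_permutation(a))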

Related

How to create a table of random 0's and 1's (with weights) in Excel?

I would like to create a random, square table of 0's and 1's like so
0 1 0
0 0 1
0 1 0
but only bigger (around 14 x 14). The diagonal should be all 0's and the table should be symmetric across the diagonal. For example, this would not be good:
0 1 1
0 0 0
0 1 0
I would also like to have control over the number of 1's that appear, or at least the probability that 1's appear.
I only need to make a few such tables, so this does not have to be fully automated by any means. I do not mind at all doing a lot of the work by hand.
If possible, I would greatly prefer doing this without coding in VBA (small code in cells is okay of course) since I do not know it at all.
Edit: amended so as to return a symmetrical array, as requested by the OP.
=LET(λ,RANDARRAY(ξ,ξ),IF(1-MUNIT(ξ),GESTEP(MOD(MMULT(λ,TRANSPOSE(λ)),1),1-ζ),0))
where ξ is the side length of the returned square array and ζ is an approximation as to the probability of a non-diagonal entry within that array being unity.
As ξ increases, so does the accuracy of ζ.
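Not an Excel formula, but for comparison, a minimal NumPy sketch of the same requirement (a direct Bernoulli draw rather than the MMULT trick; the size 14 and probability 0.3 are just example values):

import numpy as np

def random_symmetric_binary(n=14, p=0.3, seed=None):
    # Random n x n 0/1 matrix, symmetric, with an all-zero diagonal.
    # p is the probability that any given off-diagonal entry is 1.
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # strict upper triangle only
    return (upper | upper.T).astype(int)          # mirror to make it symmetric

print(random_symmetric_binary(14, 0.3, seed=0))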

Excel: How do I fill a matrix by MAXIFS comparison?

I have this dataset:
Groups    A  A  B  B
location  a  b  c  d
value     3  4  0  5
I also have a transformed version for better clarification:
Groups  location  value
A       a         3
A       b         4
B       c         0
B       d         5
What I want is a simple matrix that is filled with binary values.
The function should check, for each row and column, whether it has the MAXIFS from its respective group and then compare it to the second value, which is the MAXIFS of the other group. Therefore the combination b and d has to resolve to 1.
The intended output is shown further below.
I have a dataset with locations a-n that are grouped into groups A-n. So group A has the locations a, b; group B the locations c, d. The columns represent different features at each location.
I want to build a matrix out of it, but not the "usual" distance matrix - one that incorporates the following points:
- When building the matrix, the maximum values of each group get compared
- I want to find out if the value I am looking at is the maximum value in its group and, if so, compare it to the second group's maximum -> if this number is larger -> set it to 1
- This should automatically fill all fields in the matrix
I need this for a network analysis of my data, to drop the connections that are not needed.
My current input is somewhat like this:
=IF(AND(>0(MAXIF()=value)>(AND(>0(MAXIF()=value);1;0)
How it looks in Excel:
=IF(AND(A$1<>$A7; A$3>0;(MAXIFS($A$3:$D$3;$A$1:$D$1;A$1)=A$3))<(AND(A$1<>$A7; $C7>0;MAXIFS($C$7:$C$10;$A$7:$A$10;$A7)=$C7));1;0)
However, I think that internally it does not actually compare values, but TRUEs and FALSEs. Therefore connections that are smaller than the MAX are getting 1s. My current output:
A A B B
a b c d
A a 0 0 0 0
A b 0 0 1 0
B c 0 0 0 0
B d 1 0 0 0
As you can see, the combination a and d resolves to 1.
The output should look like this:
(The matrix is generally 0, but when beacons like d (5) and b (4) meet, the cell gets a "1", since both are the highest within their group. Only in that case is there a connection between the two groups.)
A A B B
a b c d
A a 0 0 0 0
A b 1 0 0 1
B c 0 0 0 0
B d 0 1 0 0
I understand the problem but don't know how to fix that.
I'm fairly sure this doesn't work properly, but it may help. I've restructured your data slightly to make it easier to write the formula.
The formula in C3 etc is
=IF(AND(MAXIFS($G$3:$G$6,$A$3:$A$6,$A3)=$G3,MAXIFS($C$7:$F$7,$C$1:$F$1,C$1)=C$7,$G3>MAXIFS($G$3:$G$6,$A$3:$A$6,"<>" & $A3)),1,0)
It's just checking if the value in G is the max for the group in column A, and if the value in row 7 is the max for group in row 1, and if the value in G is greater than the max of the other group. If they're all satisfied it inserts a '1'.
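Not an Excel answer either, but here is a small Python sketch of the rule as stated in the question - a cell gets a 1 when the row location and the column location are each the maximum of their own group and the two groups differ (note this is slightly looser than the formula above, which additionally requires one maximum to exceed the other); the dictionary layout is just an assumption:

# hypothetical layout: location -> (group, value), taken from the question's data
data = {"a": ("A", 3), "b": ("A", 4), "c": ("B", 0), "d": ("B", 5)}

# maximum value within each group
group_max = {}
for loc, (grp, val) in data.items():
    group_max[grp] = max(group_max.get(grp, float("-inf")), val)

locations = list(data)
for r in locations:
    row = []
    for c in locations:
        (rg, rv), (cg, cv) = data[r], data[c]
        linked = rg != cg and rv == group_max[rg] and cv == group_max[cg]
        row.append(1 if linked else 0)
    print(r, row)

This prints a 1 only at the b/d (and d/b) crossings, matching the "beacons d (5) and b (4) meet" description.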

What does the fifth column of feColorMatrix stand for, exactly?

In mozilla's doc for feColorMatrix it is stated that
The SVG filter element changes colors based on a
transformation matrix. Every pixel's color value (represented by an
[R,G,B,A] vector) is matrix multiplied to create a new color.
However in feColorMatrix there are 5 columns, not 4.
In an excellent article that can be considered a classic reference, it is stated that:
The matrix here is actually calculating a final RGBA value in its
rows, giving each RGBA channel its own RGBA channel. The last number
is a multiplier.
But that does not explain much. As far as I understand, since applying the filter modifies exactly the R, G, B and A channels and nothing else, there seems to be no need for this additional parameter. There is indirect evidence for that in the article itself: all of the numerous feColorMatrix-based filter examples provided have zeroes as the fifth component. Also, why is it a multiplier?
In another famous article it is stated that:
For the other rows, you are creating each of the rgba output values as
the sum of the rgba input values multiplied by the corresponding
matrix value, plus a constant.
Calling it an added constant makes more sense than calling it a multiplier; however, it's still unclear what the fifth component in feColorMatrix stands for and what is unachievable without it - so that is my question.
My last hope was the w3c reference but it's surprisingly vague as well.
The specification is clear although you do need to understand matrix math. The fifth column is a fixed offset. It's useful because if you want to add a specific amount of R/G/B/A to your output, that column is the only way to do it. Or if you want to recolor something to a specific color, that's also the way to do it.
For example - if you have multiple opaque colors in your input, but you want to recolor everything to rgba(255,51,0,1) then this is the matrix you would use.
0 0 0 0 1.0
0 0 0 0 0.2
0 0 0 0 0
0 0 0 1 0
aka
<feColorMatrix type="matrix" values="0 0 0 0 1 0 0 0 0 0.2 0 0 0 0 0 0 0 0 1 0"/>
Try out these sliders for yourself:
https://codepen.io/mullany/pen/qJCDk
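To make the arithmetic concrete, here is a small Python sketch (not part of the original answer; NumPy assumed) that applies a 4x5 feColorMatrix to a single RGBA pixel - the first four columns are multiplied with the input channels and the fifth column is simply added afterwards:

import numpy as np

def apply_fecolormatrix(values, rgba):
    # values: the 20 numbers of the 4x5 matrix, row by row
    # rgba: one pixel with channels in the 0..1 range
    m = np.array(values, dtype=float).reshape(4, 5)
    out = m[:, :4] @ np.asarray(rgba, dtype=float) + m[:, 4]  # multiply, then add the offset column
    return np.clip(out, 0.0, 1.0)

# the recoloring matrix from the answer: every opaque input becomes rgba(255,51,0,1)
recolor = [0, 0, 0, 0, 1,
           0, 0, 0, 0, 0.2,
           0, 0, 0, 0, 0,
           0, 0, 0, 1, 0]
print(apply_fecolormatrix(recolor, [0.5, 0.8, 0.1, 1.0]))  # roughly [1.0, 0.2, 0.0, 1.0]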

Group people based on their hobbies in Spark

I am working with PySpark on a case where I need to group people based on their interests. Let's say I have n persons:
person1, movies, sports, dramas
person2, sports, trekking, reading, sleeping, movies, dramas
person3, movies, trekking
person4, reading, trekking, sports
person5, movies, sports, dramas
.
.
.
Now I want to group people based on their interests.
Group people who have at least m common interests (m is user input, it could be 2, 3, 4...)
Let's assume m=3
Then the groups are:
(person1, person2, person5)
(person2, person4)
Users who belong to x groups (x is user input)
Let's assume x=2
Then
person2 is in two groups
My response will be algebraic and not Spark/Python specific, but can be implemented in Spark.
How can we express the data in your problem?
I will go with a matrix - each row represents a person, each column represents an interest. So, following your example:
movies,sports,trekking,reading,sleeping,dramas
P1: 1 1 0 0 0 1
P2: 1 1 1 1 1 1
P3: 1 0 1 0 0 0
P4: 0 0 1 1 0 1
P5: 1 1 0 0 0 1
What if we would like to investigate the similarity of P2 and P3 - check how many interests they share? We could use the following formula:
(movies)+(sports)+(trekking)+(reading)+(sleeping)+(dramas)
1*1 + 1*0 + 1*1 + 1*0 + 1*0 + 1*0 = 2
It may look familiar to you - it looks like part of a matrix multiplication.
To make full use of this observation, we have to transpose the matrix - it will look like this:
P1,P2,P3,P4,P5
movies 1 1 1 0 1
sports 1 1 0 0 1
trekking 0 1 1 1 0
reading 0 1 0 1 0
sleeping 0 1 0 0 0
dramas 1 1 0 1 1
Now if we multiply the matrices (original and transposed), you get a new matrix:
P1 P2 P3 P4 P5
P1 3 3 1 1 3
P2 3 6 2 3 3
P3 1 2 2 1 1
P4 1 3 1 2 1
P5 3 3 1 1 3
What you see here is the result you are looking for - check the value at the row/column crossing and you get the number of shared interests.
How many interests do P2 share with P4? Answer: 3
Who shares 3 interests with P1? Answer: P2 and P5
Who shares 2 interests with P3? Answer: P2
Some hints on how to apply this idea into Apache Spark
How to operate on matrices using Apache Spark?
Matrix Multiplication in Apache Spark
How to transpose matrix using Apache Spark?
Matrix Transpose on RowMatrix in Spark
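A minimal local NumPy sketch of the whole idea (not distributed; variable names are just assumptions):

import numpy as np

people = ["P1", "P2", "P3", "P4", "P5"]
# columns: movies, sports, trekking, reading, sleeping, dramas
interests = np.array([[1, 1, 0, 0, 0, 1],
                      [1, 1, 1, 1, 1, 1],
                      [1, 0, 1, 0, 0, 0],
                      [0, 0, 1, 1, 0, 1],
                      [1, 1, 0, 0, 0, 1]])

shared = interests @ interests.T  # shared[i, j] = number of interests i and j have in common

m = 3
for i in range(len(people)):
    for j in range(i + 1, len(people)):
        if shared[i, j] >= m:
            print(people[i], people[j], "share", shared[i, j], "interests")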
EDIT 1: Adding a more realistic method (after the comments)
We have a table/RDD/Dataset "UserHobby":
movies,sports,trekking,reading,sleeping,dramas
P1: 1 1 0 0 0 1
P2: 1 1 1 1 1 1
P3: 1 0 1 0 0 0
P4: 0 0 1 1 0 1
P5: 1 1 0 0 0 1
Now, to find all the people that share exactly 2 interests with P1, you would have to execute:
SELECT * FROM UserHobby
WHERE movies*1 + sports*1 + trekking*0 +
      reading*0 + sleeping*0 + dramas*1 = 2
Now you would have to repeat this query for all the users (changing the 0s and 1s to the actual values). The algorithm complexity is O(n^2 * m), where n is the number of users and m the number of hobbies.
What is nice about this method is that you don't have to generate subsets.
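A hedged PySpark sketch of that per-user query (view, column and variable names are assumptions; it reuses the example data and asks who shares exactly 3 interests with P1, which should return P2 and P5, matching the table above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-interests").getOrCreate()

columns = ["person", "movies", "sports", "trekking", "reading", "sleeping", "dramas"]
rows = [("P1", 1, 1, 0, 0, 0, 1),
        ("P2", 1, 1, 1, 1, 1, 1),
        ("P3", 1, 0, 1, 0, 0, 0),
        ("P4", 0, 0, 1, 1, 0, 1),
        ("P5", 1, 1, 0, 0, 0, 1)]
spark.createDataFrame(rows, columns).createOrReplaceTempView("UserHobby")

# P1's hobby vector (1, 1, 0, 0, 0, 1) supplies the coefficients
spark.sql("""
    SELECT person FROM UserHobby
    WHERE movies*1 + sports*1 + trekking*0 + reading*0 + sleeping*0 + dramas*1 = 3
      AND person <> 'P1'
""").show()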
My answer might not be the best, but it will do the job. If you know the total list of hobbies in advance, you can write a piece of code that computes the combinations before going into the Spark part.
For example:
Filter out the people whose hobbies_count < input_number at the very start, to discard unwanted records.
If the total list of hobbies is {a,b,c,d,e,f} and the input_number is 2, the list of combinations in this case would be
{(ab)(ac)(ad)(ae)(af)(bc)(bd)(be)(bf)(cd)(ce)(cf)(de)(df)(ef)}
So you will need to generate the possible combinations in advance for the input_number.
Later, perform a filter for each combination and track the record count.
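A small plain-Python sketch of this idea (the data and input_number are taken from the question; hobby names are kept as strings rather than ids):

from itertools import combinations

input_number = 2
people = {"person1": {"movies", "sports", "dramas"},
          "person2": {"sports", "trekking", "reading", "sleeping", "movies", "dramas"},
          "person3": {"movies", "trekking"},
          "person4": {"reading", "trekking", "sports"},
          "person5": {"movies", "sports", "dramas"}}

# filter out people with fewer than input_number hobbies right at the start
people = {p: h for p, h in people.items() if len(h) >= input_number}

# generate the possible hobby combinations up front, then filter per combination
all_hobbies = sorted(set().union(*people.values()))
for combo in combinations(all_hobbies, input_number):
    members = [p for p, h in people.items() if set(combo) <= h]
    if len(members) > 1:
        print(combo, "->", members)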
If the number of users is large, you can't possibly think about going for any User x User approach.
Step 0. As a first step, we should ignore all the users who don't have at least m interests (since they cannot have at least m common interests with anyone).
Step 1. Possible approaches:
i) Brute force: If the maximum number of interests is small, say 10, you can generate all the possible interest combinations in a HashMap and assign an interest-group id to each of them. You will need just one pass over a user's interest set to find out which interest groups they qualify for. This will solve the problem in one pass.
ii) Locality Sensitive Hashing: After step 0 is done, we know that we only have users that have a minimum of m hobbies. If you are fine with an approximate answer, locality sensitive hashing can help. How to understand Locality Sensitive Hashing?
A sketch of the LSH approach:
a) First map each of the hobbies to an integer (which you should do anyways, if the dataset is large). So we have User -> Set of integers (hobbies)
b) Obtain a signature for each user by taking the MinHash (hash each of the integers and take the minimum; repeating this with k different hash functions gives a length-k signature). This gives User -> Signature.
c) Explore the users with the same signature in more detail, if you want (or you can be done with an approximate answer).
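A rough MinHash sketch in plain Python for steps a) and b) (the hash-function family, k, and the example hobby ids are arbitrary assumptions):

import random

def minhash_signature(hobby_ids, k=4, seed=42):
    # Length-k MinHash signature of a set of integer hobby ids:
    # apply k random hash functions h(x) = (a*x + b) mod p and keep each minimum.
    rng = random.Random(seed)
    p = 2_147_483_647
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(k)]
    return tuple(min((a * x + b) % p for x in hobby_ids) for a, b in params)

# step a): hobbies mapped to integers, e.g. movies=0, sports=1, ..., dramas=5
users = {"P1": {0, 1, 5}, "P2": {0, 1, 2, 3, 4, 5}, "P5": {0, 1, 5}}

# step b): users with identical (or near-identical) signatures become candidate pairs
for user, hobbies in users.items():
    print(user, minhash_signature(hobbies))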

Proper use of *apply with a function that takes strings as inputs

I am interested in applying the levenshteinSim function from the RecordLinkage package to vectors of strings (there's a good discussion of the function here).
Imagine that I have a vector called codes: "A","B","C","D",etc.;
And a vector called tests: "A","B","C","D",etc.
Using sapply to test a particular value in 'tests' against the vector of codes,
sapply(codes,levenshteinSim,str2=tests[1])
I would expect to get a list or vector (my apologies if I make terminological mistakes): [score1] [score2] [score3].
Unfortunately, the output is a test of the value in tests[1] against c("A","B","C","D", ...) -- a single value.
Ultimately, I want to *apply the two vectors against one another to produce a len1 x len2 matrix -- but I don't want to move forward until I understand what I'm doing wrong.
Can anyone provide guidance?
I'm not sure where the problem lies:
library(RecordLinkage)
sapply(codes,levenshteinSim,str2=tests)
A B C D
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
When str2 is just one item, you get a length 4 vector.
