Is there a way to call the pcfcross function on groups of marks? - spatstat

I'm using the pcfcross function to estimate the pair correlation functions (PCFs) between pairs of cell types, indicated by marks. I would now like to expand my analysis to include measuring the PCFs between cell types and groups of cell types. Is there a way to use the pcfcross function on a group of marks?
Alternatively, is there a way to change the marks of a group of marks to a singular mark?

You can collapse several levels of a factor to a single level, using the spatstat function mergeLevels. This will group several types of points into a single type.
However, this may not give you any useful new information. The pair correlation function is a second-order summary, so the pair correlation for the grouped data can be calculated from the pair correlations for the un-grouped data. (See Chapter 7 of the spatstat book).
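To illustrate that relationship: assuming a stationary multitype process with intensities λ_j, the cross-type pair correlation between type i and a merged group A (with i not in A) is the intensity-weighted average g_iA(r) = Σ_{j in A} λ_j g_ij(r) / Σ_{j in A} λ_j. The Python sketch below applies that average to curves that have already been exported from pcfcross. The function name and the input layout are hypothetical, and an estimate computed directly on merged data may differ slightly because of smoothing and edge correction.

```python
def grouped_cross_pcf(pcf_curves, intensities):
    """Combine cross-type PCFs g_ij(r) into g_iA(r) for a merged group A.

    pcf_curves:  dict mapping each type j in A to a list of g_ij(r) values,
                 all evaluated on the same grid of r values.
    intensities: dict mapping each type j in A to its intensity lambda_j
                 (points per unit area).
    """
    total = sum(intensities[j] for j in pcf_curves)
    n_points = len(next(iter(pcf_curves.values())))
    # Intensity-weighted average of the curves at each distance r.
    return [
        sum(intensities[j] * pcf_curves[j][k] for j in pcf_curves) / total
        for k in range(n_points)
    ]


# Made-up numbers purely for illustration.
curves = {"B": [2.0, 1.5, 1.0], "C": [1.0, 1.0, 1.0]}
lam = {"B": 1.0, "C": 3.0}
combined = grouped_cross_pcf(curves, lam)
```

Because type C is three times as abundant as type B in this toy input, the combined curve sits closer to the curve for C.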

Related

WAG matrix implementation

I am working with certain programs in Python 3.4. I want to use the WAG matrix for phylogeny inference, but I am confused about the formula it implements.
For example, in phylogenetics study, when a sequence file is used to generate a distance based matrix, there is a formula called "p-distance" implemented and on the basis of this formula and some standard values for sequence data, a matrix is generated which is later used to construct a tree. When a character based method for tree construction is used, "WAG" is one of the matrices used for likelihood tree construction. What I want to say is that if one wants to implement this matrix, then what is the formula basis for it?
I want to write code for this implementation, but first I need to understand the logic used by the WAG matrix.
I have an aligned protein sequence file and I need to generate a "WAG" matrix from it. I have been studying the literature on the WAG matrix, but I could not work out how it performs its calculation. Does it have a specific formula? (For example, "p-distance" is a formula used by a distance matrix.) I want to give an aligned protein sequence file as input and have a matrix generated as output.
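For reference, the p-distance mentioned in the question is the proportion of compared sites at which two aligned sequences differ. The sketch below is a minimal Python version of that formula only; it does not implement WAG. The function names are hypothetical, and skipping sites where either sequence has a gap is an assumption, since conventions for gaps vary.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: sites with a gap in either sequence are ignored.
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(sequences):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(sequences)
    return {
        a: {b: p_distance(sequences[a], sequences[b]) for b in names}
        for a in names
    }


# Toy alignment, not real data.
alignment = {"s1": "MKTA", "s2": "MKSA", "s3": "MK-A"}
matrix = distance_matrix(alignment)
```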

How to add NER tags to features

I have a set of training sentences for which I computed some float features. In each sentence, two entities are identified. They are either of type 'PERSON', 'ORGANIZATION', 'LOCATION', or 'OTHER'. I would like to add these types to my feature matrix (which stores float variables).
My question is: is there a recommended way to add these entity types?
I could think of two ways for now:
- adding TWO columns, one for each entity, filled with entity type ids (e.g. 0 to 3 or 1 to 4)
- adding EIGHT columns, one for each entity type and each entity, filled with 0's and 1's
Best!
I would recommend that you use something that can easily be normalized and which is in the same range as the rest of your data.
So if all your float values are between -1 and 1, I would keep the values from your "Named Entity Recognition" in the same range.
So depending on what you prefer or what gives you the best result you could either assign 4 values in the same range as the rest of your floats or use a binary result with more columns.
Finally, the second suggestion (adding EIGHT columns, one for each entity type and each entity, and filling them with 0's and 1's) worked fine!
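The eight-column option is a one-hot encoding of each entity's type. A minimal Python sketch, assuming each row holds plain float features plus the types of its two entities (the function names and the example values are hypothetical):

```python
ENTITY_TYPES = ["PERSON", "ORGANIZATION", "LOCATION", "OTHER"]


def one_hot(entity_type):
    """Return four 0/1 values with a single 1 at the position of entity_type."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError("unknown entity type: %r" % (entity_type,))
    return [1.0 if entity_type == t else 0.0 for t in ENTITY_TYPES]


def add_entity_columns(float_features, first_type, second_type):
    """Append 4 + 4 one-hot columns for the two entities of a sentence."""
    return list(float_features) + one_hot(first_type) + one_hot(second_type)


# Two made-up float features followed by the eight 0/1 columns.
row = add_entity_columns([0.12, -0.5], "PERSON", "LOCATION")
```

Unlike integer ids, this layout does not impose an artificial ordering on the entity types, and the 0/1 values already lie in a range comparable to features scaled to [-1, 1].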

Connecting exchange names and codes to LCA inventory results

I'm getting into Brightway2 for some energy system modeling and I'm still getting used to all of the concepts.
I've created a small custom demo database, and run lca.lci() and lca.lcia(). lca.inventory and lca.characterized_inventory both return sparse matrices of the results. My question, which may be very simple, is how can you connect the values in the matrix to the exchange names and keys. I.e., if I wanted to print the results to a file, how would I match the exchanges to the inventory values?
Thanks.
To really understand what is going on, it is useful to understand the difference between "intermediate" data (stored as structured text files) and "processed" data (stored as numpy structured arrays). These concepts are described both here and here.
However, to answer your question directly: what each row and column stands for in the different matrices and arrays (e.g. the lca.inventory matrix, lca.supply_array, lca.characterized_inventory) is recorded in a set of dictionaries associated with your LCA object. These are:
activity_dict: Columns in the technosphere matrix
product_dict : Rows in the technosphere matrix
biosphere_dict: Rows in the biosphere matrix
For example, lca.product_dict yields, in the case of an LCA I just did:
{('ei32_CU_U', '671c1ae85db847083176b9492f000a9d'): 8397,
('ei32_CU_U', '53398faeaf96420408204e309262b8c5'): 536,
('ei32_CU_U', 'fb8599da19dabad6929af8c3a3c3bad6'): 7774,
('ei32_CU_U', '28b3475e12e4ed0ec511cbef4dc97412'): 3051, ...}
Here each key is the actual product in my inventory database and each value is the corresponding row index in the demand_array or the supply_array.
The reverse of these dictionaries may be more useful. If you want to know what a value in, e.g., your supply_array refers to, you can create a reverse dictionary using a dict comprehension:
inv_product_dict = {v: k for k, v in lca.product_dict.items()}
and then use it directly to obtain the information you are after. Say you want to know what is at index 10 of the supply_array (indices are zero-based, so this is the 11th row): you can simply do inv_product_dict[10], which in my case yields ('ei32_CU_U', '4110733917e1fcdc7c55af3b3f068c72')
The same logic applies to biosphere (or elementary) flows, found in lca.biosphere_dict (in LCA parlance, rows in the B matrix), and to activities, found in lca.activity_dict (columns of the A or B matrices).
Note that you can generate the reverse of the activity_dict/product_dict/biosphere_dict simultaneously using lca.reverse_dict(). The syntax then is:
rev_act_dict, rev_product_dict, rev_bio_dict = lca.reverse_dict()
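To write the results to a file, the reversed dictionaries can be combined with the non-zero entries of the matrix. The sketch below assumes that lca.inventory is a SciPy sparse matrix (so that it has a tocoo() method), with rows indexed by biosphere_dict and columns indexed by activity_dict. The helper names labelled_entries and write_entries and the CSV column names are hypothetical, not part of Brightway2.

```python
import csv


def labelled_entries(matrix, row_dict, col_dict):
    """Return (row key, column key, value) for every stored matrix entry."""
    rev_rows = {index: key for key, index in row_dict.items()}
    rev_cols = {index: key for key, index in col_dict.items()}
    coo = matrix.tocoo()  # coordinate format exposes .row, .col and .data
    return [
        (rev_rows[int(r)], rev_cols[int(c)], float(v))
        for r, c, v in zip(coo.row, coo.col, coo.data)
    ]


def write_entries(entries, handle):
    """Write the labelled entries as CSV to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["flow", "activity", "amount"])
    for flow, activity, amount in entries:
        writer.writerow([flow, activity, amount])


# Usage after lca.lci(), under the assumptions stated above:
# entries = labelled_entries(lca.inventory, lca.biosphere_dict, lca.activity_dict)
# with open("inventory.csv", "w", newline="") as handle:
#     write_entries(entries, handle)
```

The keys written out are the (database, code) tuples; looking up human-readable names from those keys is a separate step.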

Return variable from its name in MATLAB

Say I have a variable var=1 and a string str='var'.
How can I obtain the value of var from str? I tried using str2num(str), but it didn't work.
Also, if I had 2 strings str1='some letters' and str2='str1', can I obtain the phrase 'some letters' from str2?
I want to do this because I have many matrices (quite big) and I want to separate them in some groups, so I thought about making cells with the names of each of the matrices that belong to a group (a matrix can belong to more than one group, so making cells with the matrices is not very good).
You can use eval:
x = eval( str ) ;
But it's not recommended.
Though it can easily be achieved with an eval as @Shai mentioned, you probably don't really want to do this. Using eval hinders your debugging, and depending on the names of variables seriously limits the flexibility of your code. If you want to name something, you may be better off using a struct with a data field and a name field instead.
Judging from your description, I wonder about the following:
1. Why do you have many matrices?
For each variable that you have, you depend on a name. Depending on a lot of names is typically undesirable. Hence my suggestion:
Use a (cell) array containing these matrices
2. How exactly do you want them to be grouped?
It is not clear to me how you want the grouping to work, but think of this:
If you want to use names, create a struct or array of structs with a nameField, but
otherwise just use a cell array and have each matrix get a number.
You can now handle the matrices more easily and things like 'selecting 10 random matrices' or 'selecting all matrices whose nameField contains 'abc'' can be done easily and efficiently.
You can now also have a field with your data specifying in which groups it is, or you can define groups as simple lists of numbers.

Search selection

For a C# program that I am writing, I need to compare similarities in two entities (can be documents, animals, or almost anything).
Based on certain properties, I calculate the similarities between the documents (or entities).
I put their similarities in a table as below
 |X   |Y   |Z
A|0.6 |0.5 |0.4
B|0.6 |0.4 |0.2
C|0.6 |0.3 |0.6
I want to find the best matching pairs (e.g. AX, BY, CZ) based on the highest similarity score. A higher score indicates greater similarity.
My problem arises when there is a tie between similarity values. For example, AX and CZ both have 0.6. How do I decide which two pairs to select? Are there any procedures/theories for this kind of problem?
Thanks.
In general, tie-breaking methods are going to depend on the context of the problem. In some cases, you want to report all the tying results. In other situations, you can use an arbitrary means of selection such as which one is alphabetically first. Finally, you may choose to have a secondary characteristic which is only evaluated in the case of a tie in the primary characteristic.
Additionally, you can always report one or more and then alert the user that there was a tie to allow him or her to decide for him- or herself.
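As one illustration of an arbitrary but deterministic tie-break, the Python sketch below picks pairs greedily by descending score and resolves ties alphabetically by row and then by column. The greedy strategy and the function name are assumptions made for the example, not something prescribed above, and a greedy pass does not necessarily maximise the total score over all pairs.

```python
def greedy_pairs(scores):
    """Pick one column per row, highest scores first, ties broken alphabetically."""
    candidates = sorted(
        (
            (value, row, col)
            for row, cols in scores.items()
            for col, value in cols.items()
        ),
        # Highest score first; equal scores fall back to row, then column name.
        key=lambda item: (-item[0], item[1], item[2]),
    )
    used_rows, used_cols, chosen = set(), set(), []
    for value, row, col in candidates:
        if row not in used_rows and col not in used_cols:
            chosen.append((row, col, value))
            used_rows.add(row)
            used_cols.add(col)
    return chosen


# The similarity table from the question.
table = {
    "A": {"X": 0.6, "Y": 0.5, "Z": 0.4},
    "B": {"X": 0.6, "Y": 0.4, "Z": 0.2},
    "C": {"X": 0.6, "Y": 0.3, "Z": 0.6},
}
pairs = greedy_pairs(table)
# pairs == [("A", "X", 0.6), ("C", "Z", 0.6), ("B", "Y", 0.4)]
```

Swapping the sort key for a secondary characteristic, or returning every tied candidate instead of the first, would implement the other strategies described above.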
In this case, the similarities you should be looking for are:
- Value
- Row
- Column
Objects which have any of the above in common are "similar". You could assign a weighting to each property, so that objects which have the same value are more similar than objects which are in the same column. Also, objects which have the same value and are in the same column are more similar than objects with just the same value.
Depending on whether there are any natural ranges occurring in your data, you could also consider comparing ranges. For example two numbers in the range 0-0.5 might be somewhat similar.
