Excel Mapping with multiple matches - excel

My knowledge of Excel extends to most types of advanced formulas, but I don't know much about VBA or macros. I have a problem which I'm struggling to solve using formulas. I have a sheet with two columns that looks like this:
x1 y1
x1 y2
x1 y3
x1 y4
x2 y2
x2 y3
x2 y4
x3 y1
x4 y2
And I'm trying to map these onto a sheet like this:
   y1 y2 y3 y4
x1 1  1  1  1
x2 0  1  1  1
x3 1  0  0  0
x4 0  1  0  0
I usually try to apply a vlookup solution to such problems, but I can't figure out how to get vlookup to work given that the x values appear multiple times in the first table, and vlookup will always just stop at the first appearance.
Please let me know how to best approach solving this problem.
Thanks so much!

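Use COUNTIFS(). The formula below assumes the source pairs are in columns A and B, the x labels of the result grid run down column D from D2, and the y labels run across row 1 from E1. Enter it in E2 and fill across and down:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
A pivot table may be better suited, though.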
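For readers who want to check the logic outside Excel, the same cross-tabulation can be sketched in Python. This is only an illustration, assuming the two columns are available as a list of (x, y) pairs; the name `count_matrix` is made up for this example.

```python
from collections import Counter

# The (x, y) pairs from the question's two-column sheet.
pairs = [
    ("x1", "y1"), ("x1", "y2"), ("x1", "y3"), ("x1", "y4"),
    ("x2", "y2"), ("x2", "y3"), ("x2", "y4"),
    ("x3", "y1"),
    ("x4", "y2"),
]


def count_matrix(pairs):
    """Count how often each (x, y) pair occurs, like COUNTIFS over both columns."""
    counts = Counter(pairs)  # missing pairs count as 0
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    grid = [[counts[(x, y)] for y in ys] for x in xs]
    return xs, ys, grid


xs, ys, grid = count_matrix(pairs)
print("  ", *ys)
for x, row in zip(xs, grid):
    print(x, *row)
```

Like COUNTIFS, this counts occurrences, so a pair that appears twice in the source would show 2 rather than 1.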

Related

How to use VBA to incorporate functions to avoid occupation of extra cells?

This spreadsheet calculates distances between atoms, and we want to improve the formulas so that they no longer occupy extra columns. (See the attached image. Atom coordinates are given in columns A to D, and the atom pair whose distance should be calculated is given in columns F to G.)
Currently in the first step, coordinates of specified atoms are picked up in columns I to O. e.g. Cell I4 is filled with the function:
=VLOOKUP($F4,$A$4:$E$1023,2,FALSE)
and then in the next step, the distance could be resolved in Column Q with Euclidean distance formula on the coordinates picked up. e.g. Cell Q4 is:
=SQRT(POWER((I4-M4),2)+POWER((J4-N4),2)+POWER((K4-O4),2))
Once the two atoms are specified, the distance is fully determined. Is it therefore possible to write a VBA function that combines these steps and removes the intermediate lookups from columns I to O? (These columns will be needed for other purposes in the future, and the formula would be very hard to read if we put, for example, the six VLOOKUP functions directly into the final SQRT function.)
I'm new to VBA. Any help would be appreciated. Thanks!
The original data in this spreadsheet, starting from the third row, is as follows:
Atom_No X_coordinate Y_coordinate Z_coordinate Atom_No1 Atom_No2 X1 Y1 Z1 X2 Y2 Z2 Distance
1 2.35739851 13.17160225 4.022993565 4 2 3.827347994 9.501971245 8.374602318 4.403610706 11.14351559 6.991936684 2.222276039
2 4.403610706 11.14351559 6.991936684 3 2 0.721047342 12.58075523 2.64032793 4.403610706 11.14351559 6.991936684 5.879067059
3 0.721047342 12.58075523 2.64032793 1 4 2.35739851 13.17160225 4.022993565 3.827347994 9.501971245 8.374602318 5.879068118
4 3.827347994 9.501971245 8.374602318 2 1 4.403610706 11.14351559 6.991936684 2.35739851 13.17160225 4.022993565 4.13699687
… … … … 3 1 0.721047342 12.58075523 2.64032793 2.35739851 13.17160225 4.022993565 2.22227577
Finally, these two functions work and give the correct result when the coordinate table and the Atom_No1/Atom_No2 pairs are laid out as in this table. Load both functions into a module, then enter the following in column R (e.g. cell R4):
=AtomsDistance(F4,G4)
The distance will then match the one in column Q.
Function CoordinateVLookUp(Atom_No As Integer, CoordinateTableRange As Range, Column As Integer, isFuzzy As Boolean) As Double
    'Look up one coordinate of an atom: column 2, 3 or 4 of CoordinateTableRange holds x, y or z.
    'isFuzzy is passed on as VLOOKUP's range_lookup argument (False = exact match).
    'Application.VLookup returns an error value when nothing matches, whereas
    'WorksheetFunction.VLookup raises a run-time error, so IsError would never be reached.
    Dim myResult As Variant
    myResult = Application.VLookup(Atom_No, CoordinateTableRange, Column, isFuzzy)
    If IsError(myResult) Then
        MsgBox "No result found."
    Else
        CoordinateVLookUp = myResult
    End If
End Function

Function AtomsDistance(Atom_No1 As Integer, Atom_No2 As Integer) As Double
    'Look up the x, y, z coordinates of both atoms with CoordinateVLookUp,
    'then apply the Euclidean distance formula.
    Dim x1 As Double, y1 As Double, z1 As Double
    Dim x2 As Double, y2 As Double, z2 As Double
    Dim CoordinateTableRange As Range
    'Set is required when assigning an object reference.
    'The unqualified Range refers to the active sheet.
    Set CoordinateTableRange = Range("A4:E1023")
    x1 = CoordinateVLookUp(Atom_No1, CoordinateTableRange, 2, False)
    y1 = CoordinateVLookUp(Atom_No1, CoordinateTableRange, 3, False)
    z1 = CoordinateVLookUp(Atom_No1, CoordinateTableRange, 4, False)
    x2 = CoordinateVLookUp(Atom_No2, CoordinateTableRange, 2, False)
    y2 = CoordinateVLookUp(Atom_No2, CoordinateTableRange, 3, False)
    z2 = CoordinateVLookUp(Atom_No2, CoordinateTableRange, 4, False)
    AtomsDistance = Math.Sqr((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2) + (z1 - z2) * (z1 - z2))
End Function
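As a cross-check outside Excel, the lookup-then-distance logic can be sketched in Python. This is not the VBA solution itself, only an illustration: the dictionary `coordinates` and the function name `atoms_distance` are made up for this example, and the four atoms are copied from the sample data above.

```python
import math

# Coordinates (x, y, z) of atoms 1-4, copied from the sample data.
coordinates = {
    1: (2.35739851, 13.17160225, 4.022993565),
    2: (4.403610706, 11.14351559, 6.991936684),
    3: (0.721047342, 12.58075523, 2.64032793),
    4: (3.827347994, 9.501971245, 8.374602318),
}


def atoms_distance(atom_no1, atom_no2, table=coordinates):
    """Euclidean distance between two atoms.

    Raises KeyError if an atom number is not in the table.
    """
    return math.dist(table[atom_no1], table[atom_no2])


# Atoms 4 and 2 are the first pair in the sample data.
print(atoms_distance(4, 2))
```

The dictionary lookup plays the role of the six VLOOKUP calls, so no intermediate columns are needed.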

Excel Count/Sum Based on Multiple Criteria for Same Row

I have a relatively simple problem which I am getting stumped on - perhaps it is this brain fog from Covid. I'll try my best to explain the problem.
Here is a simulated dataset:
A B C D E F G H I J K L M N
1 X1 X2 X3 Y1 Y2 Y3 X1 X2 X3 X1 X2 X3 Ct St
2 1 2 0.2 0 2 0.5 1 2 0.1 2 0.3
3 1 2 0.3 1 1 0.2 1 0.3
4 1 2 0.6 1 2 0.1 1 0.6
5 1 2 1.1 2 0.7 1 0.5 1 1.1
A-N reflects the column names while the first column (1-5) reflects the row names in Excel.
Each column has been labelled as either X (e.g., male) and Y (e.g., female). There are three characteristics for male (X1, X2, X3) and three characteristics for female (Y1, Y2, Y3). We can think of adjacent columns as belonging to a trait (e.g., X1, X2, and X3 in columns A, B and C form a set of male characteristics for trait 1; X1, X2, and X3 in columns G, H and I form a set of similar characteristics but for trait 2, etc.).
For each row, I would like to calculate a count total (Ct, see column M) and sum total (St, see column N) based on a set of conditions.
Count total: Count the number of male (X) traits that feature a "1" for X1 and "2" for X2, giving a 'count total'.
Sum total: Sum the X3 values over male (X) traits that feature a "2" for X2, giving a 'sum total'.
I have manually calculated the count totals and sum totals for each row to make these definitions clearer. In row 2, there are two traits that fulfil the count total criteria (Ct = 2), whereby their X1 values = 1 and X2 values = 2. Notice that while the X2 value in column H qualifies (X2 = 2), X1 in column G is not equal to 1, so it is not counted. Furthermore, we only sum the X3 values for traits 1 and 2 (e.g., X3 in Column C and X3 in Column L), giving us a total of 0.3 (0.2 + 0.1).
The formulae should ignore sets of values that qualify but are for female traits (e.g., see row 3) and should work across missing values (e.g., in col J, row 4, X1 is missing, so it cannot be counted, even if X2 in col K row 4 features a qualifying value of 2).
I hope that makes sense.
My instinct was to use a SUMPRODUCT formula, but I am struggling to integrate the two conditions, e.g., for each row:
=SUMPRODUCT(((A1:L1="X1")*(A2:L2=1))*((A1:L1="X2")*(A2:L2=2)))
Any guidance would be much appreciated.
I haven't checked this thoroughly, but suggest for Ct
=SUMPRODUCT((A$1:J$1="X1")*(A2:J2=1)*(B$1:K$1="X2")*(B2:K2=2))
and for St
=SUMPRODUCT((A$1:J$1="X1")*(A2:J2=1)*(B$1:K$1="X2")*(B2:K2=2)*(C$1:L$1="X3")*C2:L2)
copied down.
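The two formulas work by shifting the ranges one column at a time, so that each X1 cell is tested together with the X2 cell to its right and the X3 cell two columns over. A minimal Python sketch of that pairing follows. The sample rows are hypothetical (the simulated dataset's blank cells are not recoverable from the text), and `count_and_sum` is a made-up name.

```python
def count_and_sum(headers, row):
    """Mirror the two SUMPRODUCT formulas for one row.

    headers and row cover the data columns (A to L in the question);
    missing cells are represented as None.
    """
    ct = 0
    st = 0.0
    for i in range(len(headers) - 2):
        # Same tests as (A$1:J$1="X1")*(A2:J2=1)*(B$1:K$1="X2")*(B2:K2=2)
        qualifies = (
            headers[i] == "X1" and row[i] == 1
            and headers[i + 1] == "X2" and row[i + 1] == 2
        )
        if not qualifies:
            continue
        ct += 1
        # Extra factor of the St formula: (C$1:L$1="X3")*C2:L2
        if headers[i + 2] == "X3" and row[i + 2] is not None:
            st += row[i + 2]
    return ct, st


headers = ["X1", "X2", "X3", "Y1", "Y2", "Y3",
           "X1", "X2", "X3", "X1", "X2", "X3"]

# Hypothetical rows: Y columns that look qualifying and an X1 of 0 are ignored.
full_row = [1, 2, 0.2, 1, 2, 0.5, 0, 2, 0.4, 1, 2, 0.1]
sparse_row = [1, 2, 0.6, None, None, None, None, None, None, None, 2, 0.9]

print(count_and_sum(headers, full_row))
print(count_and_sum(headers, sparse_row))
```

Because the header test is part of every product, a Y1/Y2 pair holding 1 and 2 contributes nothing, and a missing X1 fails the `= 1` test just as an empty cell does in Excel.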

Get the values of 4 unknowns from 2 given equations

I have 2 equations containing a total of 10 values, of which 6 are known and the remaining 4 are unknown. Is there any method in Python that could solve this type of problem?
I am talking about the second and fourth equations. Here all the X values are known and in the fourth equation, it is μ12 instead of μ21.
SymPy can tell you what two of the 4 are in terms of the other two (and known values):
>>> from sympy import var, solve
>>> var('i1 x2d k12 x1 x3 mu12 x2 x4 i2 x4d mu21')
(i1, x2d, k12, x1, x3, mu12, x2, x4, i2, x4d, mu21)
>>> solve(
... (i1*x2d-(k12*(x1-x3)+mu12*(x2-x4)),
... i2*x4d-(k12*(x3-x1)+mu21*(x4-x2))),
... (i1,i2,mu12,mu21))
{i1: mu12*(x2 - x4)/x2d + (k12*x1 - k12*x3)/x2d,
i2: mu21*(-x2 + x4)/x4d + (-k12*x1 + k12*x3)/x4d}

Exporting Data in X,Y,Z format

I have a number of Excel files containing coordinate data that is not normalized.
A single row describes multiple points.
Invariably, one column gives the X value. As an illustration, some files have, say, 3 columns representing Y values and one column representing Z:
X Y1 Y2 Y3 Z
Other files have Multiple Y and Z columns in which point sets may share Y or Z columns.
There may be as many as 40 columns in a table and there are usually about 300 rows.
I need to export the data in CSV format with X, Y, Z values. Is there some relatively simple way to generate a sheet containing X, Y, Z coordinates from such data so that it can be exported?
So that
X Y1 Y2 Y3 Z
becomes
X, Y1, Z ; multiple rows
X, Y2, Z ; multiple rows
X, Y3, Z ; multiple rows
so that the X and Z values are repeated once for each set of Y values.
There is one other format that I have to convert. One column contains the X values, one row contains the Y values, and each Z value sits in the cell where the row of its X value and the column of its Y value intersect.
Complicating all this is that there may be omitted or missing values in some of the cells.
This appears to be the same problem in Python: "Convert datafiles 'X' 'Y' 'Z' 'data' format".
In response to ANDY:
A sheet may be laid out as columns:
X Y1 Y2 Z1 Y3 Y4 Y5 Z2
The coordinate sets are:
X, Y1, Z1
X, Y2, Z1
X, Y3, Z2
X, Y4, Z2
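One possible way to do this reshaping is sketched below in Python. It is only a sketch and rests on assumptions that the question does not confirm: the sheet has already been read into a header list and a list of rows, column names start with X, Y or Z, every Y column belongs to the next Z column to its right, and a point with any missing value is skipped. The names `pair_columns`, `unpivot_xyz` and `to_csv` are made up for this example.

```python
import csv
import io


def is_missing(value):
    """Treat empty cells (None or empty string) as missing."""
    return value is None or value == ""


def pair_columns(header):
    """Pair each Y column index with the next Z column index to its right."""
    pairs, pending_y = [], []
    for idx, name in enumerate(header):
        if name.upper().startswith("Y"):
            pending_y.append(idx)
        elif name.upper().startswith("Z"):
            pairs.extend((y_idx, idx) for y_idx in pending_y)
            pending_y = []
    return pairs


def unpivot_xyz(header, rows):
    """Return (x, y, z) triples, grouped by Y column, skipping incomplete points."""
    x_idx = header.index("X")
    triples = []
    for y_idx, z_idx in pair_columns(header):
        for row in rows:
            point = (row[x_idx], row[y_idx], row[z_idx])
            if not any(is_missing(v) for v in point):
                triples.append(point)
    return triples


def to_csv(triples):
    """Serialise the triples as CSV text with an X,Y,Z header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["X", "Y", "Z"])
    writer.writerows(triples)
    return buffer.getvalue()


# Hypothetical sample: the second row has a missing Y2 value.
header = ["X", "Y1", "Y2", "Z1", "Y3", "Z2"]
rows = [
    [1, 10, 20, 100, 30, 200],
    [2, 11, "", 101, 31, 201],
]
triples = unpivot_xyz(header, rows)
print(to_csv(triples))
```

The simpler layout X Y1 Y2 Y3 Z is covered by the same pairing rule, since all three Y columns are followed by the single Z column.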

Naive Bayes classifier with intersecting/orthogonal feature sets?

I'm faced with a classification problem that seems to lend itself to a Naive Bayes classifier (NBC). However, I have one problem: usually the NBC works by estimating the most likely class c out of a set of classes C based on an observation x of a random variable X.
In my case I have multiple variables X1, X2 which might or might not share features. Variable X1 could have features (xa,xb,xc) and X2 might have (xc,xd) and another variable X3 might have (xe). Is it possible to construct one classifier, that allows me to classify X1,X2 and X3 simultaneously, in spite of the fact that the features are intersecting or even orthogonal?
The problem could be viewed from another point of view: for certain classes I am missing all data in certain features. Consider the following tables:
Classes = {C1,C2}.
Features = X = {X1,X2,X3}, X1={A,B}, X2={1,2}, X3={Y,N}
Class C1:
X1 X2 X3 #observations
A 1 ? 50
A 2 ? 20
B 1 ? 20
B 2 ? 10
Class C2:
X1 X2 X3 #observations
A 1 Y 20
A 1 N 0
A 2 Y 20
A 2 N 10
B 1 Y 10
B 1 N 20
B 2 Y 10
B 2 N 10
As you can see the feature X3 does not have any bearing on Class C1. No data is available for feature X3 in classifying class C1. Can I make a classifier that classifies X=(A,2,N) into both C1 and C2? How would I calculate the conditional probabilities for the missing data for X3 in class C1?
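To make the question concrete, the sketch below computes Naive Bayes scores from the two count tables under one possible assumption: since X3 carries no data for C1, its likelihood for C1 is taken as uniform over {Y, N}. This is an assumption made for illustration, not an established answer to the question. A second assumption is that the eight C2 rows cover each (X1, X2, X3) combination once, so the final two rows are read as (B, 2, Y) and (B, 2, N). No smoothing is applied, and the names `marginal` and `posterior` are made up.

```python
# Observation counts per class, keyed by feature values.
c1_counts = {("A", 1): 50, ("A", 2): 20, ("B", 1): 20, ("B", 2): 10}
c2_counts = {
    ("A", 1, "Y"): 20, ("A", 1, "N"): 0,
    ("A", 2, "Y"): 20, ("A", 2, "N"): 10,
    ("B", 1, "Y"): 10, ("B", 1, "N"): 20,
    # Assumed to be the (B, 2, *) rows, see the note above.
    ("B", 2, "Y"): 10, ("B", 2, "N"): 10,
}


def marginal(counts, position, value):
    """Estimate P(feature at `position` == value | class) from the counts."""
    total = sum(counts.values())
    matching = sum(n for key, n in counts.items() if key[position] == value)
    return matching / total


def posterior(x1, x2, x3):
    """Posterior class probabilities for one observation (x1, x2, x3)."""
    n1, n2 = sum(c1_counts.values()), sum(c2_counts.values())
    prior1, prior2 = n1 / (n1 + n2), n2 / (n1 + n2)
    # Assumption: P(X3 | C1) is uniform over the two X3 values.
    score1 = prior1 * marginal(c1_counts, 0, x1) * marginal(c1_counts, 1, x2) * 0.5
    score2 = (prior2 * marginal(c2_counts, 0, x1) * marginal(c2_counts, 1, x2)
              * marginal(c2_counts, 2, x3))
    total = score1 + score2
    return {"C1": score1 / total, "C2": score2 / total}


print(posterior("A", 2, "N"))
```

Simply dropping the X3 factor for C1 would amount to setting P(X3 | C1) = 1, which favours C1 whenever X3 is observed; the uniform value keeps the two class scores on a comparable scale.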
