How to create a common matching text string for a set of two text code blocks in the opposite order - excel

For a data set I have two columns, one for the origin code and one for the destination, and I need to combine codes with a matching set in the opposite order into one common set.
I've tried several different concat methods as well as trying to assign a unique value to each combination but can never manage to unpack the code sets that mirror each other.
Example of desired outcome in column D (no particular order preference as long as they're combined and consistent across combinations):

I think a simple COUNTIF should work here:
=IF(COUNTIF(D$1:D1,C2&B2),C2&B2,B2&C2)
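To see why the COUNTIF check works, here is a minimal Python sketch of the same logic (not an Excel formula). It assumes the two codes sit in columns B and C as in the formula above; the function name and sample codes are made up for illustration. For each row it reuses the reversed concatenation if that string was already produced further up, and otherwise emits the forward concatenation.

```python
def common_keys(pairs):
    """Mimic =IF(COUNTIF(D$1:D1,C2&B2),C2&B2,B2&C2) filled down a column."""
    seen = set()   # plays the role of the already-filled cells D$1:D1
    keys = []
    for first, second in pairs:
        forward = first + second    # B2&C2
        backward = second + first   # C2&B2
        # Reuse the mirrored key if it already appeared above this row.
        key = backward if backward in seen else forward
        seen.add(key)
        keys.append(key)
    return keys


# Hypothetical sample codes
result = common_keys([("AAA", "BBB"), ("BBB", "AAA"), ("CCC", "AAA"), ("AAA", "CCC")])
```

Both rows of a mirrored pair end up with whichever ordering was encountered first.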

Related

How to sort rows in Excel without having repeated data together

I have a table of data in which many values repeat.
I need to sort the rows in random order, but without identical names ending up next to each other, as shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.
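The recursive LAMBDA is essentially "shuffle, check, retry". A rough Python sketch of that idea follows, assuming the name is the first field of each row; the function name, the `max_tries` cap and the sample rows are illustrative assumptions. The cap stands in for Excel's recursion limit, which is what surfaces as #NUM! when no arrangement is found.

```python
import random


def random_sort(rows, max_tries=1000, rng=None):
    """Shuffle rows until no two neighbouring rows share a name (first field)."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        shuffled = list(rows)
        rng.shuffle(shuffled)
        names = [row[0] for row in shuffled]
        # Same test as SUMPRODUCT(N(DROP(names,-1)=DROP(names,1))) = 0
        if all(a != b for a, b in zip(names, names[1:])):
            return shuffled
    raise ValueError("no valid ordering found within max_tries")


# Hypothetical sample data: (name, age)
rows = [("Ann", 30), ("Ann", 31), ("Bob", 25), ("Cat", 40), ("Bob", 22)]
out = random_sort(rows, rng=random.Random(0))
```

Like the formula, this gives up rather than proving that no solution exists.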
This is just an attempt, because the question may need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input, then distribute it evenly by name. This ensures consecutive names are not repeated. It is not the only possible ordering (the problem may have multiple valid arrangements), but it is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is a volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at #JosWoolley's approach, the generation of the random ordering can be simplified, so the resulting formula becomes:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
The LET function is used for easier reading and composition. The name idx (used in the first formula) represents a random sequence of the input index positions. The name rRng represents the input rng sorted in random order. This sorting alone doesn't ensure that consecutive names are distinct.
To keep consecutive names from repeating, we number the occurrences of each name (nCnts), using MAP. This is similar to the idea provided by #cybernetic.nomad in the comment section, but adapted to an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with the MAP result (nCnts) as the by_array argument, so that names are evenly distributed and no consecutive names are the same. Every time Excel recalculates you will get an output with the names distributed evenly in a different way.
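The occurrence-numbering step can be sketched in Python as follows. This is an illustration of the MAP + SORTBY idea rather than a translation of the formula: it takes rows that are assumed to be already shuffled, numbers each name's occurrences, and then sorts by that number with a stable sort. The function name and sample rows are made up.

```python
def spread_by_occurrence(shuffled_rows):
    """Number each name's occurrences (like nCnts), then stable-sort by that number."""
    seen = {}
    numbered = []
    for row in shuffled_rows:
        name = row[0]
        seen[name] = seen.get(name, 0) + 1   # running count of this name so far
        numbered.append((seen[name], row))
    numbered.sort(key=lambda pair: pair[0])  # Python's sort is stable
    return [row for _, row in numbered]


# Hypothetical, already-shuffled sample data: (name, age)
spread = spread_by_occurrence(
    [("Ann", 1), ("Ann", 2), ("Bob", 3), ("Ann", 4), ("Bob", 5)]
)
edge = spread_by_occurrence([("A", 1), ("B", 2), ("B", 3), ("A", 4)])
```

Within each occurrence group the names are distinct. At the boundary between two groups the same name can still meet (the `edge` input above keeps the order A, B, B, A), so it may be worth checking the output for that case.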
Not sure if it's worth posting this, but I might as well share the results of my research, such as it is. The problem is similar to that of rearranging the characters in a string so that no two identical characters are adjacent. The method is to insert whichever of the remaining characters (names) has the highest frequency at this point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8, computed for convenience with COUNTIF:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and fill it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
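For readers who find the INDEX/MATCH version hard to follow, here is a small Python sketch of the greedy rule it implements for the names. Ties are broken by first appearance, which is an assumption modelled on MATCH returning the first matching position; the function name and sample data are illustrative only.

```python
from collections import Counter


def greedy_no_adjacent(names):
    """Repeatedly pick the most frequent remaining name that differs from the previous pick."""
    remaining = Counter(names)
    order = list(dict.fromkeys(names))  # distinct names in first-appearance order
    result = []
    previous = None
    for _ in range(len(names)):
        candidates = [n for n in order if n != previous and remaining[n] > 0]
        if not candidates:
            raise ValueError("no arrangement without adjacent repeats was found")
        # max() keeps the first candidate on ties, i.e. first appearance wins
        pick = max(candidates, key=lambda n: remaining[n])
        remaining[pick] -= 1
        result.append(pick)
        previous = pick
    return result


arranged = greedy_no_adjacent(["A", "A", "A", "B", "B", "C"])
```

As with the formulas, the output is deterministic: the same input always yields the same arrangement.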
However, there is no randomness built into this method. If I extend it to more data, you can see that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.

Selection based on several inputs without extreme duplication

I have a library of data that I need to pull specific rows from. At the moment I have an ID made up of several dropdown menus, =$C$2&$F$2..., that I compare to an index made up of a combination of column contents, =[#Column1]&[#Column2]..., which I then use to pull the right data for that instance with VLOOKUP.
Now, however, I need a much more varied set with more selections, five columns' worth. That creates 16 sets for every index in the first column and will generate thousands of lines if I have to create one version of every permutation.
The best scenario would be a modular form of the selections above: if there is any input on X, Y and Z then it works as it does now, but if Y and Z are empty it only pulls X. Easy in theory, but I don't know the format it will have to take, and it gets even more complicated if I want X and Z, for instance, or Y and Z, but still want to create a neat list of the selections.
An alternative might be a way to pull tables based on a selection, making one table for every "part" of my query, but I can't find a way to do that either.
What I need is any way to pull and combine several rows from a library (based on dropdown or similar input) and assemble them in a neat list that I can print.
First post, and thanks in advance =)

identify when values in a column change in spotfire

I am trying to create a calculated column that flags/counts the changes in values across rows in another column, in Spotfire. Below is an example of the data types I'm looking at and the desired results.
My hope is that for each Location, ordered by Time, I can identify when the value of "colors" changes and keep a running count, so that each cluster of identical values between changes is given the same label (Cluster Desire 1) for each Location. It would be best if the running count of clusters restarted at each location, but this is not crucial. Any help would be more than appreciated!
I thought of a way to do it, relying on one intermediate column (I used two just to make it a bit clearer).
First: the concatenation of values for each row within its Location: called [concatString]
Concatenate(Concatenate([Color]) over (Intersect([Location],AllPrevious([Time]))),', ')
Spotfire defaults to comma followed by space as a separator: I could not find a way of changing that in this kind of expression.
Then within each [concatString] I remove repeated values. The complication is that the last one did not have the comma+space, and I did not manage to make the regular expression I am using understand that. So my workaround was to add a final comma+space to [concatString]. Hence the extra Concatenate(..).
The formula for the column without repetitions, [consolidatString] is:
RXReplace([concatString],"(\\w+\,\\s)\\1+","$1","g")
Then what we have achieved is an individual value for each line we want to group. We can then simply rank [consolidatString] to achieve the desired column:
DenseRank([consolidatString],[Location])
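To make the target output concrete, here is a Python sketch (not a Spotfire expression) that computes the same kind of cluster label directly: within each location, ordered by time, the counter increases whenever the color differs from the previous row. The field order (location, time, color) and the sample rows are assumptions for illustration.

```python
def label_clusters(rows):
    """Return {(location, time): cluster_number}, restarting the count per location."""
    labels = {}
    last_color = {}
    counter = {}
    for location, time, color in sorted(rows, key=lambda r: (r[0], r[1])):
        # Start a new cluster on the first row of a location or on a color change.
        if location not in counter or color != last_color[location]:
            counter[location] = counter.get(location, 0) + 1
        last_color[location] = color
        labels[(location, time)] = counter[location]
    return labels


# Hypothetical sample data
clusters = label_clusters([
    ("X", 1, "red"), ("X", 2, "red"), ("X", 3, "blue"), ("X", 4, "red"),
    ("Y", 1, "blue"), ("Y", 2, "blue"),
])
```

Note that a color returning after a gap (red at time 4 for location X) gets a new cluster number rather than reusing the earlier one.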

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

The problem is straightforward, but the solution is escaping me. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
Didn't work, but it should make it clearer what I'm trying to do. There has GOT to be an easy way to just tell Excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
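The shape of that solution, find the matching row once and then take the minimum over several separate column blocks, can be sketched in Python like this. It is an illustration of the idea only: the data layout (location, second criterion, list of prices), the function name and the sample values are all hypothetical.

```python
def min_for_match(rows, location, second_key, column_blocks):
    """Find the row matching both criteria, then take the min over several column blocks."""
    for row_location, row_key, prices in rows:
        if row_location == location and row_key == second_key:
            return min(
                price
                for start, stop in column_blocks   # each block is a half-open slice
                for price in prices[start:stop]
            )
    raise LookupError("no row matches both criteria")


# Hypothetical sample grid: (location, second criterion, prices by business)
grid = [
    ("Austin", "retail", [5.0, 4.5, 9.9, 1.0, 6.2, 3.8]),
    ("Boston", "retail", [7.0, 8.0, 2.0, 9.0, 6.5, 7.5]),
]
blocks = [(0, 2), (4, 6)]  # two non-contiguous blocks of columns
lowest_austin = min_for_match(grid, "Austin", "retail", blocks)
lowest_boston = min_for_match(grid, "Boston", "retail", blocks)
```

The prices in the skipped middle columns are ignored even when they are lower.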
-rt

Data validation without named ranges

I have the following example list:
Note: In my real list I have around 200 options and 400 suboptions
And I would like to have two dropdown lists to select any option and its suboptions:
For options, I used data validation - list with range =$A$8:$A$12
And for suboptions I tried the following:
Named ranges
It works, but it needs a lot of manual work to maintain because the suboptions list is updated fairly frequently and, AFAIK, I would need to create and maintain as many named ranges as I have options.
Example
Named Range: _ABC05
Refers To: =Sheet1!$D$9:$D$10
Data validation: = INDIRECT(CONCATENATE("_";SUBSTITUTE(A2;"-";"")))
Again, this works, but I am trying to avoid maintaining 200 named ranges.
Any solution without using named ranges or VBA?
I finally solved it using dynamic data validation:
In hidden column D, I have the following formula:
=CONCATENATE("D";MATCH(A2;$C$8:$C$15;0)+7;":D";MATCH(A2;$C$8:$C$15;1)+7)
And the data validation like this:
=INDIRECT(D2)
Edit: As mentioned by Aprillion, this only works if the options list is sorted in ascending alphabetical order. In my case it always is, but it would be interesting to know of another solution for unsorted data. Also, in this example it is possible to avoid the hidden column and use =indirect(concatenate... directly in the data validation, but in my case the lists are in a separate worksheet and it is not possible to reference a data validation list in external workbooks or worksheets.
One issue is that once the user selects an option and a corresponding suboption, and then changes the option again, the suboption stays selected even if it is not mapped to the new option. I found one solution, which consists of using a fake list as the source of the data validation when C2 already has a value:
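The reason sorting matters is that the two MATCH calls act like a binary search for the first and last row of the chosen option. A Python sketch of that lookup, using `bisect` as a stand-in for MATCH with match types 0 and 1, is shown below. The function name, the default row offset and the sample option codes are assumptions chosen to mirror C8:C15 and column D from the formula.

```python
from bisect import bisect_left, bisect_right


def suboption_range(option, sorted_options, first_row=8, column="D"):
    """Build the address string that the hidden-column formula produces."""
    first = bisect_left(sorted_options, option)        # like MATCH(option, range, 0)
    last = bisect_right(sorted_options, option) - 1    # like MATCH(option, range, 1)
    if first > last:
        raise KeyError(option)                         # option not present
    return f"{column}{first + first_row}:{column}{last + first_row}"


# Hypothetical contents of C8:C15, sorted ascending
options = ["ABC-01", "ABC-05", "ABC-05", "ABC-07", "ABC-07", "ABC-07", "XYZ-02", "XYZ-09"]
address = suboption_range("ABC-05", options)
```

If the list were not sorted, neither the binary search here nor MATCH with match type 1 would reliably land on the last row of the block.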
=IF(C2="";INDIRECT("$A$8:$A$12");INDIRECT("FakeList"))

Resources