How to pick certain amount of random excel cells with text and combine them in a string - excel

I have a big list with about ~300 hashtags where I want to pick a certain amount (like 10 hashtags) randomly. Is it possible without VBA?

Here's one way of doing it, though it relies on your Excel being recent enough to support these functions:
LET
UNIQUE
RANDARRAY
SEQUENCE
With this formula:
=TEXTJOIN(" ",,LET(rng,$B$3:$B$13,n,ROWS(rng),idx,UNIQUE(RANDARRAY(n*n,1,1,n,TRUE)),s,SEQUENCE($E$1),INDEX(rng,INDEX(idx,s))))
It's a bit lengthy, but works around the observation that RANDARRAY() can return duplicate values by getting many more random numbers than are needed, and then taking the first x values.
NB. If the number of tags you want is small compared to the total available, then you probably don't need the (n*n) and can just use (n).
Hat-tips to #SpencerBarnes and #MayukhBhattacharya

on Office 365:
=CONCAT(INDEX(A1:A300, RANDARRAY(10,0, 1, 300, TRUE)))
Where A1:A300 is your list of hashtags, and 1, 300, is the start and end of the list.
Removing the CONCAT() function from the outside of the formula will output a spilled range rather than a concatenated string.

Related

How to sort rows in Excel without having repeated data together

I have a table of data with many data repeating.
I have to sort the rows by random, however, without having identical names next to each other, like shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.
This is just an attempt because the question might need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input, then distribute it evenly by names. This ensures no repetition of consecutive names, but this is not the only possible way of sorting (this problem may have multiple valid combinations), but this is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at #JosWoolley approach. The generation of the random sorting can be simplified so that the resulting formula could be:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
LET function is used for easy reading and composition. The name idx represents a random sequence of the input index positions. The name rRng, represents the input rng, but sorted by random. This sorting doesn't ensure consecutive names are distinct.
In order to ensure consecutive names are not repeated, we enumerate (nCnts) repeated names. We use a MAP for that. This is a similar idea provided by #cybernetic.nomad in the comment section, but adapted for an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with input argument by_array, the map result (nCnts), to ensure names are evenly distributed so no consecutive names will be the same. Every time Excel recalculate you will get an output with the names distributed evenly in a different way.
Not sure if it's worth posting this, but I might as well share the results of my research such as it is. The problem is similar to that of re-arranging the characters in a string so that no same characters are adjacent The method is just to insert whichever one of the remaining characters (names) has the highest frequency at this point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8 for convenience using Countif:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and pull it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
HOWEVER there is no randomness built in to this method. You can see if I extend it to more data that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.

Excel - Combine data from multiple tables dynamically

I would like to combine three different tables in Excel. I am struggling with the fact that the tables can vary in length.
For example:
What I would like to achieve is all the tables' data in one table without empty spaces. So first the two entries from the first table then the three entries from the second table and lastly the entry from the third table. But the amount of rows in each table can vary.
How can I do this dynamically so when the amount of entries in the tables change it can handle this? I'm using Mac with Office365. Thanks!
EDIT:
Output with Ron Rosenfeld's solution, the range of the list goes down from cell 5 - cell 103. Could this be reduced to 5 - 15?:
If you have Excel 2019 or Office 365, with the FILTERXML and TEXTJOIN functions, you can use:
=FILTERXML("<t><s>" & TEXTJOIN("</s><s>",TRUE,Table1,Table2, Table3) & "</s></t>","//s[.!=0]")
If those zero's are really blanks, you can omit [.!=0] from the xPath argument, but it won't hurt to leave it there
Edit:
With MAC versions of Office 365 that do not have the FILTERXML function, I believe the following will work:
=LET(
a,299,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
Note the a parameter in the above function
Because of how the splitting algorithm works, the sequence for each cell will not always start at the beginning of a string.
Hence, if there are enough letters in the various strings, the start number may eventually get offset enough to cause a split in the wrong location
One fix is to use an arbitrarily large number of space's to insert.
99 is frequently large enough, but not for this data set.
299 seems to be large enough for the data set as shown in your actual data.
I believe the minimum number should be the sum of the lengths of all the characters in the original tables (including the 0's) plus one (1). But not sure of this.
You can certainly adjust it as needed
If the number becomes too large, you could run into the 32,767 character limitation. If that happened, an error message would occur.
So, if you wanted to compute a, dynamically, you could try something like:
=LET(
a,SUM(LEN(Table19[Column1]),LEN(Table20[Column1]),LEN(Table21[Column1]))+1,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
but no guarantees.
Assuming the data is in A:C, and empty cell is blank (not 0).
In E1 put :
=IF(ROW()>COUNTA(A:C),"",
INDEX(A:C,
IF(ROW()<=COUNTA(A:A),ROW(),IF(ROW()<=COUNTA(A:B),ROW()-COUNTA(A:A),ROW()-COUNTA(A:B))),
IF(ROW()<=COUNTA(A:A),1,IF(ROW()<=COUNTA(A:B),2,3)))
)
Idea : use row() to guide in selection in index. counta() is used guide converting 'row()' to usable index numbers. Also make the output cell blank "" for row() > counta(a:c).
Please share if it works/not.

Excel multilevel array formula with partial string matches to sum resultant cells

I've been trying to sort this for over a day now without much luck. I have successfully used SUMIFS, INDEX, MATCH, COUNTIF, "--" etc array functions previously and am not a novice, but also not an expert on these. I can't seem to weave these together correctly, and likely on an altogether incorrect path.
Basically, I am trying to aggregate data from multiple spreadsheets, requiring a mapping of various items (rows) into a canonical form for summing.
The image here shows a representative, but simplified version of my quest. Each "region" on this example spreadsheet (Final..., Mapping, DataSet1, DataSet2) is actually in different spreadsheets, and there are several sheets with 50-150 rows in each xlsx.
Note that the names in Column B are quite arbitrary (meaning not all P1's have an 'x' pattern, like shown here as x1, x2, etc. Do not rely on any pattern in the names, except the x, y , z in the Mapping table are substrings (case insensitive, trailing match) of the names in Column B in the DataSets.
And in the image, the Final Result Table (summed manually) is what I want to compute via(an array) formula: A single formula would be ideal (given I have many spreadsheets from which the monthly data is being pulled from, so I can't readily modify but can create an interim spreadsheet if required, so open to helper columns or helper rows).
Here's the process - For each name (B3-B5) in the Final Result Table, I want to sum the name from it's components as follows:
Lookup all the matches in the Mapping Table (so for P1, the formula =IF($C$10:$C$15=$B3, $B$10:$B$15,"") gives {"x1";"";"";"x2";"";"x3"}.
I then want to search each of x1, x2, and x3 in B19:B26 to get rows 21, 22, 24, 25, 26 in DataSet1 and B31:B35 to get row 32 in DataSet2, to then add up the Jan totals into C3. (Effectively,
C3=C21+C22+C24+C25+C26+C32). Same for P2 and P3, and thru Feb, Mar, ...
I am stuck on how to remove blank or 0 or Div0 or such "error rows" from the interim result in 2, and also need to use 2 arrays of different sizes (3 valid rows in example 2 above, ignoring blanks) to search many rows in DataSets. I tried SEARCH("*"&IF($C$10:$C$15=$B3, $B$10:$B$15,""), $B$19:$B$26) but get unexpected results. I have tried to replace text in the interim result {"x1";"";"";"x2";"";"x3"} with TRUE/FALSE, and 1/0, etc. to help with INDEX or MATCH, but am stymied by errors in downstream ("surrounding") formulas.
Thanks in advance.
Here is a solution without resorting to nasty (imo) CSE formulas.
= SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,2),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
+
SUMPRODUCT($C$31:$F$35*(COUNTIFS($B$10:$B$15, RIGHT($B$31:$B$35,2),$C$10:$C$15,$B3)>0)*($C$30:$F$30=C$2))
There is one SUMPRODUCT for each data set. If possible, it would be better to put all your data sets into a single table with a column identify which data set it is a part of.
The way it works is to takes each values in your data set and multiplies it by whether the 2 right most character appear in your mapping table for that P code, multiplied by whether the value is in the correct month. So it returns 0 if either of those conditions are false. Then returns the sum.
UPDATE IN RESPONSE TO OP COMMENTS
If, the X,Y, Z codes are not always 2 digits but the first part is ALWAYS 8 digits, you can easily amend the:
RIGHT($B$19:$B$26,2)
to be:
RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8)
Making the formula for the first data set:
=SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
And you can amend for other data sets and simply add them together.
Nice challenge! Are you willing to drop all your tables (DataSet1, DataSet2...) into one spreadsheet, so that we can refer just one single range for each month?
Here's one solution (hopefully a good starting point) - array formula (Ctrl+Shift+Enter):
=SUMPRODUCT(IFERROR(IF(TRANSPOSE(IF($B3=$C$10:$C$15,$B$10:$B$15,""))=RIGHT($B$18:$B$36,2),C$18:C$36,0),0))

Iterative/Looped Substitute without VBA

Abridged Question:
If I have a concatenated string of "|#|#|#|...|#|", how can I apply a multiplier to each of the numbers and update the concatenated text? For example, for |4|12|8|, multiply by a factor of 2 and update the concatenated text to |8|24|16|.
Background
I have three columns of interest. The first column contains a date, the second an amount or factor, and the third column concatenates data into the format "|#|#|...|#|" (e.g., |2|5|, |2|5|12|, |4|12|, etc.). At times, a multiplying factor needs to be applied to the concatenated data, and the individual numbers would need to be updated accordingly.
An example would be—
Date Amt Concatenated Data
01/01/18 2 |2|
01/05/18 5 |2|5|
02/06/18 12 |2|5|12|
03/25/18 -3 |4|12|
03/31/18 8 |4|12|8|
04/01/18 F2 |8|24|16| (factor of 2 applied)
04/15/18 12 |8|24|16|12|
04/01/18 F1/4 |2|6|4|3| (factor of 1/4 applied)
With a formula, how can I apply the factor to the concatenated data, and update the individual numbers?
I'm bound by the following conditions:
Excel 2007, so no TEXTJOIN function
No VBA or UDFs (due to security policies)
Individual numbers are dynamic (i.e., I can't use a static value for the "old_text" parameter of the SUBSTITUTE formula)
Amount of individual numbers within concatenated data is also dynamic (may contain one number, or may contain dozens of different numbers)
I can pull out the individual numbers using an array formula. I can even then multiply those numbers by the factor to produce an array result. However, I can't rebuild the concatenated data, because CONCATENATE doesn't work on an array. I've also tried SUBSTITUTE, but I can't iterate through the "|" separators. I can only substitute a given segment (e.g., change all entries of "|2|" to "|4|"). Nesting SUBSTITUTE or using individual columns won't work, since it could potentially involve dozens of instances.
Just to add some info on the concatenated data:
Amt>0, then value is concatenated to the end of the previous concatenated value
Amt<0, begin reducing individual numbers in concatenated value (CV) until reduction amount reached (e.g., for |2|5|12| and Amt=-3, reduce CV to |4|12|, which is -2 from the first segment and -1 from the second segment)
Amt reduction is limited to the sum of the previous CV's individual numbers (e.g., for |4|12|, the reduction cannot exceed 16)
Amt=F#, indicates a multiplying factor, and the CV's numbers need to be updated
The CV has no max (could have dozens to hundreds of individual numbers, with numbers going from 1 to 100,000+), other than any max applied by Excel itself on string length
HIGH LEVEL
Four parts to this solution
They satisfy pre-requisites (2007 compatibility, no VB, no Office 365 requirement, no custom VB functions, provide for complete 'dynamic' nature of variable length of cells to concatenate)
Caveat: to best knowledge / research, there is no parsimonious single-cell function & therefore an interim step has been proposed)
One more caveat: I imagine the simple 'hack' of wrapping a graph around the delimited data is out of question (see 'Other/Various' below ☺)
PARTS 1-4
Accompanying parts 1-4 below are functions which relate to the following screenshot:
I have also uploaded / amended to meet requirements of Google Sheets (see here)
Parts 1 & 2:
Similar in that they rely upon FilterXML technique to count component / terms, and split cells respectively:
Part 1:
=COUNT(2*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num")))
Note: google sheets doesn't recognise FilterXML, so have amended technique/functions accordingly. For instance, above can be determined using counta on the split cells in Part 2 (easier / much more simpler than proposed approach above, albeit less robust given any cells lying to the right of the split cells will interfere with ordinary functionality of this approach).
Part 2:
It's either a manual approach, a fancy series of 'mid' &/or substitute / left/right functions, or the following FilterXML code which, per various sources (e.g. here) should be compatible with Excel 2007:
=IF(LEFT(C12,1)="F",1*SUBSTITUTE(C12,"F",""),1)*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num"))
Commonality with Part 1 (re: FilterXML) can be seen - the only difference is that the count(Part 1) has been replaced with the transformation (multiplicative factor, as given in O.P's Q).
Part 3
Nothing fancy here - a simple concatenation (which is a far cry from a 'recursive' substitution function, I know, but hey - it does the trick and can always be placed in a mirror copy of the original sheet to avoid space issues/cell interaction issues)
=IF(H12="","",IF(G24="","|","")&G24&H12&"|")
Part 4
Thanks to the number of terms derived in Part 1, an offset function can easily determine the final cell pertaining to the concatenated 'build up' of 'transformed' values (per Part 3):
=OFFSET(H31,0,E31-1,1,1)
OTHER / VARIOUS
Various other proposals and 'workarounds' exist; unfortunately, these appear to fall short in one way or another of the pre-requisites set forth, videlicit:
a) Function/formula based
b) No VB
c) Excel 2007
d) Dynamic (variable/unknown number of terms)
Manual: e.g. function = concatenate(transpose(desired range)), and then components of the concatenate function and pressing F9 to convert to calculated values, which are readily applicable in the concatenate function. Disadvantage: time consuming in relation to 'automated solution' (needs to be done for each applicable toy). Advantage: no additional 'spreadsheet real estate' required, quicker/straightforward implementation in first instance.
Variants of the 'build-up' method: e.g. per Part 3, however, this alone does not ensure for an automated approach across an unknown number of terms in the original concatenated list.
Have mentioned in a previous solution (here), but may be case that you are eligible for Office 365 functionality whilst on a previous version of Excel (see Office Insider here)
Other proposals (above/below in this forum mind you) propose textjoin (so not sure if this is a comprehension issue or what ☺)
And yes, as alluded to at outset, you can easily achieve the desired outcome using a simple graph! Just for fun then, sort the data in reverse order, and include the split/delimited values as a bar graphs' "x-values" (which, by defn. for this type of graph, will now appear along the ordinary Cartesian 'y/vertical' axis)...
Zero points for this but thought it was an interesting discovery on my part!
(and if still in doubt, here's what the 'graph' would look like if I didn't kill everything except for the axis labels...):
Numerous references for relevant other items above, including research areas, as follows:
Excel Champs
StackOverflow - alternative applications for FilterXML
JUST ONE MORE THING...
In true Columbo style modus operandi, other ideas/approaches considered:
Application of pivot table?
Constructing matrices: I got a solution with a series of offset functions, but couldn't think of a feasible way to implement given space issues
Converting the split cells into a long digit through summation: e.g. 8 22 16 = 80 000 + 22 00 + 16. Using a substitute function with text (long digit, "General") I was able to successfully introduce the delimiter character ('|') for pairs of adjacent 'tuples' (e.g. I could get '8|2216', '822|16', but then a 'build up' formula where one cell depends upon converted values of the previous and so one was required once more, which landed me back to the proposal I have set out above
fyi - the matrix consideration only solves tuples of 2, for n-dimensional /combination one would need to 'pass' a string of characters over its mirror copy - e.g. {6,10,22} would pass over {6,10,22}, ignoring duplicate values would yield a trapezium as follows:
6
10
22
6
10
22
6
10
22
after the copy has 'passed' over the original (first row), we have the desired combination (22,10,6) (on the 'diagonal' such a matrix). This is akin to how Fourier Transforms work (kind of); but that aside, it was tempting to construct a matrix like this, but couldn't be bothered at this stage.
Will probably turn out to be a far simpler way that someone comes up with (I won't be the only person surprised based upon the various sources I've considered...)

Find location of NEW number in list

I'm looking for a way to find where a RAND() number lies on a list of irrational (effectively RAND() as well) numbers from 0 to 1.
So I have a list of numbers 0.1003, 0.1984, 0.3895, 0.4506, 0.4724, 0.4856, 0.5602, 0.8542 in A1:A8
Then I have a RAND() number to check against the list. Now I've tried RANK.AV(RAND(),A1:A8), but the rank functions require your lookup value to be in the list.
A simple solution would be to place my RAND() at the bottom of the list (A9) and use RANK.AV(A9,A1:A9), so my number is included on the list, however I would have to do this for every number in my array of thousands of rand numbers, so impractical.
Perhaps there is some way I can join another cell onto an array without actually placing it adjacent?
Eg, for a RAND() in B1, I could write in C1:
=RANK.AV(B1,ARRAY.JOIN(A1:A8,B1)), but I've tried a few ways (&,+) and can't achieve this array joining function, so I thought I'd ask for help! Perhaps a macro or UDF is required?
Sorry #Chris, I'm sure this is what you meant in your comment:-
=IF(B1<A1,1,MATCH(B1,$A$1:$A$10)+1)
where your new random number is in B1 and your existing list in A1:A10, sorted in ascending order. Where the new number is less than the first entry in the list, this would be a special case: otherwise, you could allow for it by placing a zero in A1 and moving the pseudo-random numbers down which would simplify the formula to
=MATCH(B1,A$1:A$10)
If you wanted to allow for ties, you could correct for it:-
=IF(B1<A1,1,MATCH(B1,$A$1:$A$10)+1)-COUNTIF($A$1:$A$10,B1)/2
or just
=MATCH(B1,$A$1:$A$10)-COUNTIF($A$1:$A$10,B1)/2
with a zero in A1.
I'm assuming that ranking is from 1=smallest to 8=largest if there are 8 numbers: you can easily change it by subtracting the rank from count(A$1:A$10)+2 or count(A$1:A$10)+1 if the zero is included.
This could do your trick:
=RANK.AVG(INDEX(A1:A8,RANDBETWEEN(1,COUNT(A1:A8))),A1:A8)
This ranks a random choice out of random numbers.

Resources