Excel - Combine data from multiple tables dynamically - excel

I would like to combine three different tables in Excel. I am struggling with the fact that the tables can vary in length.
For example:
What I would like to achieve is all the tables' data in one table without empty spaces. So first the two entries from the first table then the three entries from the second table and lastly the entry from the third table. But the amount of rows in each table can vary.
How can I do this dynamically so when the amount of entries in the tables change it can handle this? I'm using Mac with Office365. Thanks!
EDIT:
Output with Ron Rosenfeld's solution, the range of the list goes down from cell 5 - cell 103. Could this be reduced to 5 - 15?:

If you have Excel 2019 or Office 365, with the FILTERXML and TEXTJOIN functions, you can use:
=FILTERXML("<t><s>" & TEXTJOIN("</s><s>",TRUE,Table1,Table2, Table3) & "</s></t>","//s[.!=0]")
If those zero's are really blanks, you can omit [.!=0] from the xPath argument, but it won't hurt to leave it there
Edit:
With MAC versions of Office 365 that do not have the FILTERXML function, I believe the following will work:
=LET(
a,299,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
Note the a parameter in the above function
Because of how the splitting algorithm works, the sequence for each cell will not always start at the beginning of a string.
Hence, if there are enough letters in the various strings, the start number may eventually get offset enough to cause a split in the wrong location
One fix is to use an arbitrarily large number of space's to insert.
99 is frequently large enough, but not for this data set.
299 seems to be large enough for the data set as shown in your actual data.
I believe the minimum number should be the sum of the lengths of all the characters in the original tables (including the 0's) plus one (1). But not sure of this.
You can certainly adjust it as needed
If the number becomes too large, you could run into the 32,767 character limitation. If that happened, an error message would occur.
So, if you wanted to compute a, dynamically, you could try something like:
=LET(
a,SUM(LEN(Table19[Column1]),LEN(Table20[Column1]),LEN(Table21[Column1]))+1,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
but no guarantees.

Assuming the data is in A:C, and empty cell is blank (not 0).
In E1 put :
=IF(ROW()>COUNTA(A:C),"",
INDEX(A:C,
IF(ROW()<=COUNTA(A:A),ROW(),IF(ROW()<=COUNTA(A:B),ROW()-COUNTA(A:A),ROW()-COUNTA(A:B))),
IF(ROW()<=COUNTA(A:A),1,IF(ROW()<=COUNTA(A:B),2,3)))
)
Idea : use row() to guide in selection in index. counta() is used guide converting 'row()' to usable index numbers. Also make the output cell blank "" for row() > counta(a:c).
Please share if it works/not.

Related

How to pick certain amount of random excel cells with text and combine them in a string

I have a big list with about ~300 hashtags where I want to pick a certain amount (like 10 hashtags) randomly. Is it possible without VBA?
Here's one way of doing it, though it relies on your Excel being recent enough to support these functions:
LET
UNIQUE
RANDARRAY
SEQUENCE
With this formula:
=TEXTJOIN(" ",,LET(rng,$B$3:$B$13,n,ROWS(rng),idx,UNIQUE(RANDARRAY(n*n,1,1,n,TRUE)),s,SEQUENCE($E$1),INDEX(rng,INDEX(idx,s))))
It's a bit lengthy, but works around the observation that RANDARRAY() can return duplicate values by getting many more random numbers than are needed, and then taking the first x values.
NB. If the number of tags you want is small compared to the total available, then you probably don't need the (n*n) and can just use (n).
Hat-tips to #SpencerBarnes and #MayukhBhattacharya
on Office 365:
=CONCAT(INDEX(A1:A300, RANDARRAY(10,0, 1, 300, TRUE)))
Where A1:A300 is your list of hashtags, and 1, 300, is the start and end of the list.
Removing the CONCAT() function from the outside of the formula will output a spilled range rather than a concatenated string.

Excel multilevel array formula with partial string matches to sum resultant cells

I've been trying to sort this for over a day now without much luck. I have successfully used SUMIFS, INDEX, MATCH, COUNTIF, "--" etc array functions previously and am not a novice, but also not an expert on these. I can't seem to weave these together correctly, and likely on an altogether incorrect path.
Basically, I am trying to aggregate data from multiple spreadsheets, requiring a mapping of various items (rows) into a canonical form for summing.
The image here shows a representative, but simplified version of my quest. Each "region" on this example spreadsheet (Final..., Mapping, DataSet1, DataSet2) is actually in different spreadsheets, and there are several sheets with 50-150 rows in each xlsx.
Note that the names in Column B are quite arbitrary (meaning not all P1's have an 'x' pattern, like shown here as x1, x2, etc. Do not rely on any pattern in the names, except the x, y , z in the Mapping table are substrings (case insensitive, trailing match) of the names in Column B in the DataSets.
And in the image, the Final Result Table (summed manually) is what I want to compute via(an array) formula: A single formula would be ideal (given I have many spreadsheets from which the monthly data is being pulled from, so I can't readily modify but can create an interim spreadsheet if required, so open to helper columns or helper rows).
Here's the process - For each name (B3-B5) in the Final Result Table, I want to sum the name from it's components as follows:
Lookup all the matches in the Mapping Table (so for P1, the formula =IF($C$10:$C$15=$B3, $B$10:$B$15,"") gives {"x1";"";"";"x2";"";"x3"}.
I then want to search each of x1, x2, and x3 in B19:B26 to get rows 21, 22, 24, 25, 26 in DataSet1 and B31:B35 to get row 32 in DataSet2, to then add up the Jan totals into C3. (Effectively,
C3=C21+C22+C24+C25+C26+C32). Same for P2 and P3, and thru Feb, Mar, ...
I am stuck on how to remove blank or 0 or Div0 or such "error rows" from the interim result in 2, and also need to use 2 arrays of different sizes (3 valid rows in example 2 above, ignoring blanks) to search many rows in DataSets. I tried SEARCH("*"&IF($C$10:$C$15=$B3, $B$10:$B$15,""), $B$19:$B$26) but get unexpected results. I have tried to replace text in the interim result {"x1";"";"";"x2";"";"x3"} with TRUE/FALSE, and 1/0, etc. to help with INDEX or MATCH, but am stymied by errors in downstream ("surrounding") formulas.
Thanks in advance.
Here is a solution without resorting to nasty (imo) CSE formulas.
= SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,2),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
+
SUMPRODUCT($C$31:$F$35*(COUNTIFS($B$10:$B$15, RIGHT($B$31:$B$35,2),$C$10:$C$15,$B3)>0)*($C$30:$F$30=C$2))
There is one SUMPRODUCT for each data set. If possible, it would be better to put all your data sets into a single table with a column identify which data set it is a part of.
The way it works is to takes each values in your data set and multiplies it by whether the 2 right most character appear in your mapping table for that P code, multiplied by whether the value is in the correct month. So it returns 0 if either of those conditions are false. Then returns the sum.
UPDATE IN RESPONSE TO OP COMMENTS
If, the X,Y, Z codes are not always 2 digits but the first part is ALWAYS 8 digits, you can easily amend the:
RIGHT($B$19:$B$26,2)
to be:
RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8)
Making the formula for the first data set:
=SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
And you can amend for other data sets and simply add them together.
Nice challenge! Are you willing to drop all your tables (DataSet1, DataSet2...) into one spreadsheet, so that we can refer just one single range for each month?
Here's one solution (hopefully a good starting point) - array formula (Ctrl+Shift+Enter):
=SUMPRODUCT(IFERROR(IF(TRANSPOSE(IF($B3=$C$10:$C$15,$B$10:$B$15,""))=RIGHT($B$18:$B$36,2),C$18:C$36,0),0))

Iterative/Looped Substitute without VBA

Abridged Question:
If I have a concatenated string of "|#|#|#|...|#|", how can I apply a multiplier to each of the numbers and update the concatenated text? For example, for |4|12|8|, multiply by a factor of 2 and update the concatenated text to |8|24|16|.
Background
I have three columns of interest. The first column contains a date, the second an amount or factor, and the third column concatenates data into the format "|#|#|...|#|" (e.g., |2|5|, |2|5|12|, |4|12|, etc.). At times, a multiplying factor needs to be applied to the concatenated data, and the individual numbers would need to be updated accordingly.
An example would be—
Date Amt Concatenated Data
01/01/18 2 |2|
01/05/18 5 |2|5|
02/06/18 12 |2|5|12|
03/25/18 -3 |4|12|
03/31/18 8 |4|12|8|
04/01/18 F2 |8|24|16| (factor of 2 applied)
04/15/18 12 |8|24|16|12|
04/01/18 F1/4 |2|6|4|3| (factor of 1/4 applied)
With a formula, how can I apply the factor to the concatenated data, and update the individual numbers?
I'm bound by the following conditions:
Excel 2007, so no TEXTJOIN function
No VBA or UDFs (due to security policies)
Individual numbers are dynamic (i.e., I can't use a static value for the "old_text" parameter of the SUBSTITUTE formula)
Amount of individual numbers within concatenated data is also dynamic (may contain one number, or may contain dozens of different numbers)
I can pull out the individual numbers using an array formula. I can even then multiply those numbers by the factor to produce an array result. However, I can't rebuild the concatenated data, because CONCATENATE doesn't work on an array. I've also tried SUBSTITUTE, but I can't iterate through the "|" separators. I can only substitute a given segment (e.g., change all entries of "|2|" to "|4|"). Nesting SUBSTITUTE or using individual columns won't work, since it could potentially involve dozens of instances.
Just to add some info on the concatenated data:
Amt>0, then value is concatenated to the end of the previous concatenated value
Amt<0, begin reducing individual numbers in concatenated value (CV) until reduction amount reached (e.g., for |2|5|12| and Amt=-3, reduce CV to |4|12|, which is -2 from the first segment and -1 from the second segment)
Amt reduction is limited to the sum of the previous CV's individual numbers (e.g., for |4|12|, the reduction cannot exceed 16)
Amt=F#, indicates a multiplying factor, and the CV's numbers need to be updated
The CV has no max (could have dozens to hundreds of individual numbers, with numbers going from 1 to 100,000+), other than any max applied by Excel itself on string length
HIGH LEVEL
Four parts to this solution
They satisfy pre-requisites (2007 compatibility, no VB, no Office 365 requirement, no custom VB functions, provide for complete 'dynamic' nature of variable length of cells to concatenate)
Caveat: to best knowledge / research, there is no parsimonious single-cell function & therefore an interim step has been proposed)
One more caveat: I imagine the simple 'hack' of wrapping a graph around the delimited data is out of question (see 'Other/Various' below ☺)
PARTS 1-4
Accompanying parts 1-4 below are functions which relate to the following screenshot:
I have also uploaded / amended to meet requirements of Google Sheets (see here)
Parts 1 & 2:
Similar in that they rely upon FilterXML technique to count component / terms, and split cells respectively:
Part 1:
=COUNT(2*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num")))
Note: google sheets doesn't recognise FilterXML, so have amended technique/functions accordingly. For instance, above can be determined using counta on the split cells in Part 2 (easier / much more simpler than proposed approach above, albeit less robust given any cells lying to the right of the split cells will interfere with ordinary functionality of this approach).
Part 2:
It's either a manual approach, a fancy series of 'mid' &/or substitute / left/right functions, or the following FilterXML code which, per various sources (e.g. here) should be compatible with Excel 2007:
=IF(LEFT(C12,1)="F",1*SUBSTITUTE(C12,"F",""),1)*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num"))
Commonality with Part 1 (re: FilterXML) can be seen - the only difference is that the count(Part 1) has been replaced with the transformation (multiplicative factor, as given in O.P's Q).
Part 3
Nothing fancy here - a simple concatenation (which is a far cry from a 'recursive' substitution function, I know, but hey - it does the trick and can always be placed in a mirror copy of the original sheet to avoid space issues/cell interaction issues)
=IF(H12="","",IF(G24="","|","")&G24&H12&"|")
Part 4
Thanks to the number of terms derived in Part 1, an offset function can easily determine the final cell pertaining to the concatenated 'build up' of 'transformed' values (per Part 3):
=OFFSET(H31,0,E31-1,1,1)
OTHER / VARIOUS
Various other proposals and 'workarounds' exist; unfortunately, these appear to fall short in one way or another of the pre-requisites set forth, videlicit:
a) Function/formula based
b) No VB
c) Excel 2007
d) Dynamic (variable/unknown number of terms)
Manual: e.g. function = concatenate(transpose(desired range)), and then components of the concatenate function and pressing F9 to convert to calculated values, which are readily applicable in the concatenate function. Disadvantage: time consuming in relation to 'automated solution' (needs to be done for each applicable toy). Advantage: no additional 'spreadsheet real estate' required, quicker/straightforward implementation in first instance.
Variants of the 'build-up' method: e.g. per Part 3, however, this alone does not ensure for an automated approach across an unknown number of terms in the original concatenated list.
Have mentioned in a previous solution (here), but may be case that you are eligible for Office 365 functionality whilst on a previous version of Excel (see Office Insider here)
Other proposals (above/below in this forum mind you) propose textjoin (so not sure if this is a comprehension issue or what ☺)
And yes, as alluded to at outset, you can easily achieve the desired outcome using a simple graph! Just for fun then, sort the data in reverse order, and include the split/delimited values as a bar graphs' "x-values" (which, by defn. for this type of graph, will now appear along the ordinary Cartesian 'y/vertical' axis)...
Zero points for this but thought it was an interesting discovery on my part!
(and if still in doubt, here's what the 'graph' would look like if I didn't kill everything except for the axis labels...):
Numerous references for relevant other items above, including research areas, as follows:
Excel Champs
StackOverflow - alternative applications for FilterXML
JUST ONE MORE THING...
In true Columbo style modus operandi, other ideas/approaches considered:
Application of pivot table?
Constructing matrices: I got a solution with a series of offset functions, but couldn't think of a feasible way to implement given space issues
Converting the split cells into a long digit through summation: e.g. 8 22 16 = 80 000 + 22 00 + 16. Using a substitute function with text (long digit, "General") I was able to successfully introduce the delimiter character ('|') for pairs of adjacent 'tuples' (e.g. I could get '8|2216', '822|16', but then a 'build up' formula where one cell depends upon converted values of the previous and so one was required once more, which landed me back to the proposal I have set out above
fyi - the matrix consideration only solves tuples of 2, for n-dimensional /combination one would need to 'pass' a string of characters over its mirror copy - e.g. {6,10,22} would pass over {6,10,22}, ignoring duplicate values would yield a trapezium as follows:
6
10
22
6
10
22
6
10
22
after the copy has 'passed' over the original (first row), we have the desired combination (22,10,6) (on the 'diagonal' such a matrix). This is akin to how Fourier Transforms work (kind of); but that aside, it was tempting to construct a matrix like this, but couldn't be bothered at this stage.
Will probably turn out to be a far simpler way that someone comes up with (I won't be the only person surprised based upon the various sources I've considered...)

Determine the number of search results upon using different sets of separators (Excel)

I would like to ask for your help with the formulation of a formula in Excel in order to compare the total number of search results upon using different sets of separator characters.
As I have multiple columns with content, as in the example below, I thought it would be possible to Count the search results in some way and do this for each column separately ( I would actually prefer to treat each column separately).
A
1 L-516-S-221-S-223
2 H-140.STR3
3 ST0 XP 23-9
4 etc.......
Preferably, I would like to use a varying a set of separator characters in order to determine the impact on the number of search results based on this set of separator characters. Logically, with an increasing number of separators more results will be returned (depending on separators included in the cell values of course).
The set of characters that I would like to experiment with is: “-_ .,;: “
Hopefully this makes sense and someone is able to help me out. Thank you.
Kind regards,
P
In your example - on its own will detect all three instances but for an overview you might construct a grid (say B1:H1 of your separators, including a space rather than an empty cell) and ColumnA each column in turn (maybe via links) then a formula in B2 such as:
=--ISNUMBER(FIND(B$1,$A2))
copied across to ColumnH and down to suit.
Alternative formula (for different question):
=IF(LEN($A2)-LEN(SUBSTITUTE($A2,B$1,""))>0,LEN($A2)-LEN(SUBSTITUTE($A2,B$1,""))+1,0)
Assumes, for example, no trailing spaces and separators are always separated. Results are not necessarily cumulative.

nested excel functions with conditional logic

Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
what i'd like to figure out is the syntax to do something like this-
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, #MikeD 's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
Asuming the source data is sorted by the key value [A], start with determining the occurence of the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterium AND are unique, include column D in a =IF(AND(D2, [yourcondition];1;0) and sum this column
Another option is to asume a key unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
all formulas in row 2 should be copied down to the end of the list

Resources