Create a random number sequence without duplicates in Excel for millions of rows or records - excel

I want to create a random sequence of numbers in 11-digit format, running from 10000000000 to 999999999999, in which every value is unique. I would like to populate roughly 20-50 million records in Excel without having to drag the fill handle (the + cursor) all the way down the column.
I tried RANDBETWEEN, but it produces duplicates and I still have to keep dragging, which is time-consuming. Is there a better way to accomplish this?
=RANDBETWEEN(10000000000,999999999999)

For that many unique numbers I suggest using encryption, where the output is guaranteed to be unique for unique inputs.
Simply encrypt the numbers 0, 1, 2, ... in turn. You must use the same encryption key and the same other inputs (IV, nonce, etc.) for every number; that is what guarantees unique outputs.
You will need to do some processing on the outputs to get them into the required range. Have a look at Format Preserving Encryption for some help with this.
As @BigBen pointed out, Excel is probably the wrong tool for this: a single worksheet holds at most 1,048,576 rows, far fewer than 20-50 million.
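A minimal Python sketch of the encrypt-a-counter idea, shown outside Excel because the volume exceeds what a worksheet can hold. It is an illustration under assumptions, not a vetted format-preserving encryption scheme: the SHA-256-based round function, the key `b"hypothetical-key"`, and the names `unique_random`, `_feistel`, and `_round_value` are all invented for this example. A balanced Feistel network is a permutation of the 40-bit values, and re-encrypting any result that falls outside the target range (cycle-walking) keeps it a permutation of that range, so distinct counters give distinct outputs.

```python
import hashlib

LOW, HIGH = 10_000_000_000, 999_999_999_999  # range taken from the question
SIZE = HIGH - LOW + 1                        # number of values in the range
HALF_BITS = 20                               # two 20-bit halves: 2**40 >= SIZE
HALF_MASK = (1 << HALF_BITS) - 1


def _round_value(value, round_no, key):
    """Hypothetical keyed round function built on SHA-256 (an assumption)."""
    data = key + bytes([round_no]) + value.to_bytes(4, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") & HALF_MASK


def _feistel(x, key, rounds=4):
    """Balanced Feistel network: a permutation of the 40-bit integers."""
    left, right = x >> HALF_BITS, x & HALF_MASK
    for round_no in range(rounds):
        left, right = right, left ^ _round_value(right, round_no, key)
    return (left << HALF_BITS) | right


def unique_random(index, key=b"hypothetical-key"):
    """Map counter 0, 1, 2, ... to a distinct number in [LOW, HIGH]."""
    if not 0 <= index < SIZE:
        raise ValueError("index out of range")
    x = _feistel(index, key)
    while x >= SIZE:          # cycle-walk until the value lands in range
        x = _feistel(x, key)
    return LOW + x


if __name__ == "__main__":
    for i in range(5):
        print(unique_random(i))
```

Feeding the counters 0, 1, 2, ... through `unique_random` with one fixed key produces as many distinct values as needed; the results could then be written to a CSV file or a database rather than a worksheet.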

Related

How to sort rows in Excel without having repeated data together

I have a table in which many values repeat.
I have to sort the rows randomly, but without identical names ending up next to each other, as shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible, the recursion never finds a valid ordering and you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a given combination of names has a solution.
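The retry-until-valid logic of the recursive LAMBDA can be sketched in Python. This is only an illustration: the function name `random_sort_no_adjacent` and the `max_attempts` cap are assumptions added here (the Excel version instead stops with #NUM! when it runs out of recursion depth).

```python
import random


def random_sort_no_adjacent(rows, max_attempts=10_000, rng=None):
    """Shuffle rows until no two neighbours share the same first column.

    Returns the shuffled list, or None if no valid order was found
    within max_attempts tries (hypothetical cap, not part of the formula).
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        shuffled = list(rows)
        rng.shuffle(shuffled)                    # like SORTBY(..., RANDARRAY(...))
        names = [row[0] for row in shuffled]     # like TAKE(..., , 1)
        # Count-free check that no adjacent pair of names is equal
        if all(a != b for a, b in zip(names, names[1:])):
            return shuffled
    return None


if __name__ == "__main__":
    data = [("Ann", 30), ("Ann", 25), ("Bob", 40), ("Cat", 22), ("Bob", 31)]
    print(random_sort_no_adjacent(data))
```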
This is just an attempt, because the question may need clarification or more sample data to understand the actual scenario. The main idea is to generate a random ordering of the input and then distribute the rows evenly by name. This spreads repeated names apart; it is not the only possible ordering (the problem may have many valid arrangements), but it produces one candidate. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is a volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at @JosWoolley's approach, the generation of the random ordering can be simplified, so that the resulting formula becomes:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
The LET function is used for easier reading and composition. The name idx (first formula) represents a random ordering of the input index positions. The name rRng represents the input rng sorted in random order. This sorting alone doesn't ensure that consecutive names are distinct.
To keep consecutive names from repeating, we number each occurrence of a name (nCnts) with MAP. This is similar to the idea suggested by @cybernetic.nomad in the comment section, but adapted for arrays (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with the MAP result (nCnts) as the by_array argument, so that all first occurrences come first, then all second occurrences, and so on, which distributes the names evenly. Within each group every name is distinct, but the last name of one group can still equal the first name of the next, so check the output if this matters. Every time Excel recalculates you will get an output with the names distributed evenly in a different way.
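The occurrence-numbering step can be mirrored in Python to see what the final SORTBY does. This is a sketch under assumptions: `spread_by_occurrence` is a hypothetical name, and the input is taken to be the already shuffled rows with the name in the first position.

```python
def spread_by_occurrence(rows):
    """Sort rows by the running occurrence number of each name."""
    seen = {}
    occurrence = []
    for row in rows:
        name = row[0]
        seen[name] = seen.get(name, 0) + 1   # 1 for the first "Ann", 2 for the second, ...
        occurrence.append(seen[name])        # plays the role of nCnts
    # sorted() is stable, so rows with equal occurrence numbers keep their order
    order = sorted(range(len(rows)), key=lambda i: occurrence[i])
    return [rows[i] for i in order]


if __name__ == "__main__":
    print(spread_by_occurrence([("Ann", 1), ("Ann", 2), ("Bob", 3), ("Bob", 4)]))
    print(spread_by_occurrence([("Bob", 1), ("Ann", 2), ("Ann", 3)]))
```

The second call shows the boundary case: the first-occurrence group ends with Ann and the second-occurrence group starts with Ann, so the two end up adjacent.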
Not sure if it's worth posting this, but I might as well share the results of my research, such as it is. The problem is similar to that of rearranging the characters in a string so that no two identical characters are adjacent. The method is to insert, at each step, whichever of the remaining characters (names) has the highest remaining frequency and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So, if the initial frequencies are placed in D2:D8 for convenience using COUNTIF:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and pull it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
However, there is no randomness built into this method. If I extend it to more data, you can see that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.
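For comparison, the highest-remaining-frequency method described above can be sketched in Python. The function name `greedy_rearrange` is hypothetical, and ties are broken by first appearance, which is an assumption of this sketch; it returns None when it gets stuck rather than claiming that no arrangement exists.

```python
from collections import Counter


def greedy_rearrange(names):
    """Place, at each step, the most frequent remaining name that
    differs from the previously placed one. Returns None if stuck."""
    remaining = Counter(names)
    result = []
    previous = None
    for _ in range(len(names)):
        # Names still available that are not equal to the previous pick
        candidates = [n for n, c in remaining.items() if c > 0 and n != previous]
        if not candidates:
            return None
        pick = max(candidates, key=lambda n: remaining[n])
        result.append(pick)
        remaining[pick] -= 1      # reduce its frequency once used
        previous = pick
    return result


if __name__ == "__main__":
    print(greedy_rearrange(["Ann", "Ann", "Ann", "Bob", "Bob", "Cat", "Cat"]))
```

As with the worksheet version, the output is deterministic: the same input always gives the same order.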

Excel - Combine data from multiple tables dynamically

I would like to combine three different tables in Excel. I am struggling with the fact that the tables can vary in length.
For example:
What I would like to achieve is all the tables' data in one table without empty rows: first the two entries from the first table, then the three entries from the second table, and lastly the entry from the third table. The number of rows in each table can vary, however.
How can I do this dynamically, so that it still works when the number of entries in the tables changes? I'm using a Mac with Office 365. Thanks!
EDIT:
Output with Ron Rosenfeld's solution: the range of the list runs from cell 5 down to cell 103. Could this be reduced to cells 5 to 15?
If you have Excel 2019 or Office 365, with the FILTERXML and TEXTJOIN functions, you can use:
=FILTERXML("<t><s>" & TEXTJOIN("</s><s>",TRUE,Table1,Table2, Table3) & "</s></t>","//s[.!=0]")
If those zeros are really blanks, you can omit [.!=0] from the XPath argument, but it won't hurt to leave it there.
Edit:
With Mac versions of Office 365 that do not have the FILTERXML function, I believe the following will work:
=LET(
a,299,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
Note the a parameter in the formula above.
Because of how the splitting algorithm works, the extraction window for each cell will not always start at the beginning of a string.
Hence, if the strings contain enough characters in total, the start position may eventually drift far enough to cause a split in the wrong location.
One fix is to insert an arbitrarily large number of spaces.
99 is frequently large enough, but not for this data set.
299 seems to be large enough for the data set shown in your actual data.
I believe the minimum should be the sum of the lengths of all the strings in the original tables (including the 0s) plus one, but I'm not sure of this.
You can certainly adjust it as needed.
If the number becomes too large, you could run into the 32,767-character limit, in which case the formula returns an error.
So, if you wanted to compute a, dynamically, you could try something like:
=LET(
a,SUM(LEN(Table19[Column1]),LEN(Table20[Column1]),LEN(Table21[Column1]))+1,
x,IF(SEQUENCE(99,,0)=0,1,SEQUENCE(99,,0)*a),
y,TEXTJOIN(REPT(" ",a),TRUE,Table19,Table20,Table21),
z, TRIM(MID(y,x,a)),FILTER(z,(z<>"0")*(z<>""))
)
but no guarantees.
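To see why the padding width a matters, the join-then-slice technique can be imitated in Python. This is a rough sketch: `split_by_padding` is a hypothetical helper, the sample strings are made up, and `str.strip()` only approximates TRIM (which also collapses inner spaces).

```python
def split_by_padding(items, a, windows=99):
    """Imitate TEXTJOIN(REPT(" ", a), ...) followed by TRIM(MID(y, x, a))."""
    y = (" " * a).join(items)
    pieces = []
    for j in range(windows):
        # MID is 1-based with x = 1, a, 2a, ...; convert to a 0-based start
        start = 0 if j == 0 else j * a - 1
        piece = y[start:start + a].strip()
        if piece:                     # like FILTER(z, z <> "")
            pieces.append(piece)
    return pieces


if __name__ == "__main__":
    words = ["alpha", "beta", "gamma"]
    print(split_by_padding(words, 15))  # padding = total length + 1
    print(split_by_padding(words, 6))   # padding too small
```

With a padding of 15 (the total length of the three strings plus one) the items come back intact, while a padding of 6 lets the window boundaries drift into "gamma" and cut it in two.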
Assuming the data is in A:C and empty cells are truly blank (not 0).
In E1, enter the following and fill it down:
=IF(ROW()>COUNTA(A:C),"",
INDEX(A:C,
IF(ROW()<=COUNTA(A:A),ROW(),IF(ROW()<=COUNTA(A:B),ROW()-COUNTA(A:A),ROW()-COUNTA(A:B))),
IF(ROW()<=COUNTA(A:A),1,IF(ROW()<=COUNTA(A:B),2,3)))
)
Idea: use ROW() to drive the selection in INDEX. COUNTA() is used to convert ROW() into usable row and column index numbers, and the output cell is left blank ("") once ROW() exceeds COUNTA(A:C).
Please share whether it works or not.

Excel - Return name from list based on multiple criteria

This is my 1st post here (and not allowed to paste images). I have been trying to solve this issue for a couple of days with no luck. I'm working on an Excel spreadsheet for a game and cannot return a name based on multiple criteria. See below:
Table
I am trying to return, for example, the name of the Guardian with the highest amount of games played.
I've tried Index/match/sumproduct combinations but I can't figure this one out. Can you help me?
=index(Data!$A:$H,match((1,Data!B:B=Overview!B12)*(Data!C:C=Overview!B23)),0),1)
=MAX(IF(Data!B:B=Overview!B12,Data!C:C))
I'm thinking if I could join these two formulas together I might be able to make it work.
Try this array formula:
=INDEX(Data!$A$1:$A$99,MATCH(MAX(Data!$C$1:$C$99*(Data!$B$1:$B$99=B12)),
Data!$C$1:$C$99*(Data!$B$1:$B$99=B12),0))
Confirm with Ctrl+Shift+Enter.
Note that we should avoid using full-column references in array formulas, because they force Excel to compute huge arrays and slow the formula down. I limited the ranges to 99 rows here; use a limit large enough to span your data.

Replacing null values with zeroes in multiple columns [Spotfire]

I have about 100 columns with some empty values that I would like to replace with zeroes. I know how to do this with a single column using Calculate and Replace, but I wanted to see if there was a way to do this with multiple columns at once.
Thanks!
You could script it, but it would probably take you as long to write the script as it would to do it manually with a transformation. A better idea would be to fix it in the data source itself before you import it, so that Spotfire doesn't have to perform the transformation every time, which could hinder performance if you are dealing with a large amount of data.

Display, sort and filter numbers with multiple decimal in excel 2007

I'm using excel 2007.
I have a list of tasks (200-500) that I need to group by category, section, etc. (multiple filters). All the data is in an Excel table, so I can apply Excel's built-in table filters to display exactly the data I need.
However, it is always difficult to apply multiple filters to display the expected data, especially as I need to do it very frequently. To make things simple, I'm planning to number each record like
a.b.c.d.e.f
Where a, b, c, d, e, f are simple numbers. List looks like:
1
1.1
1.2
1.2.1
1.2.1.1
1.2.2
1.3
& so on.
The problem is that Excel treats a value with a single decimal point as a number, but as soon as I add a second decimal point, Excel treats it as text, which is its normal behavior.
However, as a special case, I need Excel to treat both as numbers or both as text. Numbers are preferable because I want to sort them, which might be difficult with text.
To make things a little more complex, while filtering in the table I would like to be able to use a pattern: for example, 1.* should display all numbers that start with 1.
Is this possible with Excel's default behavior, without VBA?
If not, is it possible with VBA? If so, any clue is appreciated. I don't need a whole program, as I can write basic VBA; I just need a hint about how it can be done.
I sort mine by adding a helper column that adds a letter to the front and sort on that. E.g. 1 becomes f1, 1.1 becomes f1.1 etc. Then all are sorted as text.
You can use the formula ="f" & A1.
My sample:
Then the data sorted:
And the filter:
If I were to try this without VBA, my first step would be to use the Text to Columns function on the Data tab to split the numbers at the periods.
Next, make sure all empty cells in the split data are filled with zeros.
Then sort the data by the split columns.
As long as you leave your original data in the same rows as the split data (I didn't in the images posted, to focus on the process), your items should now be in order.
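The split, zero-fill, and sort steps above amount to comparing the labels level by level as numbers. A short Python sketch of that idea follows; the helper names `outline_key` and `matches_prefix` and the depth of 6 levels (taken from the a.b.c.d.e.f pattern in the question) are assumptions for illustration.

```python
def outline_key(label, depth=6):
    """Split a label such as "1.2.1" into numbers and pad with zeros,
    mirroring Text to Columns followed by filling blanks with 0."""
    parts = [int(p) for p in label.split(".")]
    return tuple(parts + [0] * (depth - len(parts)))


def matches_prefix(label, prefix):
    """Rough equivalent of a filter like 1.* (also keeps the item "1" itself)."""
    return label == prefix or label.startswith(prefix + ".")


if __name__ == "__main__":
    labels = ["1.10", "1.2", "1", "1.2.1", "2", "1.3"]
    print(sorted(labels, key=outline_key))
    print([x for x in labels if matches_prefix(x, "1.2")])
```

Comparing the padded tuples places 1.10 after 1.3, because each level is compared as a number rather than character by character.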

Resources