creating a function using only loops, conditionals, and variables - python-3.x

Two numbers are twins if they both consist of the same digits. For instance: 134425 and 12345 are twins but 189 and 18 are not. Create a function areTwins(num1,num2) which takes in two numbers and returns True if they are twins, and False otherwise. You are only allowed to use conditionals, functions, and loops.
I was thinking to use two helper functions one to get the number length and the other to check the appearance of numbers (0 to 9) in the two numbers and compare it but I am lost in doing the second helper function. Can someone please help?

I would do two array and put the number in the array. I would then sort both arrays, and remove all duplicates. Then you just need to compare them and it’s done!

Related

How to identify unique combinations of pairs in Excel?

My goal is to get a two-column array of all the possible unique pairs of N items (so N*(N-1)/2 pairs).
I have done this in the past using extra calculation columns, but with the advent of LET() I was wondering if I can get a single function call.
This is my example data:
In the end I came up with this, which is a little unwieldy but does the job:
=LET(x,B2:B5,INDEX(x,LET(n,ROWS(x),s,(1+SEQUENCE((n*n*2),2))/2,r,INT((s-1)/n)+1,LET(a,IF(INT(s)=s,r,INT(s)-((r-1)*n)),FILTER(a,INDEX(a,,1)<INDEX(a,,2))))))

Iterative/Looped Substitute without VBA

Abridged Question:
If I have a concatenated string of "|#|#|#|...|#|", how can I apply a multiplier to each of the numbers and update the concatenated text? For example, for |4|12|8|, multiply by a factor of 2 and update the concatenated text to |8|24|16|.
Background
I have three columns of interest. The first column contains a date, the second an amount or factor, and the third column concatenates data into the format "|#|#|...|#|" (e.g., |2|5|, |2|5|12|, |4|12|, etc.). At times, a multiplying factor needs to be applied to the concatenated data, and the individual numbers would need to be updated accordingly.
An example would be—
Date Amt Concatenated Data
01/01/18 2 |2|
01/05/18 5 |2|5|
02/06/18 12 |2|5|12|
03/25/18 -3 |4|12|
03/31/18 8 |4|12|8|
04/01/18 F2 |8|24|16| (factor of 2 applied)
04/15/18 12 |8|24|16|12|
04/01/18 F1/4 |2|6|4|3| (factor of 1/4 applied)
With a formula, how can I apply the factor to the concatenated data, and update the individual numbers?
I'm bound by the following conditions:
Excel 2007, so no TEXTJOIN function
No VBA or UDFs (due to security policies)
Individual numbers are dynamic (i.e., I can't use a static value for the "old_text" parameter of the SUBSTITUTE formula)
Amount of individual numbers within concatenated data is also dynamic (may contain one number, or may contain dozens of different numbers)
I can pull out the individual numbers using an array formula. I can even then multiply those numbers by the factor to produce an array result. However, I can't rebuild the concatenated data, because CONCATENATE doesn't work on an array. I've also tried SUBSTITUTE, but I can't iterate through the "|" separators. I can only substitute a given segment (e.g., change all entries of "|2|" to "|4|"). Nesting SUBSTITUTE or using individual columns won't work, since it could potentially involve dozens of instances.
Just to add some info on the concatenated data:
Amt>0, then value is concatenated to the end of the previous concatenated value
Amt<0, begin reducing individual numbers in concatenated value (CV) until reduction amount reached (e.g., for |2|5|12| and Amt=-3, reduce CV to |4|12|, which is -2 from the first segment and -1 from the second segment)
Amt reduction is limited to the sum of the previous CV's individual numbers (e.g., for |4|12|, the reduction cannot exceed 16)
Amt=F#, indicates a multiplying factor, and the CV's numbers need to be updated
The CV has no max (could have dozens to hundreds of individual numbers, with numbers going from 1 to 100,000+), other than any max applied by Excel itself on string length
HIGH LEVEL
Four parts to this solution
They satisfy pre-requisites (2007 compatibility, no VB, no Office 365 requirement, no custom VB functions, provide for complete 'dynamic' nature of variable length of cells to concatenate)
Caveat: to best knowledge / research, there is no parsimonious single-cell function & therefore an interim step has been proposed)
One more caveat: I imagine the simple 'hack' of wrapping a graph around the delimited data is out of question (see 'Other/Various' below ☺)
PARTS 1-4
Accompanying parts 1-4 below are functions which relate to the following screenshot:
I have also uploaded / amended to meet requirements of Google Sheets (see here)
Parts 1 & 2:
Similar in that they rely upon FilterXML technique to count component / terms, and split cells respectively:
Part 1:
=COUNT(2*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num")))
Note: google sheets doesn't recognise FilterXML, so have amended technique/functions accordingly. For instance, above can be determined using counta on the split cells in Part 2 (easier / much more simpler than proposed approach above, albeit less robust given any cells lying to the right of the split cells will interfere with ordinary functionality of this approach).
Part 2:
It's either a manual approach, a fancy series of 'mid' &/or substitute / left/right functions, or the following FilterXML code which, per various sources (e.g. here) should be compatible with Excel 2007:
=IF(LEFT(C12,1)="F",1*SUBSTITUTE(C12,"F",""),1)*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num"))
Commonality with Part 1 (re: FilterXML) can be seen - the only difference is that the count(Part 1) has been replaced with the transformation (multiplicative factor, as given in O.P's Q).
Part 3
Nothing fancy here - a simple concatenation (which is a far cry from a 'recursive' substitution function, I know, but hey - it does the trick and can always be placed in a mirror copy of the original sheet to avoid space issues/cell interaction issues)
=IF(H12="","",IF(G24="","|","")&G24&H12&"|")
Part 4
Thanks to the number of terms derived in Part 1, an offset function can easily determine the final cell pertaining to the concatenated 'build up' of 'transformed' values (per Part 3):
=OFFSET(H31,0,E31-1,1,1)
OTHER / VARIOUS
Various other proposals and 'workarounds' exist; unfortunately, these appear to fall short in one way or another of the pre-requisites set forth, videlicit:
a) Function/formula based
b) No VB
c) Excel 2007
d) Dynamic (variable/unknown number of terms)
Manual: e.g. function = concatenate(transpose(desired range)), and then components of the concatenate function and pressing F9 to convert to calculated values, which are readily applicable in the concatenate function. Disadvantage: time consuming in relation to 'automated solution' (needs to be done for each applicable toy). Advantage: no additional 'spreadsheet real estate' required, quicker/straightforward implementation in first instance.
Variants of the 'build-up' method: e.g. per Part 3, however, this alone does not ensure for an automated approach across an unknown number of terms in the original concatenated list.
Have mentioned in a previous solution (here), but may be case that you are eligible for Office 365 functionality whilst on a previous version of Excel (see Office Insider here)
Other proposals (above/below in this forum mind you) propose textjoin (so not sure if this is a comprehension issue or what ☺)
And yes, as alluded to at outset, you can easily achieve the desired outcome using a simple graph! Just for fun then, sort the data in reverse order, and include the split/delimited values as a bar graphs' "x-values" (which, by defn. for this type of graph, will now appear along the ordinary Cartesian 'y/vertical' axis)...
Zero points for this but thought it was an interesting discovery on my part!
(and if still in doubt, here's what the 'graph' would look like if I didn't kill everything except for the axis labels...):
Numerous references for relevant other items above, including research areas, as follows:
Excel Champs
StackOverflow - alternative applications for FilterXML
JUST ONE MORE THING...
In true Columbo style modus operandi, other ideas/approaches considered:
Application of pivot table?
Constructing matrices: I got a solution with a series of offset functions, but couldn't think of a feasible way to implement given space issues
Converting the split cells into a long digit through summation: e.g. 8 22 16 = 80 000 + 22 00 + 16. Using a substitute function with text (long digit, "General") I was able to successfully introduce the delimiter character ('|') for pairs of adjacent 'tuples' (e.g. I could get '8|2216', '822|16', but then a 'build up' formula where one cell depends upon converted values of the previous and so one was required once more, which landed me back to the proposal I have set out above
fyi - the matrix consideration only solves tuples of 2, for n-dimensional /combination one would need to 'pass' a string of characters over its mirror copy - e.g. {6,10,22} would pass over {6,10,22}, ignoring duplicate values would yield a trapezium as follows:
6
10
22
6
10
22
6
10
22
after the copy has 'passed' over the original (first row), we have the desired combination (22,10,6) (on the 'diagonal' such a matrix). This is akin to how Fourier Transforms work (kind of); but that aside, it was tempting to construct a matrix like this, but couldn't be bothered at this stage.
Will probably turn out to be a far simpler way that someone comes up with (I won't be the only person surprised based upon the various sources I've considered...)

Frequency() with arrays: adds an element to return arrays

I'm using the following formula as named formula (via name manager). It is then used in a larger sumproduct(). The goal is to ensure that with an array calculation, the calculation is only made once for certain groups of rows (e.g. you have the same data repeated accross many rows for category A. I only need to know how many people are in category A once).
=IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0),IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0))>0,TRUE)
Let's step through the results one by one with the evaluate formula in Excel. Sorry for the screenshot, but Excel doesn't allow to copy actual steps with real data....
In order of steps:
In the last image, there's now a 7th item in my array. I only have 6 row of data, hence why for the previous steps I only had 6 items in the array, as was expect.
This is messing up my calculations, because the return array from this function gets multiplied by others arrays which all have 6 items (or whatever is the number of data rows I have).
What is this 7th item, and how can I either get ride of it or prevent it from return errors?
I did try to wrap some formula into iferror() or ifna(), however it doesn't feel clean. I feel this might backfire and isn't a strong way to handle this. I rather take it at the source....
EDIT: For example of use with other arrays:
{=SUMPRODUCT(--IFERROR(((tdata[_isVisible]=1)*(f_uniqueUUIDfactor),0))}
Where f_uniqueUUIDfactor is the formula from the initial post. tdata[_isVisible]=1 is used as a way to filter data on the dashboard (e.g. through dropdown, the users can set ranges for dates, and with VBA I hide the rows in the raw data NOT within the range).
The point is that sumproduct() ends up multipliying each raw data row thogheter as 0 & 1 s, so that only those meeting all the criterias get returned. The IFERROR() above is the workaround for the extra array element introduced by frequency(). It works as is, but if a cleaner way exists I'd prefer that. I would also be keen on understanding why that elements get added.
This is a good example of why it is preferable to use multiple, recursive IF statements when evaluating arrays over multiple criteria, rather than form the product of those arrays.
Firstly, though, before coming to the reason for that statement, I should point out a few minor technical inaccuracies/flaws with your construction also.
1) By including a value_if_false clause in your constructions being passed as FREQUENCY's data_array and bins_array parameters, you are risking incorrect results, since zero is a valid numerical to be considered by FREQUENCY, whereas a Boolean FALSE (which would be the equivalent entry in the resulting array had you omitted the value_if_false clause altogether) is disregarded by this function.
2) MATCH with an exact (i.e. 0, or FALSE) match_type parameter is a relatively resource-heavy construction, particularly if the range to be considered is quite large. As such, and since it is not necessary to use this construction for FREQUENCY's bins_array parameter, it is preferable to use the more efficient:
ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1
Moreover, note that repetition of the IF(LEN construction is also not necessary within this second parameter.
In all, then:
IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)
is considerably more rigorous and more efficient than the version you give.
To answer your main question, it is well-documented that FREQUENCY always returns an array having a number of entries one greater than that of the bins_array passed.
As mentioned in my comment to your post, the resolution to the problem you are facing largely depends on precisely what further manipulation you are intending for the resulting array.
However, let's assume for the sake of an explanation that you simply wish to multiply the array resulting from your FREQUENCY construction by some other column within your table, tdata[Column2] say, and then sum the result.
The difference between:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)*tdata[Column2])
i.e. using multiplication of the two arrays, and:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,tdata[Column2]))
i.e. using a straightforward IF clause, is here crucial.
In fact, the former will always return an error, whereas the latter, in general, will not.
The reason is that the former will resolve to (assuming that your table has e.g. 10 rows' worth of data and assuming some random Boolean results to the FREQUENCY construction):
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},TRUE)*tdata[Column2])
which is, since the value_if_true clause is superfluous here:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
whereas the second construction I give will resolve to:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
The two may look identical, but the fact that the former is using multiplication to resolve the array, whereas the latter is not, is the key difference.
Although in both cases the array resulting from the FREQUENCY construction, i.e.:
{TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}
comprises 11 entries (i.e. 1 more than the number of entries in the second array being considered), the difference is that, when you then attempt to multiply an 11-element array with a 10-element array (i.e. tdata[Column2]), Excel, rather than outright disallowing such an operation, artificially redimensions the smaller of the two arrays such that it matches the dimensions of the larger.
In doing so, however, any additional entries are automatically set as #N/A error values.
Effectively, then:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
is resolved as:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*{38;67;49;3;10;11;97;20;3;57;#N/A})
i.e., as mentioned, the second, 10-element array is redimensioned to one of 11 elements in an attempt to form a legitimate operation. And, as also mentioned, that 11th element is #N/A, which means of course that the entire construction will also result in that value.
In the non-multiplication version, however, i.e.:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
although the same redimensiong also takes place, we are saved by our use of an IF clause in place of multiplication, since the above resolves to:
=SUM(IF({TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE},{38;67;49;3;10;11;97;20;3;57;#N/A}))
and the Boolean FALSE in the 11th position here 'overrides' the error value in the equivalent position from the second array, since the above resolves to:
=SUM({38;FALSE;49;FALSE;10;11;97;FALSE;3;FALSE;FALSE})
Regards

How to compare 2 arrays with unequal numbers of elements in Excel

In an Excel array formula, I would like to test each element of one array against each element of a second array, when the 2 arrays do NOT have the same number of elements. Simplified right down, this scenario could be represented by:
=SUMPRODUCT({1,2,3,4,5}={1,2})
NB - in my real world scenario these arrays are calculated from various prior steps.
Using the above example, I would want a result of {TRUE,TRUE,FALSE,FALSE,FALSE}. What I get is {TRUE,TRUE,#N/A,#N/A,#N/A}.
It's clear that, when there's more than 1 value being tested for, Excel wants equal numbers of elements in the 2 arrays; when there isn't, the #N/A error fills in the blanks.
I've considered writing a UDF to achieve what I want, and I'm pretty sure my coding skills are up to creating something like:
=ArrayCompare({1,2,3,4,5},"=",{1,2})
But I'd much rather do this using native functionality if it's not too cumbersome...
So, simple question; can an array formula be constructed to do what I'm after?
Thanks peeps!
Using MATCH function is probably the best way.....but if you actually want to compare every element in one array with another array in a direct comparison then one should be a "column" and one a "row", e.g.
=SUMPRODUCT(({1,2,3,4,5}={1;4})+0)
Note the semi-colon separator in the second array
If you can't actually change the column/row designation then TRANSPOSE can be used, i.e.
=SUMPRODUCT(({1,2,3,4,5}=TRANSPOSE({1,4}))+0)
You may not get the required results if the arrays contain duplicates because then you will get some double-counting, e.g. with this formula
=SUMPRODUCT(({1,1,1,1,1}={1;1})+0)
the result is 10 because there are 5x2 comparisons and they are all TRUE
Maybe:
{=IF(ISERROR(MATCH({1,2,3,4,5},{1,2},0)),FALSE,TRUE)}
If the second array is a subset of the first array, same order, and starting at position 1 then you can use this array formula for equivalence testing:
=IFERROR(IF({1,2,3,4,5}={1,2},TRUE),FALSE)
For non equivalence just swap the FALSE and TRUE
=IFERROR(IF({1,2,3,4,5}={1,2},FALSE),TRUE)
You can then use this in other formulas just as an array:
However if the arrays are not in order, as in this example:
{1,2,3,4,5},{1,4,5}
Then you have to use MATCH. However all you need is to surround the match with an ISNUMBER like so:
Equivalence test:
=ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0))
Non Equivalence test:
=NOT(ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0)))
Remember all array formulas are entered with ctrl + shift + enter

How to compare two dicts based on roundoff values in python?

I need to check if two dicts are equal. If the values rounded off to 6 decimal places are equal, then the program must say that they are equal. For e.g. the following two dicts are equal
{'A': 0.00025037208557341116}
and
{'A': 0.000250372085573415}
Can anyone suggest me how to do this? My dictionaries are big (more than 8000 entries) and I need to access this values multiple times to do other calculations.
Test each key as you produce the second dict iteratively. Looking up a key/value pair from the dict you are comparing with is cheap (linear cost), and round the values as you find them.
You are essentially performing a set difference to test for equality of the keys, which requires at least a full loop over the smallest of the sets. If you already need to loop to generate one of the dicts, you are at an advantage as that'll give you the shortest route to determining inequality soonest.
To test for two floats being the same within a set tolerance, see What is the best way to compare floats for almost-equality in Python?.

Resources