Identify cells with Repeated strings

Sometimes a person types 9999999999999999, 0000000000, 8999999998888888, ... instead of their identification number. I want a column that identifies those cases, meaning the cases in which a single character (here, a digit) is typed more than 7 times.
I have no idea how to do it. My best guess would be to count each digit using LEN, but that would take at least 10 nested IF statements. Any suggestions?

We can find the maximum digit count as follows (as a calculated column):
MaxDigitCount =
VAR n = LEN ( [ID] )
VAR Digits =
ADDCOLUMNS ( GENERATESERIES ( 1, n ), "#D", MID ( [ID], [Value], 1 ) )
VAR Frequencies =
GROUPBY ( Digits, [#D], "#Freq", COUNTX ( CURRENTGROUP (), [Value] ) )
RETURN
MAXX ( Frequencies, [#Freq] )
Suppose [ID] = "899999999777". Then n = 12 and Digits is the table generated by creating a list from 1 to 12 and adding the column of digits corresponding to each of those positions.
Digits =
Value  #D
1      8
2      9
3      9
4      9
5      9
6      9
7      9
8      9
9      9
10     7
11     7
12     7
Then Frequencies summarizes this table by grouping on #D and counting the number of occurrences of each distinct digit.
Frequencies =
#D  #Freq
8   1
9   8
7   3
Finally, return the maximum value in the #Freq column.
Using this, it's easy to check if the value is greater than 7. You can modify the final line to
IF ( MAXX ( Frequencies, [#Freq] ) > 7, ">7", "<=7" )
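The same rule can be prototyped outside DAX. The following Python sketch is only an illustration of the counting idea, not a translation of the calculated column: the function names max_char_count and is_suspicious are hypothetical, and the ID is assumed to arrive as a string.

```python
from collections import Counter


def max_char_count(value: str) -> int:
    """Return how many times the most frequent character occurs in value."""
    if not value:
        return 0
    # Counter plays the role of the Frequencies table: character -> count.
    return max(Counter(value).values())


def is_suspicious(value: str, limit: int = 7) -> bool:
    """Flag IDs in which a single character is repeated more than limit times."""
    return max_char_count(value) > limit


print(max_char_count("899999999777"))  # 8, the digit 9 occurs eight times
print(is_suspicious("899999999777"))   # True
print(is_suspicious("1234567890"))     # False
```

Like the DAX version, this counts occurrences anywhere in the string, not only consecutive repeats.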

Related

Is there a way to sort a list so that rows with the same value in one column are evenly distributed?

Hoping to sort (below left) by sector but distribute evenly (below right):
Name  Sector.    Name.  Sector
A     1          A      1
B     1          E      2
C     1          H      3
D     4          D      4
E     2          B      1
F     2          F      2
G     2          J      3
H     3          I      4
I     4          C      1
J     3          G      2
Real data is 70+ rows with 4 sectors.
I've worked around it manually but would love to figure out how to do it with a formula in Excel.
Here's a more complete (and hopefully more accurate) idea - the carouselOrder is the column I'd like to generate via a formula.
guestID  guestSector  carouselOrder
1        1            1
2        1            5
3        1            9
4        1            13
5        2            2
6        2            6
7        2            10
8        2            14
9        3            3
10       3            7
11       3            11
12       2            18
13       1            17
14       1            20
15       1            23
16       2            21
17       2            24
18       2            27
19       1            26
20       1            29
21       1            30
22       1            31
23       3            15
24       3            19
25       3            22
26       3            25
27       3            28
28       1            32
29       4            4
30       4            8
31       4            12
32       4            16

When using Office 365 you can use the following in D2: =MOD(SEQUENCE(COUNTA(A2:A11),,0),4)+1
This creates a repeating counter of the sectors 1 to 4, running for the total number of rows in your data.
In C2 use the following:
=BYROW(D2#,LAMBDA(x,
INDEX(
FILTER($A$2:$A$11,$B$2:$B$11=x),
SUM(--(D$2:x=x)))))
This filters the names whose sector equals the sector in the current row of D2#, then uses INDEX to pick the entry whose position in the filtered result equals the number of times that sector has appeared in D2# up to the current row.

Let's try the following approach, which doesn't require a helper column. I will first explain the logic for building the recurrence, then the Excel formula that builds it.
If we sort the input data (Name and Sector.) by Sector. in ascending order, the new positions of the Name values (letters) can be calculated as follows (Table 1):
Name  Sector.Sorted  Position
A     1              1+4*0=1
B     1              1+4*1=5
C     1              1+4*2=9
E     2              2+4*0=2
F     2              2+4*1=6
G     2              2+4*2=10
H     3              3+4*0=3
J     3              3+4*1=7
D     4              4+4*0=4
I     4              4+4*1=8
The new positions of Name (letters) follow this pattern (Formula 1):
position = Sector.Sorted + groupSize * factor
where groupSize is 4 in our case and factor counts how many times the same Sector.Sorted value has already appeared, starting from 0. Think of Sector.Sorted as groups, where each run of repeated values represents a group: 1, 2, 3 and 4.
If we can build the Position values, we can sort Name by the new positions via the SORTBY(array, by_array1) function. See the SORTBY documentation for more information on how this function works.
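Formula 1 can be checked independently of Excel. The Python sketch below is an illustration only, under the assumption that sectors are the integers 1 to groupSize; the function name round_robin_order is hypothetical and does not appear in the original answer.

```python
from itertools import groupby


def round_robin_order(names, sectors, group_size):
    """Order names so that sectors alternate, using
    position = sector + group_size * factor (Formula 1)."""
    # Stable sort by sector, mirroring SORT(A2:B11, 2).
    rows = sorted(zip(sectors, names), key=lambda row: row[0])
    positioned = []
    for sector, group in groupby(rows, key=lambda row: row[0]):
        # factor counts earlier rows of the same sector, starting from 0.
        for factor, (_, name) in enumerate(group):
            positioned.append((sector + group_size * factor, name))
    # Sorting by the computed position mirrors SORTBY(sName, pos).
    return [name for _, name in sorted(positioned, key=lambda p: p[0])]


names = list("ABCDEFGHIJ")
sectors = [1, 1, 1, 4, 2, 2, 2, 3, 4, 3]
print(round_robin_order(names, sectors, 4))
# ['A', 'E', 'H', 'D', 'B', 'F', 'J', 'I', 'C', 'G']
```

The printed order matches the desired right-hand list in the question.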
Here is the formula to get the Name sorted in cell E2:
=LET(groupSize, 4, sorted, SORT(A2:B11,2), sName,
INDEX(sorted,,1),sSector, INDEX(sorted,,2),
seq0, SEQUENCE(ROWS(sSector),,0), mapResult,
MAP(sSector, seq0, LAMBDA(a,b, IF(b=0, "SAME",
IF(a=INDEX(sSector,b), "SAME", "NEW")))), factor,
SCAN(-1,mapResult, LAMBDA(aa,c,IF(c="SAME", aa+1,0))),
pos,MAP(sSector, factor, LAMBDA(m,n, m + groupSize*n)),
SORTBY(sName,pos)
)
Here is the output, spilled down from E2: A, E, H, D, B, F, J, I, C, G.
Explanation
The name sorted represents the input data sorted by Sector. in ascending order, i.e.: SORT(A2:B11,2). The names sName and sSector represent each column of sorted.
To identify each group we need the following sequence (seq0) starting from 0, i.e. SEQUENCE(ROWS(sSector),,0).
Now we need to identify when a new group starts. We use MAP function for that and the result is represented by the name mapResult:
MAP(sSector, seq0, LAMBDA(a,b, IF(b=0, "SAME",
IF(a=INDEX(sSector,b), "SAME", "NEW"))))
The logic is the following: if we are at the beginning of the sequence (first value of seq0), it returns SAME. Otherwise we compare the current value of sSector (a) against the previous one, given by INDEX(sSector,b). If they are equal we are still in the same group; otherwise a new group has started.
The intermediate result of mapResult is:
Name  Sector Sorted  mapResult
A     1              SAME
B     1              SAME
C     1              SAME
E     2              NEW
F     2              SAME
G     2              SAME
H     3              NEW
J     3              SAME
D     4              NEW
I     4              SAME
The first two columns are shown just for illustrative purpose, but mapResult only returns the last column.
Now we just need to create the counter based on every time we find NEW. In order to do that we use SCAN function and the result is stored under the name factor. This value represents the factor we use to multiply by 4 within each group (see Table 1):
SCAN(-1,mapResult, LAMBDA(aa,c,IF(c="SAME", aa+1,0)))
The accumulator starts at -1 because the counter must begin at 0. Every time SCAN finds SAME, it adds 1 to the previous value. When it finds NEW (anything not equal to SAME), the accumulator is reset to 0.
Here is the intermediate result of factor:
Name  Sector Sorted  mapResult  factor
A     1              SAME       0
B     1              SAME       1
C     1              SAME       2
E     2              NEW        0
F     2              SAME       1
G     2              SAME       2
H     3              NEW        0
J     3              SAME       1
D     4              NEW        0
I     4              SAME       1
The first three columns are shown for illustrative purpose.
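The SCAN step behaves like a running fold that resets on NEW. As a rough analogy only, the same counter can be sketched in Python with itertools.accumulate (Python 3.8+ for the initial argument); the function name running_factor is hypothetical.

```python
from itertools import accumulate


def running_factor(flags):
    """Count up from 0 on each "SAME" and reset to 0 on anything else,
    like SCAN(-1, mapResult, LAMBDA(aa, c, IF(c="SAME", aa+1, 0)))."""
    steps = accumulate(
        flags,
        lambda acc, flag: acc + 1 if flag == "SAME" else 0,
        initial=-1,  # seed value, so the first "SAME" produces 0
    )
    # accumulate yields the seed first; SCAN does not, so drop it.
    return list(steps)[1:]


flags = ["SAME", "SAME", "SAME", "NEW", "SAME", "SAME",
         "NEW", "SAME", "NEW", "SAME"]
print(running_factor(flags))  # [0, 1, 2, 0, 1, 2, 0, 1, 0, 1]
```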
Now we have all the elements to build our pattern for the new positions represented with the name pos:
MAP(sSector, factor, LAMBDA(m,n, m + groupSize*n))
where m represents each element of Sector.Sorted and n the corresponding element of factor calculated in the previous step. As you can see, the Excel formula is a direct implementation of the generic formula (Formula 1 above). The intermediate result will be:
Name  Sector Sorted  mapResult  factor  pos
A     1              SAME       0       1
B     1              SAME       1       5
C     1              SAME       2       9
E     2              NEW        0       2
F     2              SAME       1       6
G     2              SAME       2       10
H     3              NEW        0       3
J     3              SAME       1       7
D     4              NEW        0       4
I     4              SAME       1       8
The previous columns are shown just for illustrative purpose. Now we have the new positions, so we are ready to sort based on the new positions for Name via:
SORTBY(sName,pos)
Update
The first MAP can be removed by giving SCAN an input array that carries both the sSector value and the index position used to find the previous element. SCAN only accepts a single array as its input argument, so we combine both pieces of information into one new array. This formula can be used instead:
=LET(groupSize, 4, sorted, SORT(A2:B11,2), sName,
INDEX(sorted,,1),sSector, INDEX(sorted,,2),
factor, SCAN(-1,sSector&"-"&SEQUENCE(ROWS(sSector),,0),
LAMBDA(aa,b, LET(s, TEXTSPLIT(b,"-"),item, INDEX(s,,1),
idx, INDEX(s,,2), IF(aa=-1, 0, IF(1*item=INDEX(sSector, idx), aa+1,0))))),
pos,MAP(sSector, factor, LAMBDA(m,n, m + groupSize*n)),
SORTBY(sName,pos)
)
Inside SCAN we use a LET function to compute everything the LAMBDA needs for the comparison. We extract item and the idx position used to find the previous element of sSector. With the comparison
1*item=INDEX(sSector, idx)
we are able to compare each element of sSector with the previous one, starting from the second element of sSector. We multiply item by 1 because TEXTSPLIT returns text; without the conversion to a number the comparison would fail.

Pandas remove group if difference between first and last row in group exceeds value

I have a dataframe df:
import pandas as pd
df = pd.DataFrame({})
df['X'] = [3,8,11,6,7,8]
df['name'] = [1,1,1,2,2,2]
X name
0 3 1
1 8 1
2 11 1
3 6 2
4 7 2
5 8 2
For each group within 'name', I want to remove that group if the absolute difference between the first and last row of the group is smaller than a specified value d_dif.
For example, when d_dif = 5, I want to get:
X name
0 3 1
1 8 1
2 11 1

If X is increasing within each group, you can use groupby().transform() with np.ptp (peak to peak, i.e. max minus min):
import numpy as np
threshold = 5
ranges = df.groupby('name')['X'].transform(np.ptp)
df[ranges > threshold]
If you only care about first and last, then transform just first and last:
threshold = 5
groups = df.groupby('name')['X']
ranges = groups.transform('last') - groups.transform('first')
df[ranges.abs() > threshold]

Amazon Athena (Presto) SELECT statement to create (n^2 + n)/2 (𝑛th triangular number)

I'm using Athena and trying to find a way to create a select statement that will return a sequence in the below format:
Number
1
2
2
3
3
3
4
4
4
4
And so on, up to 200.
Is it even possible?

Combine sequence() with UNNEST:
SELECT n FROM UNNEST(sequence(1, 5)) t(n)
CROSS JOIN UNNEST(sequence(1, n)) x(y);
presto:default> SELECT n
-> FROM UNNEST(sequence(1, 5)) t(n)
-> CROSS JOIN UNNEST(sequence(1, n)) x(y);
n
---
1
2
2
3
3
3
4
4
4
4
5
5
5
5
5
(15 rows)
(tested in Presto 326 but will work in Athena too)
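To sanity-check the expected shape of the result outside the database, the same nested expansion can be sketched in Python. This is only an illustration of what the CROSS JOIN UNNEST produces; the function name triangular_sequence is hypothetical.

```python
def triangular_sequence(limit: int) -> list:
    """Return 1, 2, 2, 3, 3, 3, ... where each n in 1..limit appears n times."""
    # The outer loop mirrors sequence(1, limit); the inner loop mirrors
    # the per-row sequence(1, n) that repeats n exactly n times.
    return [n for n in range(1, limit + 1) for _ in range(n)]


print(triangular_sequence(4))         # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(len(triangular_sequence(200)))  # 20100, i.e. (200^2 + 200) / 2
```

For a limit of 200, as in the question, the query should therefore return 20100 rows.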

Run:
SELECT numbers FROM (
SELECT * FROM (
VALUES flatten(
transform(
sequence(1, 4),
x -> repeat(x, cast(x AS INT))
)
)
) AS x (a) CROSS JOIN UNNEST(a) AS t (numbers)
);
it will return:
numbers
---------
1
2
2
3
3
3
4
4
4
4
(10 rows)

Align two strings perfectly (python)

Is it possible to align the spaces and characters of two strings perfectly?
I have two functions, resulting in two strings.
One just adds a " " between a list of digits:
digits = 34567
new_digits = 3 4 5 6 7
The second function takes the string and prints out the index of the string, such that:
digits = 34567
index_of_digits = 1 2 3 4 5
Now the issue that I am having is when the length of the string is greater than 10, the alignment is off:
I am supposed to get something like this:
Please advise.

If your digits are in a list, you can use format to space them uniformly:
L = [3,4,2,5,6,3,6,2,5,1,4,1]
print(''.join([format(n,'3') for n in range(1,len(L)+1)]))
print(''.join([format(n,'3') for n in L]))
Or with f-string formatting (Python 3.6+):
L = [3,4,2,5,6,3,6,2,5,1,4,1]
print(''.join([f'{n+1:3}' for n in range(len(L))]))
print(''.join([f'{n:3}' for n in L]))
Output:
  1  2  3  4  5  6  7  8  9 10 11 12
  3  4  2  5  6  3  6  2  5  1  4  1
Ref: join, format, range, list comprehensions
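The examples above hard-code a width of 3, which stops lining up once an index or value needs three or more characters. One possible variation, sketched here as an assumption and not part of the original answer, derives the width from the longest label; the function name aligned_rows is hypothetical.

```python
def aligned_rows(values):
    """Return (index_row, value_row) with every cell right-aligned
    to a common width, so the two rows line up column by column."""
    labels = [str(i) for i in range(1, len(values) + 1)]
    cells = [str(v) for v in values]
    # One extra character of padding separates neighbouring columns.
    width = max(map(len, labels + cells), default=0) + 1
    index_row = "".join(s.rjust(width) for s in labels)
    value_row = "".join(s.rjust(width) for s in cells)
    return index_row, value_row


index_row, value_row = aligned_rows([3, 4, 2, 5, 6, 3, 6, 2, 5, 1, 4, 1])
print(index_row)
print(value_row)
```

For the 12-element list used above this yields the same width of 3 as the fixed-width versions.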

spoj - CPCRC1C, sum of digits of numbers 1 to n, need clarification, not solution

Once, a boy's teacher asked him to calculate the sum of the numbers 1 through n.
The boy quickly answered, and his teacher set him another challenge: calculate the sum of the digits of the numbers 1 through n.
Input
Two space-separated integers 0 <= a <= b <= 10^9.
Output
The sum of the digits of numbers a through b.
Example
Input:
1 10
Output: 46
Can someone explain what is meant by the sum of the digits of the numbers a to b?
From the example above, the sum of {1 2 3 4 5 6 7 8 9 10} is 55, by the well-known Gaussian formula,
but the output is 46!
If I count from 2 to 9, excluding the border numbers 1 and 10, the answer is 44, still not 46.
So what is meant by the sum of digits of numbers?

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + (1 + 0)
Don't treat the 10 as the number 10, rather the digits 1 and 0
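To make the definition concrete, here is a brute-force Python sketch of what "sum of the digits of numbers a through b" means. It only illustrates the definition; the function name digit_sum_range is hypothetical, and looping over every number is far too slow for the problem's upper bound of 10^9.

```python
def digit_sum_range(a: int, b: int) -> int:
    """Sum every decimal digit of every integer from a to b inclusive."""
    total = 0
    for number in range(a, b + 1):
        # str(10) is "10", which contributes the digits 1 and 0.
        total += sum(int(digit) for digit in str(number))
    return total


print(digit_sum_range(1, 10))  # 46 = (1 + 2 + ... + 9) + (1 + 0)
```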
