Google DialogFlow --> Number-Sequence entity only matches SPECIFIC length of characters - dialogflow-es

I have an entity of type @sys.number-sequence, which matches a code 4 digits long, for example: (Spoken) "1 2 3 4" or "1234".
In the Intent, if the user speaks a code that is 1, 2, 3, 5 or 6 digits long rather than exactly 4, this pattern does NOT match. So in the "User Says:" section, I had to add all of the below:
1
12
123
1234
12345
123456
(And map each one to my number-sequence entity)
... to handle all permutations of 1 through 6 long number sequences.
It's obviously a very hacky and ugly solution.
Is there a way I can define a number sequence type that will match any length of number sequence so that I can use it in phrase structures in a more flexible way?
In other words: I want to just define the placeholder match ONCE, and not have to redefine it for every length variation...
I don't see anything pertaining to "number sequence length" in the documentation of Dialog Flow types:
https://dialogflow.com/docs/reference/system-entities

Built-in entities like @sys.number-sequence only help you find sequences of digits. In api.ai, you cannot specify the length of the number you want, whether 4 digits or 6 digits. In addition, training the agent on every possible length of number is not a general solution.
What you can do is write a webhook, fetch the @sys.number-sequence parameter in your code, and validate it there. If it does not have the number of digits you want, you can send a reply such as "Please enter a 6-digit number"; otherwise, confirm that a valid code was entered.
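The length check itself can be very small. Below is a minimal Python sketch of that validation step only. The function name `check_code`, the default length, and the reply texts are hypothetical, and how your webhook receives the parameter value from the agent is not shown here.

```python
def check_code(number_sequence, expected_length=4):
    """Return (is_valid, reply_text) for a value captured by the number-sequence entity.

    Hypothetical helper: the name and the reply wording are illustrative only.
    """
    # Keep only the digits, so a spoken "1 2 3 4" and a typed "1234" are treated alike.
    digits = "".join(ch for ch in str(number_sequence) if ch.isdigit())
    if len(digits) != expected_length:
        return False, f"Please enter a {expected_length}-digit code."
    return True, "Thanks, your code has been received."
```

With this in place the intent only needs one training phrase per sentence structure, and the webhook decides whether the captured sequence is acceptable.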

Related

RFM Segmentation Excel Take Mid Value

I am trying to segment customers with RFM segmentation, scoring each of the columns R, F and M from 1 to 5. After I combine the three columns, there are many possible codes, such as 151, 555, or 254 and so on.
Code 555 is the best customer
and X5X is the loyal customer. "X" stands for any digit, e.g. code 454 is also in the Loyal Customer segment.
The problem is that I cannot get the IF function in Excel to work correctly. Here is my attempt for 555:
=IF(O14="555","Best Customer",IF(MID(O14,2,1)="5","Loyal Customer"))
The conditions overlap and the formula takes the last IF, so the result for 555 is Loyal Customer when it should be Best Customer. There are many more segments, such as XX5 for the big spenders, but since the formula overlaps I cannot continue with the rest. Thank you for your help.
If you only want 1 result per number, then you just need to put the highest priority first. The first IF that is TRUE will be the one that is used. From your example, it looks like "455" would be both a Loyal Customer and a Big Spender. We can't tell from your explanation what the result should be in this case. But whichever is the higher priority should just come earlier in your nested IF statements.
Your formula looks correct that 555 should return "Best Customer". If it is returning Loyal Customer, then it seems like you've got 555 stored as a number rather than text in O14. If it is a number, your formula should instead be:
=IF(O14=555,"Best Customer",IF(MID(O14,2,1)="5","Loyal Customer"))
The only difference is removing the quotes around 555. If O14 is stored as a number, then O14="555" is comparing a number to a string (three "5" characters rather than the number 555), which will always return FALSE, hence it moves on the next IF statement. To get a TRUE result at the start, you need to compare O14 to the number 555 instead.
You may then be confused about why the 2nd part of the formula works. This is because the MID function will accept a number as input and then force a type conversion.
When you use the = operator, Excel can only compare like values, meaning it can compare strings to strings or numbers to numbers, but not strings to numbers.
However, the MID function will accept either strings or numbers. When it is given a number, it will first convert it to a string and then output a string.
If it is given MID(555,2,1), it first changes 555 to "555" and gives the same result as MID("555",2,1), which is the character "5" rather than the number 5.
So, even if O14 has the number 555, MID(O14,2,1) will return the character "5" and the comparison MID(O14,2,1)="5" will return TRUE.
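To make the "highest priority first" idea concrete outside of Excel, here is a small Python sketch of the same first-match-wins logic. It assumes that Loyal Customer outranks Big Spender, which the question does not actually specify, and the "Other" label is a hypothetical fallback. Converting the code to text up front avoids the number-versus-string mismatch described above.

```python
def rfm_segment(code):
    """Classify a 3-digit RFM code; the first matching rule wins."""
    text = str(code)  # normalise numbers such as 555 to the text "555"
    if len(text) != 3 or not text.isdigit():
        raise ValueError(f"expected a 3-digit RFM code, got {code!r}")
    if text == "555":      # most specific rule first
        return "Best Customer"
    if text[1] == "5":     # X5X; assumed to outrank XX5
        return "Loyal Customer"
    if text[2] == "5":     # XX5
        return "Big Spender"
    return "Other"         # hypothetical fallback label
```

In the nested IF this corresponds to testing for 555 first, then X5X, then XX5, each inside the "else" branch of the previous test.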

Find common subsets between "big" sets

So, I have a file that contains 13000+ rows. Each row has a list of destinations separated by the character ";". Across all those lists I need to find the 10 most common subsets of destinations (ignoring the empty set and sets containing only 1 destination), together with the number of times each subset appears in the data:
An example may make this easier to understand:
This would be the file (each letter represents a destination)
A;B;C;D
A;B
A;B;C;D;E
A;B;C;D;E;F;G
A;B;C;D;E;F;G;H;L
C;G;B
K;H
So, the most common subsets of destinations together would be:
1. A;B : 5
2. A;C : 4
3. A;D : 4
4. A;B;C : 4
5. A;B;C;D : 4
6. A;E : 3
7. A;B;C;D;E : 3
8. B;C;D;E : 3
9. C;D;E : 3
10. A;B;C;D;E;F : 2
This problem seems very complex to me. I think it would be easier to solve by limiting the size of the subsets to n (or a fixed number like 3).
Any ideas on how to solve it? I think I need something like FP-Growth, but without generating the association rules.
Thanks!
You can solve this with one loop.
Build a hashmap to store the results.
Give every destination a unique prime number and multiply the primes of the destinations on one line; the product is the hashmap key. If the key does not exist yet, add it with a value of 1; if it exists, increment the value. Because prime factorization is unique, each product identifies exactly one set of destinations (recovering the destinations from a key is integer factorization). At the end, find the keys with the highest values in the hashmap.
(Hint: also store the destination names in the hashmap value, so you do not have to factor the key back into destinations.)
(Second hint: keep track of the highest count and its key as you go, so you do not have to search for them at the end.)
EDIT: to also count combinations within a line, such as A;B and B;C from A;B;C, you can use two nested for loops to go through the line.
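As a rough illustration of the counting idea, here is a Python sketch that uses the hashmap approach but swaps the prime-product key for a sorted tuple of destination names, and enumerates sub-combinations with `itertools.combinations` instead of hand-written loops. The function name and the `max_size` cap are assumptions. The cap follows the question's own suggestion of limiting subset size, since the number of subsets per row grows exponentially without it.

```python
from collections import Counter
from itertools import combinations


def most_common_subsets(rows, max_size=3, top=10):
    """Count every subset of 2..max_size destinations that occurs within a row."""
    counts = Counter()
    for row in rows:
        # Deduplicate and sort so the same subset always yields the same key.
        items = sorted({name for name in row.strip().split(";") if name})
        for size in range(2, min(max_size, len(items)) + 1):
            counts.update(combinations(items, size))
    return counts.most_common(top)


if __name__ == "__main__":
    sample = ["A;B;C;D", "A;B", "A;B;C;D;E", "A;B;C;D;E;F;G",
              "A;B;C;D;E;F;G;H;L", "C;G;B", "K;H"]
    for subset, count in most_common_subsets(sample):
        print(";".join(subset), ":", count)
```

For 13000 rows this is a single pass over the file; the cost per row depends on how many destinations it has and on `max_size`.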

Generate prefix to make 4 digits

I am creating a form that will generate sequential number for report types. Each number is sequential so first report is number 1 and second report is number 2 and so on.
The thing is, the report number needs to be in 4 digits, if the report number is not enough to make 4 digits, fill it in with 0's.
For example:
Report 1 number is 0001, report 2 number is 0002, report 10 number is 0010, report 100 number is 0100
I was thinking about adding 4 0's to the report number and do a substring formula, but the problem is I do not know the starting number.
Appreciate the help
I suggest generating the report number once the form is submitted by the user. That way you can get the SharePoint list ID, and then apply a simple set of rules like this:
Condition:
If Id does not match pattern Custom Pattern: \d{4}
Run these actions:
Set reportNumber's value: concat(substring("0000", 1, 4 - string-length(Id)), Id)
Check the box to prevent the next rule from running when the condition is met.
Add a new rule under this one that just outputs the Id; it runs only when the Id already has 4 digits.
So this way, IDs with fewer than 4 digits are padded with leading zeros, and IDs that already have 4 digits are displayed as they are.
I hope this helps.
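For comparison, the same padding rule expressed in Python, as a sketch of the logic rather than of the form rules themselves. The function name `report_number` is hypothetical; the built-in `str.zfill` does the same job in one call.

```python
def report_number(report_id, width=4):
    """Left-pad an ID with zeros up to `width` characters; longer IDs pass through unchanged."""
    text = str(report_id)
    # Same idea as taking the first (width - length) characters of "0000" and concatenating.
    padding = "0" * max(0, width - len(text))
    return padding + text


# Equivalent one-liner using the standard library:
# str(report_id).zfill(4)
```

Note that the starting number does not matter here: the amount of padding is derived from the length of the ID itself.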

Deterministic automata to find number of subsequence in string of another string

How can I construct a DFA that counts the number of occurrences of one string as a subsequence of another string?
E.g. in "ssstttrrriiinnngggg" we have 3 subsequences that form the string "string".
Both the string to be found and the string to be searched contain only characters from a specific character set.
I have some idea about storing characters in a stack and popping them as they match, pushing them back if they don't.
Is there a DFA solution?
OVERLAPPING MATCHES
If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.
1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7
and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.
DP[a][b] = number of ways of being in state b after seeing the first a characters
= DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=1.
Then the total number of overlapping strings is DP[len(string)][7]
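The recurrence above can be sketched in Python as follows. This version keeps only one row of the table and uses 0-based states (state 0 is the start state, state len(pattern) is the accepting one), which is an implementation choice and not part of the answer above. The function name is hypothetical.

```python
def count_subsequences(text, pattern):
    """Count the ways `pattern` occurs as a (possibly overlapping) subsequence of `text`."""
    # ways[j] = number of ways to have matched the first j pattern characters so far.
    ways = [0] * (len(pattern) + 1)
    ways[0] = 1  # exactly one way to have matched nothing
    for ch in text:
        # Go from the last state down so each text character advances a match at most once.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(pattern)]
```

For the question's example, the overlapping count is the product of the run lengths: 3 * 3 * 3 * 3 * 3 * 4 = 972 ways of picking one letter from each run.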
NON-OVERLAPPING MATCHES
If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:
DP[a][b] = number of strings being in state b after seeing the first a characters
= DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
= DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.
Then the total number of non-overlapping strings is DP[len(string)][7]
This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').
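A Python sketch of the non-overlapping variant, under the same assumption that all pattern characters are distinct. Instead of seeding the start state with infinity, it simply lets the first pattern character always start a new partial match. The function name and the explicit distinctness check are additions for illustration.

```python
def count_disjoint_subsequences(text, pattern):
    """Count non-overlapping subsequence matches; assumes distinct pattern characters."""
    if len(set(pattern)) != len(pattern):
        raise ValueError("this sketch assumes all pattern characters are distinct")
    index_of = {ch: i for i, ch in enumerate(pattern)}
    # partial[k] = number of partial matches that have consumed exactly k pattern characters.
    partial = [0] * (len(pattern) + 1)
    for ch in text:
        i = index_of.get(ch)
        if i is None:
            continue              # character is not part of the pattern
        if i == 0:
            partial[1] += 1       # the start state is never exhausted
        elif partial[i] > 0:
            partial[i] -= 1       # move one partial match forward by one state
            partial[i + 1] += 1
    return partial[len(pattern)]
```

Each text character is used by at most one match, which is why the example from the question yields 3 and not 972.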

Map unique number to a unique string of 6 characters

I have a database table where every row has its unique ID (RowID).
Is there a good way to convert this RowID to a unique key that is always 6 characters in length. Unique key characters can be {A-Za-z0-9}. One example of unique key would be: a5Fg3A.
Of course I do realize there's only certain number of keys I can generate using this method but that doesn't matter for my case.
I've thought much about this but I can't come up with an algorithm that would be able to do this properly.
One idea I had was:
Unique key = RowID
If RowID is lower than 100000 then append 0 in front of it, for example:
123 becomes 000123
1 becomes 000001
Then for numbers in the range of 100000 to 900000 I would replace first number to a string, e.g. 0 = a, 1 = b, 2 = c, ..., 9 = j.
Then I could do the same with capital letter, etc.
My problem is that my algorithm is very limited and generates low number of keys because it wouldn't utilize all possible characters.
So basically I should be able to generate 56800235584 unique keys assuming every key is of length 6 and utilizes these characters: {A-Za-z0-9}.
A-Z = 26 characters
a-z = 26 characters
0-9 = 10 characters
So it's 62^6 unique keys.
Any feedback would be appreciated on how this could be done properly (or even optimal) :-)
Thanks!
You can sort your IDs, and then attach an increasing lexicographical string to each.
Simple example where your alphabet is only {a,b} (for simplicity only), and Ids= [20,1,7,90]:
sort: Ids = [1,7,20,90]
Attach increasing strings:
1 = aaaaaa
7 = aaaaab
20 = aaaaba
90 = aaaabb
If you want it to be a hash function of some sort, and not data dependent, you can just use the same kind of positional encoding that is used for the number itself, and convert it similarly (i.e. 1 = aaaaaa, 2 = aaaaab, 3 = aaaaac...)
[Edit: basically the same as base-62 suggested by @HighPerformanceMark in comments]
The advantage of the first approach: it lets you handle up to 62^6 IDs regardless of how large the ID values are, while the second approach does not.
The second approach, however, gives you a consistent conversion from number to string, regardless of the specific data.
If you want to make A-Z a-z 0-9 to be the alphabet as you noticed you have base 62 number system. So encode the unique rowid in base 62, there is a standard algorithm to do so. If your application allows (needs) it you can add a few more printable characters like '+', '/', '!', '#'.. so you get more uniques. The ready made answer is base64 encoding, widely used.
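A minimal Python sketch of the base-62 idea, padded to a fixed length of 6. The digit order of the alphabet (0-9, then A-Z, then a-z) and the function names are assumptions; any fixed ordering of the 62 characters works as long as encoding and decoding agree. RowIDs must be below 62^6 = 56800235584 to fit.

```python
import string

# Assumed digit order: 0-9, A-Z, a-z (62 symbols in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def encode_row_id(row_id, length=6):
    """Encode a non-negative integer as a fixed-length base-62 key."""
    if not 0 <= row_id < BASE ** length:
        raise ValueError(f"row_id must be in [0, {BASE ** length})")
    chars = []
    for _ in range(length):
        row_id, remainder = divmod(row_id, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))  # most significant digit first


def decode_key(key):
    """Inverse of encode_row_id."""
    value = 0
    for ch in key:
        value = value * BASE + ALPHABET.index(ch)
    return value
```

Because the mapping is a bijection on that range, distinct RowIDs always produce distinct keys, and the key can be turned back into the RowID. Consecutive IDs give visibly consecutive keys, so this is an encoding, not an obfuscation.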
There are many ways to do this - the challenge is picking the one that's "best" for whatever your criteria are. Some examples, but far from exhaustive (some already suggested elsewhere):
- pad with an increasing sequence
- base-62 representation (note: base-64 is in common use and might even already have code available for it in whatever libraries you have at hand)
- truncated cryptographic hash (slow, but has some other properties that might be useful, depending on exactly why you need to do this; if you only have to do it once, the performance hit may be worth it)
- other not-necessarily-cryptographic hash functions that might be considerably faster
......

Resources