Multiple-digit numbers getting split by space in NuGram? - nlp
I'm seeing some unexpected behavior in the NuGram IDE Eclipse plug-in for ABNF grammar development.
Say I have a rule that reads:
$fifties =
50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59
;
The sentence generator produces the matches 5 0, 5 1, 5 2, ... I would expect 50, 51, 52, and so forth, but NuGram's coverage tool reports those as out-of-grammar (OOG).
It turns out that it splits any multiple-digit number with spaces, unless there is a leading non-digit character:
1234 -> 1 2 3 4
1234asdf -> 1 2 3 4 asdf
asdf1234 -> asdf1234
1234asdf5678 -> 1 2 3 4 asdf5678
As far as I know, a normal ABNF grammar wouldn't do this. Or am I forgetting something?
This is because NuGram IDE considers digits as individual DTMF tones. I agree that this behaviour should only apply to DTMF grammars and not voice grammars.
You can surround sequences of digits with double quotes, like:
$fifties =
"50" | "51" | "52" | "53" | "54" | "55" | "56" | "57" | "58" | "59"
;
Hope that helps!
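The splitting pattern reported in the question can be mimicked with a few lines of Python. This is only a sketch of the observed behaviour (leading digits become separate tokens, the rest of the word is left alone); the function name `split_leading_digits` is hypothetical and this is not NuGram's actual implementation.

```python
import re


def split_leading_digits(token: str) -> str:
    """Mimic the reported behaviour: each leading digit becomes its own token."""
    match = re.match(r"\d+", token)
    if match is None:
        # No leading digits: the token is left untouched.
        return token
    digits = match.group(0)
    rest = token[len(digits):]
    parts = list(digits)
    if rest:
        # Anything after the leading digits stays together, digits included.
        parts.append(rest)
    return " ".join(parts)


for sample in ["1234", "1234asdf", "asdf1234", "1234asdf5678"]:
    print(sample, "->", split_leading_digits(sample))
```

Under this model, quoting a number such as "50" corresponds to treating it as one token instead of passing it through the splitter.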
Related
Separating lines with multiple values in one cell to individual lines in excel [duplicate]
I have a data set (CSV file) of names that lists each name with the number of people who have that name and its "rank". I am looking for a way to separate all the names into single lines, ideally in Excel, but maybe something in pandas is an option. The problem is that many of the lines contain multiple comma-separated names. The data looks like this:
rank | number of occurrences | name
1 | 10000 | marie
2 | 9999 | sophie
3 | 9998 | ellen
...
50 | 122 | jude, allan, jaspar
I would like to have each name on an individual line alongside its corresponding number of occurrences. It's fine that the rank is duplicated. Something like this:
rank | number of occurrences | name
1 | 10000 | marie
2 | 9999 | sophie
3 | 9998 | ellen
...
50 | 122 | jude
50 | 122 | allan
50 | 122 | jaspar
Use df.explode():
df.assign(name=(df.name.str.split(','))).explode('name')
How it works:
df.name = ...              # equivalent of df.assign(name=...)
df.name.str.split(',')     # puts the names in a list
df.explode('name')         # expands the multiple names into one per row
Result:
   rank  number of occurrences    name
0     1                  10000   marie
1     2                   9999  sophie
2     3                   9998   ellen
3    50                    122    jude
3    50                    122   allan
3    50                    122  jaspar
In [60]: df
Out[60]:
   rank   no                 name
0    50  122  jude, allan, jaspar
In [61]: df.assign(name=df['name'].str.split(',')).explode('name')
Out[61]:
   rank   no     name
0    50  122     jude
0    50  122    allan
0    50  122   jaspar
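The split-then-expand idea behind the answers above can be sketched in plain Python without pandas. The function name `explode_names` and the sample rows are illustrative assumptions. One detail worth noting: splitting "jude, allan, jaspar" on a bare comma leaves a leading space on the later names, so the sketch strips each piece.

```python
def explode_names(rows):
    """Yield one (rank, count, name) tuple per comma-separated name."""
    for rank, count, names in rows:
        for name in names.split(","):
            # strip() removes the space that follows each comma in the input
            yield rank, count, name.strip()


rows = [
    (1, 10000, "marie"),
    (2, 9999, "sophie"),
    (50, 122, "jude, allan, jaspar"),
]

for rank, count, name in explode_names(rows):
    print(rank, count, name)
```

In pandas, the analogous cleanup would presumably be a `.str.strip()` on the name column after `explode`.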
Counting 15's in Cribbage Hand
Background
This is a follow-up to my previous questions, "finding a straight in a cribbage hand" and "Counting Pairs in Cribbage Hand".
Objective
Count the number of ways cards can be combined to a total of 15, then score 2 points for each combination. An ace is worth 1, and J, Q, K are worth 10.
What I have tried
My first attempt at a solution required 26 different formulas. Basically I checked each possible way to combine cards to see if the total was 15: 1 way to add 5 cards, 5 ways to add 4 cards, 10 ways to add 3 cards, and 10 ways to add 2 cards. I thought I had this licked until I realized I was only looking at combinations; I had not considered that I had to cap the value of cards 11, 12, and 13 at 10. I initially tried an array formula along the lines of:
MIN(MOD(B1:F1-1,13)+1,10)
The problem with this is that MIN takes the minimum of all the results rather than comparing each individual result to 10. I then tried an IF function, which worked but required a CSE formula even when used with SUMPRODUCT, which is something I try to avoid when I can:
IF(MOD(B1:F1-1,13)+1<11,MOD(B1:F1-1,13)+1,10)
Then I stumbled on an answer to a code golf question, which I modified into the formula below. I kind of like it for some strange reason, but it's a bit long for repeated use:
--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)
My current working formulas are:
5 card check
=(SUMPRODUCT(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15)*2
4 card checks
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,4}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,3,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{2,3,4,5}))=15)*2
3 card checks
Same as the 4 card checks, using all combinations of 3 cards in the array constant, such as {1,2,3}. There are 10 different combinations, so 10 different formulas.
The 2 card check was based on the solution by Tom in "Counting Pairs in Cribbage Hand"; all two-card combinations are checked with a single formula (yes, it is CSE).
2 card check
{=SUM(--(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)+TRANSPOSE(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15))}
Question
Can the 3 and 4 card combination sum checks be brought into a single formula similar to the 2 card check? Is there a better way to convert cards 11, 12, 13 to a value of 10?
Sample Data
| B  | C  | D  | E  | F  | POINTS
+----+----+----+----+----+
| 1  | 2  | 3  | 17 | 31 | <= 2 (all 5 add to 15)
| 1  | 2  | 3  | 17 | 32 | <= 2 (last 4 add to 15)
| 11 | 18 | 31 | 44 | 5  | <= 16 (4x(J+5), 4x(5+5+5))
| 6  | 7  | 8  | 9  | 52 | <= 4 (6+9, 7+8)
| 1  | 3  | 7  | 8  | 52 | <= 2 (7+8)
| 2  | 3  | 7  | 9  | 52 | <= 2 (2+3+K)
| 2  | 4  | 6  | 23 | 52 | <= 0 (nothing adds to 15)
Excel Version
Excel 2013
For 5:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2
For 4:
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2
For 3:
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2
For 2:
=SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
All together:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
For older versions we need to "trick" INDEX into accepting the arrays as row and column references. We do that by using N(IF({1},[thearray])):
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$5))),N(IF({1},{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}))),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$10))),N(IF({1},{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}))),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
This is a CSE (array) formula that must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
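As a cross-check for the spreadsheet formulas, the same count can be sketched in Python. This is a minimal sketch, not part of the original answer: it assumes the question's card numbering (1 to 52, with rank = MOD(card-1,13)+1 capped at 10), and the names `card_value` and `fifteens_score` are hypothetical.

```python
from itertools import combinations


def card_value(card: int) -> int:
    """Map a card number 1..52 to its counting value (face cards count 10)."""
    rank = (card - 1) % 13 + 1
    return min(rank, 10)


def fifteens_score(hand) -> int:
    """Score 2 points for every combination of 2 or more cards totalling 15."""
    values = [card_value(card) for card in hand]
    ways = sum(
        1
        for size in range(2, len(values) + 1)
        for combo in combinations(values, size)
        if sum(combo) == 15
    )
    return 2 * ways


# Hands taken from the question's sample data
print(fifteens_score([11, 18, 31, 44, 5]))  # 16
print(fifteens_score([6, 7, 8, 9, 52]))     # 4
```

Enumerating combinations by size mirrors the separate 2, 3, 4 and 5 card checks in the formulas, which makes it easy to compare intermediate results against the worksheet.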
Adding zeros to a string without generating a new variable
I am trying to add leading zeros to a string variable so that all of its values have the same number of digits (assume 3).
clear
input tina bina str4 pine
1 10 "99"
1 11 "99"
2 11 "99"
2 11 "99"
3 12 "."
4 12 "888"
5 14 "88"
6 15 "777"
7 16 "77"
8 17 "0"
8 18 "7"
end
I managed to do this by generating a new variable that stores the number of zeros I need to add to each observation to reach 3:
generate pi=3-strlen(pine)
replace pine= ("0"*pi) + pine if strlen(pine)<3
I wonder if there is a way to obtain the same result without generating the variable. I tried the following, but it does not work:
replace pine= ("0"*(`=3-strlen(pine)')) + pine if strlen(pine)<3
I am probably not clear about what happens when I evaluate expressions.
Your approach does not work because it evaluates the expression for the first observation only:
. display `= 3 - strlen(pine)'
1
The single quotes are not required:
replace pine = ("0" * (3-strlen(pine) ) ) + pine if strlen(pine) < 3
     +--------------------+
     | tina   bina   pine |
     |--------------------|
  1. |    1     10    099 |
  2. |    1     11    099 |
  3. |    2     11    099 |
  4. |    2     11    099 |
  5. |    3     12    00. |
     |--------------------|
  6. |    4     12    888 |
  7. |    5     14    088 |
  8. |    6     15    777 |
  9. |    7     16    077 |
 10. |    8     17    000 |
     |--------------------|
 11. |    8     18    007 |
     +--------------------+
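The difference between evaluating the pad length once and evaluating it per observation can be illustrated with a small Python analogy. This is only a sketch of the logic described above, not Stata code; the list mirrors the pine values from the question, and the variable names are hypothetical.

```python
pine = ["99", "99", "99", "99", ".", "888", "88", "777", "77", "0", "7"]

# Analogue of the macro version: the pad length is computed once,
# from the first observation only, and then reused for every row.
pad_once = 3 - len(pine[0])
evaluated_once = [("0" * pad_once) + s if len(s) < 3 else s for s in pine]

# Analogue of the plain expression: the pad length is computed per row.
evaluated_per_row = [("0" * (3 - len(s))) + s if len(s) < 3 else s for s in pine]

print(evaluated_once)     # short values such as "7" only get one zero: "07"
print(evaluated_per_row)  # every value is padded to three characters: "007"
```

In Python itself, `str.zfill(3)` would give the same per-row padding for these values.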
I know there is already an accepted answer, but I wanted to offer my suggestion. It is maybe a little simpler than the other answer and is straightforward to explain. You just want to replace a string variable holding real numbers with zero-padded values and keep it as a string. You can easily do this by running:
replace pine = string(real(pine),"%03.0f")
Depending on your goal, this may be better than the previous answer, because it keeps your missing value as missing and does not add zeros to it. Hopefully this is helpful.
Increment a number inside excel formula sidewise
I have the following numbers in D1 to D21:
1 | 4 | 51 | 4 | 57 | 6 | 16 | 11 | 41 | 3 | 26 | 3 | 27 | 5 | 3 | 5 | 8 | 6 | 22 | 6 | 23
I want to write a formula that produces output like this:
6:23 6:22 5:8 5:3 3:27 3:26 11:41 6:16 4:47 4:51 1
I put the following formula in P1 and tried dragging it to the left, but it shows the same value in every cell because the offsets 1 and 2 do not change to 3 and 4. This is what I want:
OFFSET($D$1,1,0)&":"&OFFSET($D$1,2,0)
OFFSET($D$1,3,0)&":"&OFFSET($D$1,4,0)
OFFSET($D$1,5,0)&":"&OFFSET($D$1,6,0)
The middle value in the above formulas should change.
I've tested with a smaller range of values and the following should work for you. You'll need to replace $A$1:$I$1 with your range.
=IFERROR(OFFSET($A$1,0,COUNT($A$1:$I$1)-(COLUMN()*2))&":"&OFFSET($A$1,0,COUNT($A$1:$I$1)-((COLUMN()*2)-1)),IF((COUNT($A$1:$I$1)+1)=(COLUMN()*2),$A$1,""))
Also tested with your full range $A$1:$U$1:
=IFERROR(OFFSET($A$1,0,COUNT($A$1:$U$1)-(COLUMN()*2))&":"&OFFSET($A$1,0,COUNT($A$1:$U$1)-((COLUMN()*2)-1)),IF((COUNT($A$1:$U$1)+1)=(COLUMN()*2),$A$1,""))
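The index arithmetic in the formula above (walk backwards from the end two cells at a time, then emit the leftover first cell on its own) can be sketched in Python. The function name `pairs_from_end` is hypothetical, and the list is the data as given in the question.

```python
def pairs_from_end(values):
    """Pair up values from the end; a leftover first value is emitted alone."""
    out = []
    i = len(values)
    while i >= 2:
        # values[i-2] and values[i-1] play the role of the two OFFSET calls
        out.append(f"{values[i - 2]}:{values[i - 1]}")
        i -= 2
    if i == 1:
        # Odd count: this corresponds to the IF(...,$A$1,"") fallback
        out.append(str(values[0]))
    return out


data = [1, 4, 51, 4, 57, 6, 16, 11, 41, 3, 26, 3, 27, 5, 3, 5, 8, 6, 22, 6, 23]
print(" ".join(pairs_from_end(data)))
```

With the data as listed, the ninth pair comes out as 4:57, because the fifth value in the question's data is 57.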
Converting string to numeric in Stata
I have survey data with the age of individuals in a variable named agen. Originally the variable was a string, so I converted it to numeric using the encode command. When I tried to generate a new variable hhage holding the age of the head of household, the values of the new variable were inconsistent. The commands I used are the following:
encode agen, gen(age)
gen hhage=age if relntohrp==1
The new variable is not consistent. When I browsed it, the age of the household head in the first household is 65, while the new variable shows 63. In the second household, hhage shows 28 instead of 33 for the head of the household. And so on.
Run help encode and you can read:
Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
For example:
clear all
set more off
input id str5 age
1 "32"
2 "14"
3 "65"
4 "54"
5 "98"
end
list
encode age, gen(age2)
destring age, gen(age3)
list, nolabel
Note the difference between using encode and destring. The former assigns numerical codes (1, 2, 3, ...) to the string values, while destring converts the string value to numeric. You can see this by stripping the value labels when you list:
. list, nolabel
     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32      2 |
  2. |  2    14     14      1 |
  3. |  3    65     65      4 |
  4. |  4    54     54      3 |
  5. |  5    98     98      5 |
     +------------------------+
A simple list or browse may confuse you because encode assigns the sequence of natural numbers but also assigns value labels equal to the original strings:
. list
     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32     32 |
  2. |  2    14     14     14 |
  3. |  3    65     65     65 |
  4. |  4    54     54     54 |
  5. |  5    98     98     98 |
     +------------------------+
The nolabel option shows the "underlying" data. You say the result is inconsistent, but for future questions, posting exact input and results is more useful for those trying to help you.
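The contrast between the two commands can be mimicked in Python. This is a sketch by analogy, not Stata's implementation: it assumes encode numbers the distinct strings 1, 2, 3, ... in sorted order, which is consistent with the age2 column in the listing above. The variable names are illustrative.

```python
ages = ["32", "14", "65", "54", "98"]

# encode-like: each distinct string gets a code based on its sorted position
codes = {value: code for code, value in enumerate(sorted(set(ages)), start=1)}
encoded = [codes[value] for value in ages]

# destring-like: each string is parsed as the number it spells out
converted = [int(value) for value in ages]

print(encoded)    # [2, 1, 4, 3, 5], matching age2 under list, nolabel
print(converted)  # [32, 14, 65, 54, 98], matching age3
```

Arithmetic or comparisons on the encoded values operate on the codes, which would explain ages that look shifted, as in the question.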
Try taking a look at this method? Sounds like you may have slipped up somewhere in your method.