Exception: Invalid_argument "String.sub / Bytes.sub" - string

I wrote a tail recursive scanner for basic arithmetic expressions in OCaml
Syntax
Exp ::= n | Exp Op Exp | (Exp)
Op ::= + | - | * | /
type token =
  | Tkn_NUM of int
  | Tkn_OP of string
  | Tkn_LPAR
  | Tkn_RPAR
  | Tkn_END
exception ParseError of string * string
let tail_tokenize s =
  let rec tokenize_rec s pos lt =
    if pos < 0 then lt
    else
      let c = String.sub s pos 1 in
      match c with
      | " " -> tokenize_rec s (pos-1) lt
      | "(" -> tokenize_rec s (pos-1) (Tkn_LPAR::lt)
      | ")" -> tokenize_rec s (pos-1) (Tkn_RPAR::lt)
      | "+" | "-" | "*" | "/" -> tokenize_rec s (pos-1) ((Tkn_OP c)::lt)
      | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
        (match lt with
         | (Tkn_NUM n)::lt' ->
           (let lta = Tkn_NUM(int_of_string (c^(string_of_int n)))::lt' in
            tokenize_rec s (pos-1) lta)
         | _ -> tokenize_rec s (pos-1) (Tkn_NUM (int_of_string c)::lt)
        )
      | _ -> raise (ParseError ("Tokenizer","unknown symbol: "^c))
  in
  tokenize_rec s (String.length s) [Tkn_END]
During execution I get
tail_tokenize "3+4";;
Exception: Invalid_argument "String.sub / Bytes.sub".

Your example case is this:
tail_tokenize "3+4"
The first call will look like this:
tokenize_rec "3+4" 3 [Tkn_END]
Since 3 is not less than 0, the first call inside tokenize_rec will look like this:
String.sub "3+4" 3 1
If you try this yourself you'll see that it's invalid:
# String.sub "3+4" 3 1;;
Exception: Invalid_argument "String.sub / Bytes.sub".
It seems a little strange to work through the string backwards, but to do this you need to start at String.length s - 1.

From the error message it's clear that String.sub is the problem. Its arguments are s, pos and 1 with the last being a constant and the two others coming straight from the function arguments. It might be a good idea to run this in isolation with the arguments substituted for the actual values:
let s = "3+4" in
String.sub s (String.length s) 1
Doing so we again get the same error, and hopefully it's now clear why: you're trying to get a substring of length 1 starting at index String.length s, which is one past the last character, so it would have to read past the end of the string, which of course it can't.
Logically, you might then try subtracting 1 from pos in the call, so that the substring of length 1 starts at the last character. But again you get the same error. That is because your terminating condition is pos < 0, which means that at pos = 0 you'll still try to run String.sub s (0 - 1) 1. Therefore you need to adjust the terminating condition too. But once you've done that you should be good!
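One way to apply the fix from the first answer is to keep the terminating condition pos < 0 and start the scan at String.length s - 1 instead. The following is only a sketch under that assumption; the type and exception are repeated so the block stands alone, the redundant s parameter of the inner function is dropped, and the digit handling is copied unchanged from the question.

```ocaml
type token =
  | Tkn_NUM of int
  | Tkn_OP of string
  | Tkn_LPAR
  | Tkn_RPAR
  | Tkn_END

exception ParseError of string * string

let tail_tokenize s =
  (* Scan from the last valid index down to 0; pos = -1 ends the recursion. *)
  let rec tokenize_rec pos lt =
    if pos < 0 then lt
    else
      let c = String.sub s pos 1 in
      match c with
      | " " -> tokenize_rec (pos - 1) lt
      | "(" -> tokenize_rec (pos - 1) (Tkn_LPAR :: lt)
      | ")" -> tokenize_rec (pos - 1) (Tkn_RPAR :: lt)
      | "+" | "-" | "*" | "/" -> tokenize_rec (pos - 1) (Tkn_OP c :: lt)
      | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
        (match lt with
         | Tkn_NUM n :: lt' ->
           (* Prepend the digit to the number collected so far (as in the question). *)
           tokenize_rec (pos - 1) (Tkn_NUM (int_of_string (c ^ string_of_int n)) :: lt')
         | _ -> tokenize_rec (pos - 1) (Tkn_NUM (int_of_string c) :: lt))
      | _ -> raise (ParseError ("Tokenizer", "unknown symbol: " ^ c))
  in
  (* The last valid index is String.length s - 1, not String.length s. *)
  tokenize_rec (String.length s - 1) [Tkn_END]
```

With this change, tail_tokenize "3+4" evaluates to [Tkn_NUM 3; Tkn_OP "+"; Tkn_NUM 4; Tkn_END]. The sketch only addresses the index error: because digits are folded back through string_of_int, an input such as "100" still comes out as Tkn_NUM 10, and digits separated only by spaces are merged into one number.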


Stata test if string contains same character

I want to automatically test if the string contains only one type of character, with the result in a true/false variable "check"
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
my attempt
gen check = .
//loop through dataset
local db =_N
forval x = 1/`db'{
dis as error "obs `x'"
//get first character in string
local f = substr(contactno, 1, 1) in `x'
//loop through each character in string
capture drop check_*
forvalues i = 1/11 {
quietly gen check_`i'=.
local j = substr(contactno, `i', 1) in `x'
//Tag characters that match
if "`j'" == "`f'" {
local y = 1
replace check_`i'= 1 in `x'
}
else {
local y= 0
replace check_`i'= 0 in `x'
}
}
Expected result: check should be true for the first two observations and false for the third.
You can achieve this in one line of code as follows:
Take the first character of contactno.
Find all instances of this character in contactno and replace with an empty string (i.e., "").
Test whether the resulting string is empty.
gen check = missing(subinstr(contactno,substr(contactno,1,1),"",.))
     +---------------------+
     |   contactno   check |
     |---------------------|
  1. | aaaaaaaaaaa       1 |
  2. | bbbbbbbbbbb       1 |
  3. | aaaaaaaaaab       0 |
     +---------------------+
So we are leveraging the fact that if not all characters are equal to the first character, then the string cannot contain only one (type of) character.
Here's another way to do it.
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
gen long id = _n
save original_data, replace
expand 11
bysort id : gen character = substr(contactno, _n, 1)
bysort id (character) : gen byte OK = character[1] == character[_N]
drop character
bysort id : keep if _n == 1
merge 1:1 id using original_data
list
     +-------------------------------------+
     |   contactno   id   OK        _merge |
     |-------------------------------------|
  1. | aaaaaaaaaaa    1    1   Matched (3) |
  2. | bbbbbbbbbbb    2    1   Matched (3) |
  3. | aaaaaaaaaab    3    0   Matched (3) |
     +-------------------------------------+

Get position of first and last char in substring

I would like to write a function in OCaml that returns the positions of the first and last characters of a substring within a string. For example, my_sub "tar" "ar" should return (1,2), but my_sub "tabr" "ar" should return nothing (Nil), because the match must be consecutive. How can I do that?
Edit
I tried to make the code but I have a problem
let rec pos_sub l t n =
let rec aux l1 l2 x =
match l1, l2 with
| [], _ | _, [] | [], [] -> -1
| h1::q1, h2 | h1, h2 -> if h1 = h2 then x else -1
| h1::q1, h2::q2 -> if h1 = h2 then aux q1 q2 x+1 else -1
in
match l, t with
| [], _ -> (-1,-1)
| h1::q1, h2::q2 -> if h1 = h2 then (n, (aux q1 q2 n+1)) else pos_sub q1 t n+1
it says :
The variable h1 on the left-hand side of this or-pattern has type 'a
but on the right-hand side it has type 'a list
The type variable 'a occurs inside 'a list
in the second match in aux
The problem in your code is this match case:
| h1::q1, h2 | h1, h2 -> if h1 = h2 then x else -1
Both alternatives of an or-pattern must bind the same variables with the same types. Here h1 is bound to a single element in the left alternative (h1::q1) but to the whole list in the right alternative (h1), and h2 is a whole list in both. This is what the error message is trying to tell you. I think you intended to match the case where h2 is the last character of your search string, therefore:
| h1::_, h2::[] | h1::[], h2::[] -> if h1 = h2 then x else -1
and because the second alternative is already covered by the first, this can then be simplified to:
| h1::_, h2::[] -> if h1 = h2 then x else -1
A side note: it is bad style to use -1 or similar as special values to signal error cases. Rather use option types in such situations.
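As an illustration of the option-based style suggested above, here is a minimal sketch that works directly on strings rather than on character lists. The name my_sub comes from the question; the decision to return None for an empty search string is an assumption made for this sketch, not something the question specifies.

```ocaml
(* Returns Some (first, last) for the first contiguous occurrence of [sub]
   in [s], or None if there is none. An empty [sub] yields None (assumption). *)
let my_sub (s : string) (sub : string) : (int * int) option =
  let n = String.length s and m = String.length sub in
  (* [matches_at i] checks whether s.[i .. i+m-1] equals sub. *)
  let matches_at i =
    let rec go j = j >= m || (s.[i + j] = sub.[j] && go (j + 1)) in
    go 0
  in
  (* Try every start index at which [sub] could still fit. *)
  let rec search i =
    if m = 0 || i + m > n then None
    else if matches_at i then Some (i, i + m - 1)
    else search (i + 1)
  in
  search 0
```

Here my_sub "tar" "ar" gives Some (1, 2) and my_sub "tabr" "ar" gives None. Note also that in the question's code, aux q1 q2 x+1 parses as (aux q1 q2 x) + 1, since function application binds tighter than +; the intended call needs parentheses, as in aux q1 q2 (x+1).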

Creating a string from the permutations of an ArrayBuffer[String] elements in Scala

I have
val a: String = "E"
val y: ArrayBuffer[String] = ArrayBuffer("I", "G", "S")
I am trying to make a string, such that:
"(E <=> (I | G | S)) & (~I | ~G) & (~I | ~S) & (~G | ~S)"
Currently, for the first part of string (first clause) (E <=> (I | G | S)), I have this which is functional:
s"($a <=> (${y.mkString(" | ")}))" // & (~${y.mkString(" | ~")})"
For the second part, which is built from the permutations of the elements in y, i.e., (~I | ~G) & (~I | ~S) & (~G | ~S), how can I fix the commented-out part so that it generates this?
I am trying to use y.permutations to create another string and then concatenate it with this one, but can it be "generated" here, within the same string, in some way?
Thanks.
It seems from your example that what you need is combinations, not permutations.
So to have a term for every pair of elements from y you can find all combinations of length 2 using combinations method. Then you can wrap each pair in brackets in the necessary format, and finally build the whole second part with mkString:
y.combinations(2).map { case Seq(a, b) => s"(~$a | ~$b)" }.mkString(" & ")
You can integrate this expression into the string interpolation:
s"($a <=> (${y.mkString(" | ")})) & ${
y.combinations(2).map { case Seq(a, b) => s"(~$a | ~$b)" }.mkString(" & ")}"

F# Converting Currency from Excel

I'm trying to convert a column in an Excel sheet to a float in my F# application. The problem is that I do not know in which format the currency is supplied. This can be manually typed, with or without a symbol and of course the . and , symbols are always a mess.
Is there any "short and sweet" way of parsing what appears to be an incoherent array of possibilities into an actual floating-point value which later, after some arithmetic, can be printed as currency?
A side problem I've encountered:
When a column in Excel is marked as Number, 600.00 will be exported through the interop libraries as 600; 534.20 will be exported as 534.2.
A simple parse on the . symbol is not enough.
The symbol which is not shown in Excel will be exported through the interop libraries as a ? (with a space following).
These options do not work:
let ParseFloat1 (o:obj) =
    float (o.ToString())
let parseFloat2 (o:obj) =
    float (System.Single.Parse(o.ToString()))
After these attempts I just went crazy and started russamafuzzin' solutions, not even this dragon of a bad idea worked:
let ParseFloat o =
    // ugly
    let mutable _string = o.ToString()
    // because of the weird "lets leave trailing zero's off behavior
    let changeString (s:string) =
        match s.LastIndexOf "." with
        | 0 | -1 -> s + "00"
        | 1 -> s + "0"
        | _ -> s
    _string <- changeString _string
    let characters = _string.ToCharArray()
    // remove all the non numbers from the string
    let rec parse source dest =
        match source with
        | h::t ->
            match h with
            | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0' -> parse t (dest + h.ToString())
            | _ -> parse t dest
        | _ -> dest
    let _float = parse (Array.toList characters) ""
    let result = (float (System.Single.Parse(_float))) / (float 100)
    result
I really hope someone can help me, because this is driving me crazy. Thank you in advance.
EDIT (16-11-2015):
More information after the valid comments, I appreciate all the help and comments.
I have broken the issue down into more "parts" so I've introduced a few conventions for this application. I figured that there is no solution for the problem so I needed to put in some restrictions and hope for the best...
I get the decimal symbol from CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator and make sure the column in the excel sheet is of type Number, adding a column of this type and resaving if needed.
I remove all the other symbols from the string, leaving only the separator in place. (Just like Petr's answer.)
Running Excel in same context and app making sure the CultureInfo is the same.
To expand on Petr's answer:
open System
open System.Globalization

let ParseFloat (o: obj) =
    // Decimal separator of the current culture (assumed to be a single character)
    let decimalSeparator = Convert.ToChar(CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator)
    // Keep only the digits and the decimal separator
    let newv = String((o.ToString()) |> Seq.filter (fun c -> Char.IsNumber c || c = decimalSeparator) |> Array.ofSeq)
    let rslt =
        match Double.TryParse(newv) with
        | true, number -> number
        | false, _ -> failwith "Cannot parse the number"
    rslt
Something like this?
open System
let value = "£1,097."
let newv = String(value |> Seq.filter (fun c -> Char.IsNumber c || c = '.') |> Array.ofSeq )
let rslt = match Double.TryParse(newv) with
| true, number -> printfn"Converted '%s' to %.2f" value number
| false, _ -> printfn "Unable to convert '%s'" value
Result:
Converted '£1,097.' to 1097.00

String Matching with newline character in Haskell

Here I am trying to find the index of '-' followed by '}' in a String.
For an input like substrIndex "abcd -} sad" it gives me an output of 10,
which is not the index of "-}".
Also, if I do something like substrIndex "abcd\n -} sad" it gives me 6.
Why does the \n make a difference? What am I doing wrong? Please correct me, I'm a noob.
substrIndex :: String -> Int
substrIndex "" = 0
substrIndex (s:"") = 0
substrIndex (s:t:str)
  | s == '-' && t == '}' = 0
  | otherwise = 2 + (substrIndex str)
Your program has a bug. You are checking the characters two at a time. But what if the - and } fall into different pairs, as in S-}?
It will first check whether S and - are equal to - and } respectively.
Since they don't match, it will skip both and move on with } alone.
So, you just need to change the logic a little bit, like this
substrIndex (s:t:str)
  | s == '-' && t == '}' = 0
  | otherwise = 1 + (substrIndex (t:str))
Now, if the current pair doesn't match -}, then just skip the first character and proceed with the second character, substrIndex (t:str). So, if S- doesn't match, your program will proceed with -}. Since we dropped only one character we add only 1, instead of 2.
This can be shortened and written clearly, as suggested by user2407038, like this
substrIndex :: String -> Int
substrIndex [] = 0
substrIndex ('-':'}':_) = 0
substrIndex (_:xs) = 1 + substrIndex xs
