How do I strip whitespace from a string in OCaml?

To learn the basics of OCaml, I'm solving one of the easy Facebook engineering puzzles with it. Essentially, I'd like to do something like the following Python code:
    some_str = some_str.strip()
That is, I'd like to strip all of the whitespace from the beginning and the end of the string. I don't see anything obvious for this in the OCaml Str library. Is there an easy way to do this, or am I going to have to write some code for it (which I wouldn't mind, but would prefer not to :) )?
Bear in mind that I'm limited to what's in the libraries that come with the OCaml distribution.

I know this question is very old, but I was just pondering the same thing and came up with this (from the toplevel; note that it only strips spaces, not tabs or newlines):
    let strip str =
      let str = Str.replace_first (Str.regexp "^ +") "" str in
      Str.replace_first (Str.regexp " +$") "" str;;
    val strip : string -> string = <fun>
then
    strip " Hello, world! ";;
    - : string = "Hello, world!"
UPDATE:
As of OCaml 4.00.0, the standard library includes String.trim.
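For readers on OCaml 4.00 or later, here is a minimal sketch of the String.trim call mentioned in the update. The sample string is made up for illustration; the call removes whitespace only at the two ends of the string.

```ocaml
(* String.trim removes leading and trailing whitespace only. *)
let () =
  let raw = "  \tHello, world! \n" in
  let trimmed = String.trim raw in
  (* The interior space between the two words is preserved. *)
  assert (trimmed = "Hello, world!");
  print_endline trimmed
```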

It is really a mistake to limit yourself to the standard library, since the standard library is missing a lot of things. If, for example, you were to use Core, you could simply do:
    open Core.Std
    let x = String.strip " foobar "
    let () = assert (x = "foobar")
You can of course look at the sources of Core if you want to see the implementation. There is a similar function in ExtLib.

How about:
    let trim str =
      if str = "" then "" else
      let search_pos init p next =
        let rec search i =
          if p i then raise (Failure "empty") else
          match str.[i] with
          | ' ' | '\n' | '\r' | '\t' -> search (next i)
          | _ -> i
        in
        search init
      in
      let len = String.length str in
      try
        let left = search_pos 0 (fun i -> i >= len) succ
        and right = search_pos (len - 1) (fun i -> i < 0) pred
        in
        String.sub str left (right - left + 1)
      with
      | Failure "empty" -> ""
(Via Code Codex)

I believe that when the other answers were given, version 4.00 was not out yet. In OCaml 4.00 and later, the String module has a String.trim function that trims leading and trailing whitespace.
Alternatively, if you're restricted to an older version of OCaml, you can use this function, which is shamelessly copied from the source of 4.00's String module.
    let trim s =
      let is_space = function
        | ' ' | '\012' | '\n' | '\r' | '\t' -> true
        | _ -> false in
      let len = String.length s in
      let i = ref 0 in
      while !i < len && is_space (String.get s !i) do
        incr i
      done;
      let j = ref (len - 1) in
      while !j >= !i && is_space (String.get s !j) do
        decr j
      done;
      if !i = 0 && !j = len - 1 then
        s
      else if !j >= !i then
        String.sub s !i (!j - !i + 1)
      else
        ""
    ;;

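Something simple like this works, but note that it removes every whitespace character in the string, including those in the middle, so it is not equivalent to Python's strip():
    #require "str";;
    let strip_string s =
      Str.global_replace (Str.regexp "[\r\n\t ]") "" s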
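To see how removing every whitespace character differs from stripping only the ends, here is a small sketch that uses just the standard library (no Str). The helper name remove_all_whitespace is hypothetical and mirrors the character class in the regexp above; String.trim requires OCaml 4.00 or later.

```ocaml
(* Hypothetical helper: drops every ' ', '\t', '\r' and '\n', wherever it occurs. *)
let remove_all_whitespace (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | ' ' | '\t' | '\r' | '\n' -> ()
      | _ -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  let s = " Hello, world! " in
  (* Removing all whitespace also deletes the space between the words. *)
  assert (remove_all_whitespace s = "Hello,world!");
  (* Trimming touches only the two ends. *)
  assert (String.trim s = "Hello, world!")
```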

The standard library's String.trim does exactly that.

Related

How to find an integer in text

Help me figure out how to work with text.
I have a string like "word1 number: word2", for example "result 0: Good" or "result 299: Bad".
I need to print Undefined, Low, or High:
When the string is null, print Undefined
When the number is 0-15, print Low
When the number is >15, print High
    type GetResponse =
        {
            MyData: string voption
            ErrorMessage: string voption }
    val result: Result<GetResponse, MyError>
And then I try:
    MyData =
        match result with
        | Ok value ->
            if (value.Messages = null) then
                ValueSome "result: Undefined"
            else
                let result =
                    value.Messages.FirstOrDefault(
                        (fun x -> x.ToUpperInvariant().Contains("result")),
                        "Undefined"
                    )
                if (result <> "Undefined") then
                    ValueSome result
                else
                    errors.Add("We don't have any result")
                    ValueNone
        | Error err ->
            errors.Add(err.ToErrorString)
            ValueNone
    ErrorMessage =
        if errors.Any() then
            (errors |> String.concat ", " |> ValueSome)
        else
            ValueNone
But I don't know how to check for the number in the string. Is there some way to print this without a billion ifs?
Parsing gets complex very quickly. I recommend using FParsec to simplify the logic and avoid errors. A basic parser that seems to meet your needs:
    open System
    open FParsec
    let parseWord =
        manySatisfy Char.IsLetter
    let parseValue =
        parseWord               // parse any word (e.g. "result")
            >>. spaces1         // skip whitespace
            >>. puint32         // parse an unsigned integer value
            .>> skipChar ':'    // skip colon character
            .>> spaces          // skip whitespace
            .>> parseWord       // parse any word (e.g. "Good")
You can then use it like this:
    type ParserResult = Undefined | Low | High
    let parse str =
        if isNull str then Result.Ok Undefined
        else
            match run parseValue str with
            | Success (num, _ , _) ->
                if num <= 15u then Result.Ok Low
                else Result.Ok High
            | Failure (errorMsg, _, _) ->
                Result.Error errorMsg
    parse null |> printfn "%A"                // Ok Undefined
    parse "result 0: Good" |> printfn "%A"    // Ok Low
    parse "result 299: Bad" |> printfn "%A"   // Ok High
    parse "invalid input" |> printfn "%A"     // Error "Error in Ln: 1 Col: 9 ... Expecting: integer number"
There's definitely a learning curve with FParsec, but I think it's worth adding to your toolbelt.
I agree with Brian that parsing can become quite tricky very quickly. However if you have some well established format of the input and you're not very much into writing complex parsers, good old regular expressions can be of service ;)
Here is my take on the problem - please note that it has plenty of room to improve, this is just a proof of concept:
    open System.Text.RegularExpressions
    let test1 = "result 0: Good"
    let test2 = "result 299: Bad"
    let test3 = "some other text"
    type ParserResult =
        | Undefined
        | Low of int
        | High of int
    let (|ValidNumber|_|) s =
        //https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-6.0
        let rx = new Regex("(\w\s+)(\d+)\:(\s+\w)")
        let matches = rx.Matches(s)
        if matches.Count > 0 then
            let groups = matches.[0].Groups |> Seq.toList
            match groups with
            | [_; _; a; _] -> Some (int a.Value)
            | _ -> None
        else
            None
    let parseMyString str =
        match str with
        | ValidNumber n when n < 16 -> Low n
        | ValidNumber n -> High n
        | _ -> Undefined
    //let r = parseMyString test1
    printfn "%A" (parseMyString test1)
    printfn "%A" (parseMyString test2)
    printfn "%A" (parseMyString test3)
The active pattern ValidNumber returns Some number if a match is found in the input string; otherwise it returns None. The parseMyString function uses the pattern and guards to construct the final ParserResult value.
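Since most of this page is about OCaml, here is a rough OCaml counterpart of the same classification, written with the standard Scanf module instead of FParsec or a regex. It is only a sketch: the type level, the function classify, the use of string option in place of a null string, and the choice to report negative or unparsable input as Undefined are all assumptions made for illustration.

```ocaml
type level = Undefined | Low | High

(* Expects input shaped like "word number: word", e.g. "result 0: Good". *)
let classify (input : string option) : level =
  match input with
  | None -> Undefined
  | Some s ->
    (try
       Scanf.sscanf s "%s %d: %s" (fun _ n _ ->
           if n < 0 then Undefined
           else if n <= 15 then Low
           else High)
     with Scanf.Scan_failure _ | Failure _ | End_of_file -> Undefined)

let () =
  assert (classify None = Undefined);
  assert (classify (Some "result 0: Good") = Low);
  assert (classify (Some "result 299: Bad") = High);
  assert (classify (Some "invalid input") = Undefined)
```

The single format string replaces the chain of if checks: any input that does not fit the shape raises an exception inside sscanf and is mapped to Undefined.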

reading integers from a string

I want to read a line from a file, initialize an array from that line, and then display the integers.
Why is it not reading the five integers in the line? I want the output 1 2 3 4 5, but I get 1 1 1 1 1.
    open Array;;
    open Scanf;;
    let print_ints file_name =
      let file = open_in file_name in
      let s = input_line(file) in
      let n = ref 5 in
      let arr = Array.init !n (fun i -> if i < !n then sscanf s "%d" (fun a -> a) else 0) in
      let i = ref 0 in
      while !i < !n do
        print_int (Array.get arr !i);
        print_string " ";
        i := !i + 1;
      done;;
    print_ints "string_ints.txt";;
My file is just: 1 2 3 4 5
You might want to try the following approach. Split your string into a list of substrings representing numbers. This answer describes one way of doing so. Then use the resulting function in your print_ints function.
    let ints_of_string s =
      List.map int_of_string (Str.split (Str.regexp " +") s)
    let print_ints file_name =
      let file = open_in file_name in
      let s = input_line file in
      let ints = ints_of_string s in
      List.iter (fun i -> print_int i; print_char ' ') ints;
      close_in file
    let _ = print_ints "string_ints.txt"
When compiling, pass str.cma or str.cmxa as an argument (see this answer for details on compilation):
    $ ocamlc str.cma print_ints.ml
Another alternative would be to use the Scanf.bscanf function; this question contains an example (use with caution).
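A rough sketch of the bscanf route, which also shows why the code in the question prints 1 1 1 1 1: every sscanf call starts again at the beginning of the string, whereas a scanning buffer remembers its position between calls. The function name read_ints and the fixed count n are assumptions for illustration.

```ocaml
(* Reads exactly [n] integers from [line], advancing through the input. *)
let read_ints (line : string) (n : int) : int array =
  (* One scanning buffer shared by all reads, so each read resumes
     where the previous one stopped. *)
  let ib = Scanf.Scanning.from_string line in
  let arr = Array.make n 0 in
  for i = 0 to n - 1 do
    (* The leading space in the format skips any whitespace before the number. *)
    arr.(i) <- Scanf.bscanf ib " %d" (fun x -> x)
  done;
  arr

let () =
  Array.iter (fun x -> print_int x; print_char ' ') (read_ints "1 2 3 4 5" 5)
```

If the line holds fewer than n integers, bscanf raises an exception (End_of_file or Scanf.Scan_failure), which the caller would need to handle.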
The Scanf.sscanf function may not be particularly suitable for this task.
An excerpt from the OCaml manual:
the scanf facility is not intended for heavy duty lexical analysis and parsing. If it appears not expressive enough for your needs, several alternative exists: regular expressions (module Str), stream parsers, ocamllex-generated lexers, ocamlyacc-generated parsers
There is though a way to parse a string of ints using Scanf.sscanf (which I wouldn't recommend):
    let rec int_list_of_string s =
      try
        Scanf.sscanf s
          "%d %[0-9-+ ]"
          (fun n rest_str -> n :: int_list_of_string rest_str)
      with
      | End_of_file | Scanf.Scan_failure _ -> []
The trick here is to split the input string s into a part that is parsed into an integer (%d) and the rest of the string, captured with the range format %[0-9-+ ], which matches a run containing only the decimal digits 0-9, the - and + signs, and spaces.
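If linking the Str library is inconvenient, a similar split can be sketched with the standard library alone. This assumes OCaml 4.04 or later (for String.split_on_char) and that the numbers are separated by spaces only; the name ints_of_line is made up for this example.

```ocaml
(* Splits on spaces, drops the empty pieces produced by repeated spaces,
   and converts each remaining token to an int. *)
let ints_of_line (line : string) : int list =
  line
  |> String.split_on_char ' '
  |> List.filter (fun tok -> tok <> "")
  |> List.map int_of_string

let () =
  List.iter (fun i -> print_int i; print_char ' ') (ints_of_line "1 2 3 4 5")
```

As with the Str version, int_of_string raises Failure on a token that is not a valid integer.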

Lua: split string into words unless quoted

So I have the following code to split a string between whitespaces:
    text = "I am 'the text'"
    for string in text:gmatch("%S+") do
      print(string)
    end
The result:
    I
    am
    'the
    text'
But I need to do this:
    I
    am
    the text --[[yep, without the quotes]]
How can I do this?
Edit: just to complement the question, the idea is to pass parameters from a program to another program. Here is the pull request that I am working, currently in review: https://github.com/mpv-player/mpv/pull/1619
There may be ways to do this with clever parsing, but an alternative way may be to keep track of a simple state and merge fragments based on detection of quoted fragments. Something like this may work:
    local text = [[I "am" 'the text' and "some more text with '" and "escaped \" text"]]
    local spat, epat, buf, quoted = [=[^(['"])]=], [=[(['"])$]=]
    for str in text:gmatch("%S+") do
      local squoted = str:match(spat)
      local equoted = str:match(epat)
      local escaped = str:match([=[(\*)['"]$]=])
      if squoted and not quoted and not equoted then
        buf, quoted = str, squoted
      elseif buf and equoted == quoted and #escaped % 2 == 0 then
        str, buf, quoted = buf .. ' ' .. str, nil, nil
      elseif buf then
        buf = buf .. ' ' .. str
      end
      if not buf then print((str:gsub(spat,""):gsub(epat,""))) end
    end
    if buf then print("Missing matching quote for "..buf) end
This will print:
    I
    am
    the text
    and
    some more text with '
    and
    escaped \" text
Updated to handle mixed and escaped quotes. Updated to remove quotes. Updated to handle quoted words.
Try this:
    text = [[I am 'the text' and '' here is "another text in quotes" and this is the end]]
    local e = 0
    while true do
      local b = e+1
      b = text:find("%S",b)
      if b==nil then break end
      if text:sub(b,b)=="'" then
        e = text:find("'",b+1)
        b = b+1
      elseif text:sub(b,b)=='"' then
        e = text:find('"',b+1)
        b = b+1
      else
        e = text:find("%s",b+1)
      end
      if e==nil then e=#text+1 end
      print("["..text:sub(b,e-1).."]")
    end
Lua patterns aren't powerful enough to handle this task properly. Here is an LPeg solution adapted from the Lua Lexer. It handles both single and double quotes.
    local lpeg = require 'lpeg'
    local P, S, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct
    local function token(id, patt) return Ct(Cc(id) * C(patt)) end
    local singleq = P "'" * ((1 - S "'\r\n\f\\") + (P '\\' * 1)) ^ 0 * "'"
    local doubleq = P '"' * ((1 - S '"\r\n\f\\') + (P '\\' * 1)) ^ 0 * '"'
    local white = token('whitespace', S('\r\n\f\t ')^1)
    local word = token('word', (1 - S("' \r\n\f\t\""))^1)
    local string = token('string', singleq + doubleq)
    local tokens = Ct((string + white + word) ^ 0)
    input = [["This is a string" 'another string' these are words]]
    for _, tok in ipairs(lpeg.match(tokens, input)) do
      if tok[1] ~= "whitespace" then
        if tok[1] == "string" then
          print(tok[2]:sub(2,-2)) -- cut off quotes
        else
          print(tok[2])
        end
      end
    end
Output:
    This is a string
    another string
    these
    are
    words
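For comparison with the OCaml material elsewhere on this page, the state-tracking idea (remember whether we are inside a quote, and only break words on whitespace outside quotes) can be sketched in OCaml rather than Lua. This is an illustration under simplifying assumptions, not a port of any answer above: the name split_quoted is made up, backslash escapes are not handled, and an unterminated quote simply runs to the end of the input.

```ocaml
(* Splits [s] on whitespace, treating '...' and "..." as part of one word.
   The quote characters themselves are dropped from the result. *)
let split_quoted (s : string) : string list =
  let buf = Buffer.create 16 in
  let words = ref [] in
  let in_word = ref false in      (* true once the current word has started *)
  let quote = ref None in         (* Some q while inside a quote opened by q *)
  let flush () =
    if !in_word then begin
      words := Buffer.contents buf :: !words;
      Buffer.clear buf;
      in_word := false
    end
  in
  String.iter
    (fun c ->
      match !quote, c with
      | Some q, _ when c = q -> quote := None       (* closing quote *)
      | Some _, _ -> Buffer.add_char buf c          (* inside quotes: keep as-is *)
      | None, ('\'' | '"') -> quote := Some c; in_word := true
      | None, (' ' | '\t' | '\n' | '\r') -> flush ()
      | None, _ -> Buffer.add_char buf c; in_word := true)
    s;
  flush ();
  List.rev !words

let () = List.iter print_endline (split_quoted "I am 'the text'")
```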

OCaml - strings and substrings

Could someone help me to write a function that checks if a string is a substring of another string?
(there can be more than only 2 strings)
Thanks
With String module:
    let contains s1 s2 =
      try
        let len = String.length s2 in
        for i = 0 to String.length s1 - len do
          if String.sub s1 i len = s2 then raise Exit
        done;
        false
      with Exit -> true
With the Str module, as @barti_ddu said (check this topic):
    let contains s1 s2 =
      let re = Str.regexp_string s2 in
      try
        ignore (Str.search_forward re s1 0);
        true
      with Not_found -> false
With Batteries, you can use String.exists. It also exists in ExtLib: String.exists.
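Since the question mentions that there can be more than two strings, here is a small sketch that checks one string against a list of candidate substrings using only the standard library. The names contains and contains_any are chosen for this example; the search is the same naive scan as the String-module version above, written without exceptions.

```ocaml
(* True if [sub] occurs somewhere in [s]. The empty string is a substring
   of every string. *)
let contains (s : string) (sub : string) : bool =
  let n = String.length s and m = String.length sub in
  let rec at i = i + m <= n && (String.sub s i m = sub || at (i + 1)) in
  at 0

(* True if at least one element of [subs] occurs in [s]. *)
let contains_any (s : string) (subs : string list) : bool =
  List.exists (contains s) subs

let () =
  assert (contains "hello world" "world");
  assert (not (contains "hello world" "world1"));
  assert (contains_any "hello world" ["xyz"; "lo w"])
```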
A String-based alternative to cago's answer that might have better performance and lower memory usage:
    let is_substring string substring =
      let ssl = String.length substring and sl = String.length string in
      if ssl = 0 || ssl > sl then false else
      (* String.create is deprecated since OCaml 4.02; newer code would use Bytes.create *)
      let max = sl - ssl and clone = String.create ssl in
      let rec check pos =
        pos <= max && (
          String.blit string pos clone 0 ssl ; clone = substring
          || check (String.index_from string (succ pos) substring.[0])
        )
      in
      try check (String.index string substring.[0])
      with Not_found -> false
    String str="hello world";
    System.out.println(str.contains("world"));//true
    System.out.println(str.contains("world1"));//false

Better way to get the last character in a string in F#

I want the last character from a string
I've got str.[str.Length - 1], but that's ugly. There must be a better way.
There's no better way to do it - what you have is fine.
If you really plan to do it a lot, you can author an F# extension property on the string type:
    let s = "food"
    type System.String with
        member this.Last =
            this.Chars(this.Length-1) // may raise an exception
    printfn "%c" s.Last
This could be also handy:
    let s = "I am string"
    let lastChar = s |> Seq.last
Result:
    val lastChar : char = 'g'
(This is an old question.) Someone might find this useful; it is based on Brian's original answer.
    open System // needed for Math.Abs
    type System.String with
        member this.Last() =
            if this.Length > 1 then
                this.Chars(this.Length - 1).ToString()
            else
                this.[0].ToString() // raises an exception on an empty string
        member this.Last(n:int) =
            let absn = Math.Abs(n)
            if this.Length > absn then
                let nn =
                    let a = if absn = 0 then 1 else absn
                    let b = this.Length - a
                    if b < 0 then 0 else b
                this.Chars(nn).ToString()
            else
                this.[0].ToString()
    // "ABCD".Last() -> "D"
    // "ABCD".Last(1) -> "D"
    // "ABCD".Last(-1) -> "D"
    // "ABCD".Last(2) -> "C"
You could also treat it as a sequence, but I'm not sure if that's any more or less ugly than the solution you have:
    Seq.nth (Seq.length str - 1) str
