How can I extract the middle part of a string in FSharp?

I want to extract the middle part of a string using FSharp if it is quoted, something like this:
let middle =
    match original with
    | "\"" + mid + "\"" -> mid
    | all -> all
But this doesn't work because the infix operator + is not allowed in a pattern expression. How can I extract the middle part?

I don't think there is any direct support for this, but you can certainly write an active pattern. Active patterns allow you to implement your own code that will run as part of the pattern matching and you can extract & return some part of the value.
The following is a pattern that takes two parameters (prefix and postfix string) and succeeds if the given input starts/ends with the specified strings. The pattern is not complete (can fail), so we'll use the |Name|_| syntax and it will need to return an option value:
let (|Middle|_|) (prefix:string) (postfix:string) (input:string) =
    // Check if the string starts with 'prefix', ends with 'postfix' and
    // is at least as long as the two together (so that they do not overlap)
    if input.StartsWith(prefix) && input.EndsWith(postfix) &&
       input.Length >= (prefix.Length + postfix.Length) then
        // Strip the prefix/postfix and return 'Some' to indicate success
        let len = input.Length - prefix.Length - postfix.Length
        Some(input.Substring(prefix.Length, len))
    else None // Return 'None' - string doesn't match the pattern
Now we can use Middle in pattern matching (e.g. when using match):
match "[aaa]" with
| Middle "[" "]" mid -> mid
| all -> all

Parameterized active patterns to the rescue!
let (|HasPrefixSuffix|_|) (pre:string, suf:string) (s:string) =
    if s.StartsWith(pre) then
        let rest = s.Substring(pre.Length)
        if rest.EndsWith(suf) then
            Some(rest.Substring(0, rest.Length - suf.Length))
        else
            None
    else
        None
let Test s =
    match s with
    | HasPrefixSuffix("\"","\"") inside ->
        printfn "quoted, inside is: %s" inside
    | _ -> printfn "not quoted: %s" s
Test "\"Wow!\""
Test "boring"

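… or just use a plain old regular expression:
open System.Text.RegularExpressions
let Middle input =
    let capture = Regex.Match(input, "\"([^\"]+)\"")
    // Two groups (the whole match plus the capture) means the match succeeded
    match capture.Groups.Count with
    | 2 -> capture.Groups.[1].Value
    | _ -> input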

Patterns have a limited grammar - you can't just use any expression. In this case, I'd just use an if/then/else:
let middle (s:string) =
    // Check the length first so that indexing into an empty string cannot throw
    if s.Length >= 2 && s.[0] = '"' && s.[s.Length - 1] = '"' then
        s.Substring(1, s.Length - 2)
    else s
If grabbing the middle of a string with statically known beginnings and endings is something that you'll do a lot, then you can always use an active pattern as Tomas suggests.

Not sure how efficient this is:
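open System
let GetQuote (s:String) (q:char) =
    s
    |> Seq.skip ((s |> Seq.findIndex (fun c -> c = q)) + 1)
    |> Seq.takeWhile (fun c -> c <> q)
    |> Seq.fold (fun acc c -> String.Format("{0}{1}", acc, c)) ""
Or there's this with Substring in place of the fold:
let GetQuote2 (s:String) (q:char) =
    let isQuote = (fun c -> c = q)
    // index of the first character after the opening quote
    let a = (s |> Seq.findIndex isQuote) + 1
    // number of characters between the opening and the closing quote
    let b = s |> Seq.skip a |> Seq.findIndex isQuote
    s.Substring(a, b)
These will get the first instance of the quoted text anywhere in the string, e.g. GetQuote "Hello 'World'" '\'' returns "World". Both use the same character q as the opening and the closing delimiter, and Seq.findIndex throws if q is not found.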
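When the opening and closing delimiters differ, as in "Hello [World]", a variant built on String.IndexOf may be easier to reason about. The following is only a sketch and not part of the answers above; the name tryBetween and its option-returning shape are assumptions made for illustration.

```fsharp
// Hypothetical helper: returns the text between the first 'openCh' and the
// next 'closeCh' after it, or None when either delimiter is missing.
let tryBetween (openCh: char) (closeCh: char) (s: string) =
    match s.IndexOf(openCh) with
    | -1 -> None
    | start ->
        match s.IndexOf(closeCh, start + 1) with
        | -1 -> None
        | stop -> Some(s.Substring(start + 1, stop - start - 1))

// Small usage helper for bracketed text
let show (input: string) =
    match tryBetween '[' ']' input with
    | Some inner -> printfn "found: %s" inner
    | None -> printfn "no delimited text in: %s" input

show "Hello [World]"   // found: World
show "no brackets"     // no delimited text in: no brackets
```

Returning an option avoids the exception that Seq.findIndex raises when the delimiter is absent, so the caller decides what a missing delimiter means.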

Related

How to convert the first char to lower case

I am playing with strings and have strings like "Hello.Word" or "stackOver.Flow",
and I want to convert the first char of each part to lower case: "hello.word" and "stackOver.flow".
For snake case it is easy: we only need to change upper case to lower case and add '_',
but for camelCase (with the first char in lower case) I don't know how to do this.
open System
let convertToSnakeCase (value:string) =
    String [|
        Char.ToLower value.[0]
        for ch in value.[1..] do
            if Char.IsUpper ch then '_'
            Char.ToLower ch |]
Who can help?
module Identifier =
    open System
    let changeCase (str : string) =
        if String.IsNullOrEmpty(str) then str
        else
            let isUpper = Char.IsUpper
            let n = str.Length
            let builder = new System.Text.StringBuilder()
            let append (s:string) = builder.Append(s) |> ignore
            let rec loop i j =
                let k =
                    if i = n || (isUpper str.[i] && (not (isUpper str.[i - 1]) || ((i + 1) <> n && not (isUpper str.[i + 1]))))
                    then
                        if j = 0 then
                            append (str.Substring(j, i - j).ToLower())
                        elif (i - j) > 2 then
                            append (str.Substring(j, 1))
                            append (str.Substring(j + 1, i - j - 1).ToLower())
                        else
                            append (str.Substring(j, i - j))
                        i
                    else
                        j
                if i = n then builder.ToString()
                else loop (i + 1) k
            loop 1 0
    type System.String with
        member x.ToCamelCase() = changeCase x
    printfn "%s" ("StackOver.Flow".ToCamelCase()) //stackOver.Flow
    //need stackOver.flow
I suspect there are much more elegant and concise solutions. I sense you are learning functional programming, so I think it's best to do stuff like this with a recursive function rather than some magic library function. I notice that your question IS using a recursive function, but it also indexes into an array. Recursive functions work much more easily with lists than with arrays, so if you use recursion the solution is usually simpler with a list.
I'd also avoid using a string builder. Assuming you are learning FP, string builders are imperative, and whilst they obviously work, they won't help you get your head around using immutable data.
The key then is to use pattern matching to match the scenario that should trigger the upper/lower case logic, as it depends on 2 consecutive characters.
I THINK you want this to happen for the 1st char, and after a '.'?
(I've inserted a '.' as the 1st char to allow the recursive function to just process the '.' scenario, rather than making a special case).
open System
let convertToCamelCase (value : string) =
    let rec convertListToCamelCase (value : char list) =
        match value with
        | [] -> []
        | '.' :: second :: rest ->
            '.' :: convertListToCamelCase (Char.ToLower second :: rest)
        | c :: rest ->
            c :: convertListToCamelCase rest
    // put a '.' on the front to simplify the logic (and take it off after)
    let convertAsList = convertListToCamelCase ('.' :: (value.ToCharArray() |> Array.toList))
    String ((convertAsList |> List.toArray).[1..])
The piece to worry about is the recursive piece; the rest of it is just flipping an array to a list and back again.
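If recursion is not a requirement, the same result can probably be reached by splitting on '.' and lower-casing the first character of each segment. This is an alternative sketch rather than the approach described above; lowerFirst and toCamelCase are names invented here, and it assumes that '.' is the only separator that matters and that the input is not null.

```fsharp
open System

// Lower-case only the first character of a segment; empty segments pass through.
let lowerFirst (segment: string) =
    if String.IsNullOrEmpty(segment) then segment
    else string (Char.ToLowerInvariant(segment.[0])) + segment.Substring(1)

// Split on '.', fix up each segment, and join the pieces back together.
let toCamelCase (value: string) =
    value.Split('.')
    |> Array.map lowerFirst
    |> String.concat "."

printfn "%s" (toCamelCase "Hello.Word")       // hello.word
printfn "%s" (toCamelCase "StackOver.Flow")   // stackOver.flow
```

This trades the learning value of the recursive version for brevity, and it allocates one intermediate string per segment.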

How to read a list of words into tuples that include word positions?

I have a textual file and I would like to write a function that reads this file and returns a list of tuples, where each tuple will consist of the word as string, the word line number as int, and the position of the last character of the word as int. Sample input,
example of the first line
followed by the second line
Sample output:
[
("example",1,8);
("of",1,11);
("the",1,15);
("first",1,21);
("line",1,26);
("followed",2,13);
("by",2,16);
("the",2,20);
("second",2,27);
("line",2,32)
]
The function that you are looking for looks something like this,
let read filename =
  In_channel.read_lines filename |>
  List.mapi ~f:(fun line data ->
      String.split data ~on:' ' |>
      List.fold_map ~init:0 ~f:(fun pos word ->
          let pos = pos + String.length word in
          pos+1, (word,line+1,pos-1)) |>
      snd) |>
  List.concat
Here is how to use it. First install the dependencies,
opam install dune stdio merlin
Next, setup your project,
dune init exe readlines --libs=base,stdio
Then open readlines.ml in your favorite editor and substitute its contents with the following,
open Base
open Stdio
let read filename =
  In_channel.read_lines filename |>
  List.mapi ~f:(fun line data ->
      String.split data ~on:' ' |>
      List.fold_map ~init:0 ~f:(fun pos word ->
          let pos = pos + String.length word in
          pos+1, (word,line+1,pos-1)) |>
      snd) |>
  List.concat
let print =
  List.iter ~f:(fun (word,line,pos) ->
      printf "(%s,%d,%d)\n" word line pos)
let main filename =
  print (read filename)
let () = match Sys.get_argv () with
  | [|_; filename|] -> main filename
  | _ -> failwith "expects one argument: filename"
To run and test, create a sample input, e.g. a file named test.txt
example of the first line
followed by the second line
(make sure that the last line is followed by a newline)
Now you can run it,
dune exec ./readlines.exe test.txt
The result should be the following,
(example,1,6)
(of,1,9)
(the,1,13)
(first,1,19)
(line,1,24)
(followed,2,7)
(by,2,10)
(the,2,14)
(second,2,21)
(line,2,26)
(Notice, that I am counting positions from 0 not from 1).
You can also run this code interactively in utop, but you would need to install base and stdio and load them into the interpreter, with
#require "base";;
#require "stdio";;
If you're not using utop but the default OCaml toplevel, you need to also install ocamlfind (opam install ocamlfind) and do
#use "topfind";;
#require "base";;
#require "stdio";;
If you want to use only the standard library, you can do what you want with String.split_on_char and some extra bookkeeping applied to each line.
Here is an example of how you could do it for the first line:
let ic = open_in (*your file name*) in
let first_line = input_line ic in
let words = String.split_on_char ' ' first_line in
let rec aux accLen =
  function
  | [] -> []
  | s :: ts ->
    match s with
    (* an empty string means that there was an extra white space before the split *)
    | "" -> aux (accLen + 1) ts
    (* l + 1 also counts the separator that follows the word *)
    | s -> let l = accLen + String.length s in (1, s, l) :: aux (l + 1) ts
in aux 0 words;;
As ivg said, you can replace the aux function with a List.fold_left :
let ic = open_in (*your file name*) in
let first_line = input_line ic in
let words = String.split_on_char ' ' first_line in
let _, l = List.fold_left (
    fun (accLen, accRes) ->
      function
      | "" -> (accLen + 1, accRes)
      (* l + 1 also counts the separator that follows the word *)
      | s -> let l = accLen + String.length s in (l + 1, (1, s, l) :: accRes)
  ) (0, []) words
in List.rev l;;
Doesn't include the file I/O component, but does properly handle multiple spaces between words, including tabs. Some fun use of fold_left to entertain a new OCaml programmer.
let words_with_last_index line =
  line ^ " "
  |> String.to_seqi
  |> Seq.fold_left
       (fun (wspace, cur_word, words) (cur_pos, cur_ch) ->
          match cur_ch with
          | ' ' when wspace || cur_pos = 0 -> (true, cur_word, words)
          | '\t' when wspace || cur_pos = 0 -> (true, cur_word, words)
          | ' ' | '\t' -> (true, "", words @ [(cur_word, cur_pos - 1)])
          | ch -> (false, cur_word ^ String.make 1 ch, words))
       (false, "", [])
  |> (fun (_, _, collection) -> collection)
let parse_lines text =
  String.split_on_char '\n' text
  |> List.mapi
       (fun i line ->
          line
          |> words_with_last_index
          |> List.map (fun (word, pos) -> (word, i + 1, pos)))
  |> List.flatten

F# Count how many times a substring occurs within a string

How could one count how many times a substring occurs within a string?
I mean, if you have a string "one, two, three, one, one, two", how could you make it count "one" as being present 3 times?
I thought String.Contains would be able to do the job, but that only checks whether the substring is present at all. String.forall is for chars and therefore not an option either.
So I am really at a complete halt here. Can someone enlighten me?
You can use Regex.Escape to turn the string you're searching for into a regex, then use regex functions:
open System.Text.RegularExpressions
let countMatches wordToMatch (input : string) =
    Regex.Matches(input, Regex.Escape wordToMatch).Count
Test:
countMatches "one" "one, two, three, one, one, two"
// Output: 3
Here's a simple implementation that walks through the string, using String.IndexOf to skip through to the next occurrence of the substring, and counts up how many times it succeeds.
let substringCount (needle : string) (haystack : string) =
    let rec loop count (index : int) =
        if index >= String.length haystack then count
        else
            match haystack.IndexOf(needle, index) with
            | -1 -> count
            | idx -> loop (count + 1) (idx + 1)
    if String.length needle = 0 then 0 else loop 0 0
Bear in mind, this counts overlapping occurrences, e.g., substringCount "aa" "aaaa" = 3. If you want non-overlapping, simply replace idx + 1 with idx + String.length needle.
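The overlapping and non-overlapping variants differ only in how far the search advances after a hit, which the following sketch tries to make explicit. The function name countOccurrences and its allowOverlap flag are assumptions introduced for illustration, and the ordinal comparison is a choice made here, not something the answer above specifies.

```fsharp
open System

// Hypothetical helper: counts occurrences of 'needle' in 'haystack'.
// allowOverlap = true advances one character after each hit;
// allowOverlap = false skips past the whole match.
let countOccurrences (allowOverlap: bool) (needle: string) (haystack: string) =
    if needle.Length = 0 then 0
    else
        let step = if allowOverlap then 1 else needle.Length
        let rec loop count (index: int) =
            // Stop once the remaining text is shorter than the needle
            if index > haystack.Length - needle.Length then count
            else
                match haystack.IndexOf(needle, index, StringComparison.Ordinal) with
                | -1 -> count
                | idx -> loop (count + 1) (idx + step)
        loop 0 0

printfn "%d" (countOccurrences true "aa" "aaaa")    // 3
printfn "%d" (countOccurrences false "aa" "aaaa")   // 2
```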
Create a sequence of tails of the string to search in, that is, all substring slices anchored at its end. Then you can use forall functionality to determine the number of matches against the beginning of each of them. It's just golfier than (fun s -> s.StartsWith needle). Note that Seq.forall2 ignores the extra elements of the longer sequence, so a tail that is shorter than the needle but matches its beginning is counted as well.
let count needle haystack =
    [ for i in 0..String.length haystack - 1 -> haystack.[i..] ]
    |> Seq.filter (Seq.forall2 (=) needle)
    |> Seq.length
count "aba" "abacababac"
// val it : int = 3
A fellow student of mine came up with the simplest solution I have seen so far.
let countNeedle (haystack :string) (needle : string) =
    match needle with
    | "" -> 0
    | _ -> (haystack.Length - haystack.Replace(needle, "").Length) / needle.Length
// This approach assumes the data is comma-delimited.
let data = "one, two, three, one, one, two"
let dataArray = data.Split([|','|]) |> Array.map (fun x -> x.Trim())
let countSubstrings searchTerm = dataArray |> Array.filter (fun x -> x = searchTerm) |> Array.length
let countOnes = countSubstrings "one"
let data' = "onetwothreeoneonetwoababa"
// This recursive approach makes no assumptions about a delimiter,
// and it will count overlapping occurrences (e.g., "aba" twice in "ababa").
// This is similar to Jake Lishman's answer.
let rec countSubstringFromI s i what =
    let len = String.length what
    if i + len - 1 >= String.length s then 0
    else (if s.Substring(i, len) = what then 1 else 0) + countSubstringFromI s (i + 1) what
let countSubStrings' = countSubstringFromI data' 0 "one"

F# Converting Currency from Excel

I'm trying to convert a column in an Excel sheet to a float in my F# application. The problem is that I do not know in which format the currency is supplied. This can be manually typed, with or without a symbol and of course the . and , symbols are always a mess.
Is there any "short and sweet" way of parsing what appears to be an incoherent array of possibilities into an actual floating point value, which later, after some arithmetic, can be printed as currency?
A side problem I've encountered:
When a column in Excel is marked as Number, 600.00 will be exported through the interop libraries as 600; 534.20 will be exported as 534.2
A simple parse on the . symbol is not enough.
The symbol which is not shown in Excel will be exported through the interop libraries as a ? (with a space following).
These options do not work:
let ParseFloat1 (o:obj) =
    float (o.ToString())
let parseFloat2 (o:obj) =
    float (System.Single.Parse(o.ToString()))
After these attempts I just went crazy and started russamafuzzin' solutions, not even this dragon of a bad idea worked:
let ParseFloat o =
    // ugly
    let mutable _string = o.ToString()
    // because of the weird "lets leave trailing zero's off behavior
    let changeString (s:string) =
        match s.LastIndexOf "." with
        | 0 | -1 -> s + "00"
        | 1 -> s + "0"
        | _ -> s
    _string <- changeString _string
    let characters = _string.ToCharArray()
    // remove all the non numbers from the string
    let rec parse source dest =
        match source with
        | h::t ->
            match h with
            | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0' -> parse t (dest + h.ToString())
            | _ -> parse t dest
        | _ -> dest
    let _float = parse (Array.toList characters) ""
    let result = (float (System.Single.Parse(_float))) / (float 100)
    result
I really hope someone can help me, because this is driving me crazy. Thank you in advance.
EDIT (16-11-2015):
More information after the valid comments, I appreciate all the help and comments.
I have broken the issue down into more "parts", so I've introduced a few conventions for this application. I figured that there is no solution for the problem, so I needed to put in some restrictions and hope for the best...
I get the decimal symbol from CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator and make sure the column in the Excel sheet is of type Number, adding a column of this type and resaving if needed.
I remove all the other symbols from the string, leaving only the separator in place (just like Petr's answer).
I run Excel in the same context as the app, making sure the CultureInfo is the same.
To expand on Petr's answer:
open System
open System.Globalization
let ParseFloat (o:obj) =
    let decimalSeparator = Convert.ToChar(CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator)
    let newv = String((o.ToString()) |> Seq.filter (fun c -> Char.IsNumber c || c = decimalSeparator) |> Array.ofSeq)
    let rslt =
        match Double.TryParse(newv) with
        | true, number -> (float number)
        // 'throw' is not an F# function; 'failwith' raises an exception with this message
        | false, _ -> failwith "Cannot parse the number"
    rslt
Something like this?
open System
let value = "£1,097."
let newv = String(value |> Seq.filter (fun c -> Char.IsNumber c || c = '.') |> Array.ofSeq )
let rslt =
    match Double.TryParse(newv) with
    | true, number -> printfn "Converted '%s' to %.2f" value number
    | false, _ -> printfn "Unable to convert '%s'" value
Result:
Converted '£1,097.' to 1097.00
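Because Double.TryParse with no extra arguments depends on the current culture, one way to make the filtering approach less environment-sensitive is to pass the decimal separator in and parse with the invariant culture. The following is a minimal sketch under that assumption; tryParseAmount is a hypothetical name, and the filter discards signs and exponents, so negative amounts would need extra handling.

```fsharp
open System
open System.Globalization

// Hypothetical helper: keep digits and the given decimal separator, normalise
// the separator to '.', then parse with the invariant culture.
// Assumption: the caller knows which character is the decimal separator.
let tryParseAmount (decimalSeparator: char) (raw: string) =
    let cleaned =
        raw
        |> Seq.filter (fun c -> Char.IsDigit c || c = decimalSeparator)
        |> Seq.map (fun c -> if c = decimalSeparator then '.' else c)
        |> Seq.toArray
    match Double.TryParse(String(cleaned), NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, number -> Some number
    | false, _ -> None

// Both inputs describe the same amount with different separators
tryParseAmount '.' "£1,097.50" |> Option.iter (printfn "%.2f")
tryParseAmount ',' "€ 1.097,50" |> Option.iter (printfn "%.2f")
```

Returning an option leaves it to the caller to decide whether an unparseable cell is an error or simply a blank value.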

Optional capture of balanced brackets in Lua

Let's say I have lines of the form:
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
As you see, the bracketed expression is optional.
Can I parse these strings using Lua's string.match() ?
The following pattern works for lines that contain brackets:
line = "int[4] height"
print(line:match('^(%w+)(%b[])%s+(%w+)$'))
But is there a pattern that can handle also the optional brackets? The following does not work:
line = "char c"
print(line:match('^(%w+)(%b[]?)%s+(%w+)$'))
Can the pattern be written in another way to solve this?
Unlike in regular expressions, ? in a Lua pattern can only follow a single character class; it cannot be applied to an item such as %b[].
You can use the or operator to try both patterns, like this:
line:match('^(%w+)(%b[])%s+(%w+)$') or line:match('^(%w+)%s+(%w+)$')
A little problem with this is that the or operator keeps only the first result of each call to match. Depending on your needs, either use an if statement or make the entire string the first capture, like this:
print(line:match('^((%w+)(%b[])%s+(%w+))$') or line:match('^((%w+)%s+(%w+))$'))
LPeg may be more appropriate for your case, especially if you plan to expand your grammar.
local re = require're'
local p = re.compile( [[
  prog <- stmt* -> set
  stmt <- S { type } S { name }
  type <- name bexp ?
  bexp <- '[' ([^][] / bexp)* ']'
  name <- %w+
  S <- %s*
]], {set = function(...)
  local t, args = {}, {...}
  for i=1, #args, 2 do t[args[i+1]] = args[i] end
  return t
end})
local s = [[
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
]]
for k, v in pairs(p:match(s)) do print(k .. ' = ' .. v) end
--[[
c = char
userSchool = char[50+foo("bar")]
height = int[4]
userName = char[50]
--]]
