F# - Error Loading Table With Empty Last Row Using HtmlProvider - f#-data

When loading a Wikipedia table using HtmlProvider, I get an error message because the last row in the table is empty!
module SOQN =
    open System
    open FSharp.Data
    let [<Literal>] wikiUrl = @"https://en.wikipedia.org/wiki/COVID-19_testing#Virus_testing_statistics_by_country"
    type Covid = HtmlProvider<wikiUrl>
    let main() =
        printfn ""
        printfn "SOQN: Error Loading Table With Empty Last Row Using HtmlProvider?"
        printfn ""
        let feed = Covid.Load(wikiUrl)
        feed.Tables.``Virus testing statistics by country``.Rows
        |> Seq.map (fun r -> r.Date)
        |> printf "%A "
        printfn ""
        0
    [<EntryPoint>]
    main() |> ignore
    printfn "Fini!"
    printfn ""
// Actual Output:
// "Date is missing"
//
// Expected Output:
// seq [ "Albania"; "19 Apr"; "5542"; "562"; "10.1"; "1,936"; "196"; "[121]" ]
// ...
//
What am I missing?
For instance, can I preset the column types to 'string' similar to using 'schema' with CsvProvider?

Related

How to find an integer in text

Help me figure out how to work with text.
I have a string of the form "word1 number: word2", for example "result 0: Good" or "result 299: Bad".
I need to print Undefined, Low, or High:
When the string is null, print Undefined
When the number is 0-15, print Low
When the number is >15, print High
type GetResponse =
    {
        MyData: string voption
        ErrorMessage: string voption }
val result: Result<GetResponse, MyError>
And then I try:
MyData =
    match result with
    | Ok value ->
        if (value.Messages = null) then
            ValueSome "result: Undefined"
        else
            let result =
                value.Messages.FirstOrDefault(
                    (fun x -> x.ToUpperInvariant().Contains("result")),
                    "Undefined"
                )
            if (result <> "Undefined") then
                ValueSome result
            else
                errors.Add("We don't have any result")
                ValueNone
    | Error err ->
        errors.Add(err.ToErrorString)
        ValueNone
ErrorMessage =
    if errors.Any() then
        (errors |> String.concat ", " |> ValueSome)
    else
        ValueNone
But I don't know how to check for the number in the string, and maybe there is some way to print this without a billion ifs?
Parsing gets complex very quickly. I recommend using FParsec to simplify the logic and avoid errors. A basic parser that seems to meet your needs:
open System
open FParsec
let parseWord =
    manySatisfy Char.IsLetter
let parseValue =
    parseWord           // parse any word (e.g. "result")
    >>. spaces1         // skip whitespace
    >>. puint32         // parse an unsigned integer value
    .>> skipChar ':'    // skip colon character
    .>> spaces          // skip whitespace
    .>> parseWord       // parse any word (e.g. "Good")
You can then use it like this:
type ParserResult = Undefined | Low | High
let parse str =
    if isNull str then Result.Ok Undefined
    else
        match run parseValue str with
        | Success (num, _ , _) ->
            if num <= 15u then Result.Ok Low
            else Result.Ok High
        | Failure (errorMsg, _, _) ->
            Result.Error errorMsg
parse null |> printfn "%A" // Ok Undefined
parse "result 0: Good" |> printfn "%A" // Ok Low
parse "result 299: Bad" |> printfn "%A" // Ok High
parse "invalid input" |> printfn "%A" // Error "Error in Ln: 1 Col: 9 ... Expecting: integer number"
There's definitely a learning curve with FParsec, but I think it's worth adding to your toolbelt.
I agree with Brian that parsing can become quite tricky very quickly. However, if the input has a well-established format and you'd rather not write complex parsers, good old regular expressions can be of service ;)
Here is my take on the problem. Please note that it has plenty of room for improvement; this is just a proof of concept:
open System.Text.RegularExpressions
let test1 = "result 0: Good"
let test2 = "result 299: Bad"
let test3 = "some other text"
type ParserResult =
    | Undefined
    | Low of int
    | High of int
let (|ValidNumber|_|) s =
    //https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-6.0
    let rx = new Regex("(\w\s+)(\d+)\:(\s+\w)")
    let matches = rx.Matches(s)
    if matches.Count > 0 then
        let groups = matches.[0].Groups |> Seq.toList
        match groups with
        | [_; _; a; _] -> Some (int a.Value)
        | _ -> None
    else
        None
let parseMyString str =
    match str with
    | ValidNumber n when n < 16 -> Low n
    | ValidNumber n -> High n
    | _ -> Undefined
//let r = parseMyString test1
printfn "%A" (parseMyString test1)
printfn "%A" (parseMyString test2)
printfn "%A" (parseMyString test3)
The active pattern ValidNumber returns Some number if the input string matches the expression, otherwise it returns None. The parseMyString function uses the pattern and guards to build the final ParserResult value.

js_of_ocaml calling a function in ocaml from js

I have a function that uses a mutable variable, takes strings, and returns strings (it's a read-eval-print-loop interpreter).
I tried exporting it as such:
let () =
  Js.export_all
    (object%js
      method js_run_repl = Js.wrap_callback js_run_repl
    end)
Here's a snippet of the function I'm exporting:
let js_run_repl str =
  match String.(compare str "quit") with
  | 0 -> "bye"
  | _ -> ...
Regardless of my input it always returns bye, while calling the function directly in OCaml produces the expected behaviour. Here's the output from node:
> var mod = require('./main.bc');
undefined
> mod.js_run("constant P : Prop");
MlBytes { t: 0, c: 'bye', l: 3 }
>
It's also peculiar that the function is called js_run instead of js_run_repl; the latter is undefined according to node.
let () =
  Js.export_all
    (object%js
      method js_run_repl str =
        str
        |> Js.to_string
        |> js_run_repl
        |> Js.string
    end)
I had to convert the strings explicitly to OCaml strings and back to JS strings.

reading integers from a string

I want to read a line from a file, initialize an array from that line and then display the integers.
Why is it not reading the five integers in the line? I want to get the output 1 2 3 4 5, but I get 1 1 1 1 1.
open Array;;
open Scanf;;
let print_ints file_name =
  let file = open_in file_name in
  let s = input_line(file) in
  let n = ref 5 in
  let arr = Array.init !n (fun i -> if i < !n then sscanf s "%d" (fun a -> a) else 0) in
  let i = ref 0 in
  while !i < !n do
    print_int (Array.get arr !i);
    print_string " ";
    i := !i + 1;
  done;;
print_ints "string_ints.txt";;
My file is just: 1 2 3 4 5
You might want to try the following approach. Split your string into a list of substrings representing numbers. This answer describes one way of doing so. Then use the resulting function in your print_ints function.
let ints_of_string s =
  List.map int_of_string (Str.split (Str.regexp " +") s)
let print_ints file_name =
  let file = open_in file_name in
  let s = input_line file in
  let ints = ints_of_string s in
  List.iter (fun i -> print_int i; print_char ' ') ints;
  close_in file
let _ = print_ints "string_ints.txt"
When compiling, pass str.cma or str.cmxa as an argument (see this answer for details on compilation):
$ ocamlc str.cma print_ints.ml
Another alternative would be the Scanf.bscanf function; this question contains an example (use with caution).
The Scanf.sscanf function may not be particularly suitable for this task.
An excerpt from the OCaml manual:
the scanf facility is not intended for heavy duty lexical analysis and parsing. If it appears not expressive enough for your needs, several alternative exists: regular expressions (module Str), stream parsers, ocamllex-generated lexers, ocamlyacc-generated parsers
There is, though, a way to parse a string of ints using Scanf.sscanf (which I wouldn't recommend):
let rec int_list_of_string s =
  try
    Scanf.sscanf s
      "%d %[0-9-+ ]"
      (fun n rest_str -> n :: int_list_of_string rest_str)
  with
  | End_of_file | Scanf.Scan_failure _ -> []
The trick here is to split the input string s into a part that is parsed as an integer (%d) and the rest of the string, captured with the range format %[0-9-+ ], which matches a remainder containing only decimal digits 0-9, the - and + signs, and whitespace.

Joining on the first finished thread?

I'm writing up a series of graph-searching algorithms in F# and thought it would be nice to take advantage of parallelization. I wanted to execute several threads in parallel and take the result of the first one to finish. I've got an implementation, but it's not pretty.
Two questions: is there a standard name for this sort of function? Not a Join or a JoinAll, but a JoinFirst? Second, is there a more idiomatic way to do this?
open System.Threading
open System.Threading.Tasks
//implementation
let makeAsync (locker:obj) (shared:'a option ref) (f:unit->'a) =
    async {
        let result = f()
        Monitor.Enter locker
        shared := Some result
        Monitor.Pulse locker
        Monitor.Exit locker
    }
let firstFinished test work =
    let result = ref Option.None
    let locker = new obj()
    let cancel = new CancellationTokenSource()
    work |> List.map (makeAsync locker result) |> List.map (fun a-> Async.StartAsTask(a, TaskCreationOptions.None, cancel.Token)) |> ignore
    Monitor.Enter locker
    while (result.Value.IsNone || (not <| test result.Value.Value)) do
        Monitor.Wait locker |> ignore
    Monitor.Exit locker
    cancel.Cancel()
    match result.Value with
    | Some x-> x
    | None -> failwith "Don't pass in an empty list"
//end implementation
//testing
let delayReturn (ms:int) value =
    fun ()->
        Thread.Sleep ms
        value
let test () =
    let work = [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
    let result = firstFinished (fun _->true) work
    printfn "%s" result
Would it work to pass the CancellationTokenSource and test to each async and have the first that computes a valid result cancel the others?
let makeAsync (cancel:CancellationTokenSource) test f =
    let rec loop() =
        async {
            if cancel.IsCancellationRequested then
                return None
            else
                let result = f()
                if test result then
                    cancel.Cancel()
                    return Some result
                else return! loop()
        }
    loop()
let firstFinished test work =
    match work with
    | [] -> invalidArg "work" "Don't pass in an empty list"
    | _ ->
        let cancel = new CancellationTokenSource()
        work
        |> Seq.map (makeAsync cancel test)
        |> Seq.toArray
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.pick id
This approach makes several improvements: 1) it uses only async (it's not mixed with Task, which is an alternative for doing the same thing--async is more idiomatic in F#); 2) there's no shared state, other than CancellationTokenSource, which was designed for that purpose; 3) the clean function-chaining approach makes it easy to add additional logic/transformations to the pipeline, including trivially enabling/disabling parallelism.
With the Task Parallel Library in .NET 4, this is called WaitAny. For example, the following snippet creates 10 tasks and waits for any of them to complete:
open System.Threading
Array.init 10 (fun _ ->
    Tasks.Task.Factory.StartNew(fun () ->
        Thread.Sleep 1000))
|> Tasks.Task.WaitAny
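WaitAny returns the index of the task that finished, not its result. A minimal sketch of reading the result back, assuming the work items are plain unit -> 'a functions as in the question. The name firstResult is hypothetical, the losing tasks are not cancelled, and an empty list is not handled:

```fsharp
open System.Threading
open System.Threading.Tasks

/// Hypothetical helper: runs every work item as a task and returns
/// the result of whichever task finishes first.
let firstResult (work: (unit -> 'a) list) : 'a =
    // Start each function as a Task<'a>.
    let tasks =
        work
        |> List.map (fun f -> Async.StartAsTask(async { return f () }))
        |> List.toArray
    // WaitAny expects Task[], so upcast each Task<'a> to Task.
    let index = Task.WaitAny(tasks |> Array.map (fun t -> t :> Task))
    tasks.[index].Result

let winner =
    firstResult [ (fun () -> Thread.Sleep 500; "Second!")
                  (fun () -> Thread.Sleep 50; "First!") ]
printfn "%s" winner
```

Because nothing is cancelled, the slower work items keep running in the background after the first result is returned.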
If you are OK with using Reactive Extensions (Rx) in your project, the joinFirst function can be implemented as:
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let o =
        f
        |> List.map (fun i ->
            let j = fun() -> Async.RunSynchronously (async {return i() },-1,c.Token)
            Observable.Defer(fun() -> Observable.Start(j)))
        |> Observable.Amb
    let r = o.First()
    c.Cancel()
    r
Example usage:
[20..30] |> List.map (fun i -> fun() -> Thread.Sleep(i*100); printfn "%d" i; i)
|> joinFirst |> printfn "Done %A"
Console.Read() |> ignore
Update:
Using a MailboxProcessor:
type WorkMessage<'a> =
    | Done of 'a
    | GetFirstDone of AsyncReplyChannel<'a>
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let m =
        MailboxProcessor<WorkMessage<'a>>.Start(fun mbox ->
            async {
                let afterDone a m =
                    match m with
                    | GetFirstDone rc ->
                        rc.Reply(a)
                        Some(async {return ()})
                    | _ -> None
                let getDone m =
                    match m with
                    | Done a ->
                        c.Cancel()
                        Some (async {
                            do! mbox.Scan(afterDone a)
                        })
                    | _ -> None
                do! mbox.Scan(getDone)
                return ()
            })
    f
    |> List.iter (fun t ->
        try
            Async.RunSynchronously(
                async {
                    let out = t()
                    m.Post(Done out)
                    return ()
                }, -1, c.Token)
        with
        | _ -> ())
    m.PostAndReply(fun rc -> GetFirstDone rc)
Unfortunately, there is no built-in operation for this provided by Async, but I'd still use F# asyncs, because they directly support cancellation. When you start a workflow using Async.Start, you can pass it a cancellation token and the workflow will automatically stop if the token is cancelled.
This means that you have to start the workflows explicitly (instead of using Async.Parallel), so the synchronization must be written by hand. Here is a simple version of an Async.Choice method that does that (at the moment, it doesn't handle exceptions):
open System.Threading
type Microsoft.FSharp.Control.Async with
    /// Takes several asynchronous workflows and returns
    /// the result of the first workflow that successfully completes
    static member Choice(workflows) =
        Async.FromContinuations(fun (cont, _, _) ->
            let cts = new CancellationTokenSource()
            let completed = ref false
            let lockObj = new obj()
            let synchronized f = lock lockObj f
            /// Called when a result is available - the function uses locks
            /// to make sure that it calls the continuation only once
            let completeOnce res =
                let run =
                    synchronized(fun () ->
                        if completed.Value then false
                        else completed := true; true)
                if run then cont res
            /// Workflow that will be started for each argument - run the
            /// operation, cancel pending workflows and then return result
            let runWorkflow workflow = async {
                let! res = workflow
                cts.Cancel()
                completeOnce res }
            // Start all workflows using cancellation token
            for work in workflows do
                Async.Start(runWorkflow work, cts.Token) )
Once we write this operation (which is a bit complex, but has to be written only once), solving the problem is quite easy. You can write your operations as async workflows and they'll be cancelled automatically when the first one completes:
let delayReturn n s = async {
    do! Async.Sleep(n)
    printfn "returning %s" s
    return s }
Async.Choice [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
|> Async.RunSynchronously
When you run this, it will print only "returning First!" because the second workflow will be cancelled.
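The automatic cancellation this answer relies on can be observed in isolation. A minimal sketch, assuming a hypothetical ticker workflow and counter that are not part of the answer; the sleep durations are arbitrary:

```fsharp
open System.Threading

// Hypothetical counter, incremented by the workflow while it is alive.
let counter = ref 0
let cts = new CancellationTokenSource()

// A workflow that would loop forever if nothing cancelled it.
let ticker = async {
    while true do
        do! Async.Sleep 20
        counter.Value <- counter.Value + 1 }

// Start the workflow with the token; cancelling the token stops it
// at its next asynchronous step (here, the Async.Sleep).
Async.Start(ticker, cts.Token)
Thread.Sleep 200
cts.Cancel()
Thread.Sleep 100                  // let the workflow observe the cancellation
let ticksAtCancel = counter.Value
Thread.Sleep 200
let ticksLater = counter.Value    // unchanged if the workflow really stopped
printfn "ticks at cancel: %d, ticks later: %d" ticksAtCancel ticksLater
```

Cancellation is only checked at asynchronous steps such as let!, do! and Async.Sleep, so a workflow that blocks with Thread.Sleep would not stop early.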

F#: I cannot return unit in a do clause and still have side effects

I'm writing a simple ini file parser and I'm having a little problem with the initialization of the object in the "do" clause. It wants me to return a unit, but I can't get the blankity function to perform its side effects if I pipe into "ignore" or return "()" directly.
This code works as a separate function because I can ignore the results.
#light
module Utilities.Config
open System
open System.IO
open System.Text.RegularExpressions
open System.Collections.Generic
type Config(?fileName : string) =
    let fileName = defaultArg fileName @"C:\path\myConfigs.ini"
    static let defaultSettings =
        dict[ "Setting1", "1";
              "Setting2", "2";
              "Debug", "0";
              "State", "Disarray";]
    let settingRegex = new Regex(@"\s*(?<key>([^;#=]*[^;#= ]))\s*=\s*(?<value>([^;#]*[^;# ]))")
    let fileSettings = new Dictionary<string, string>()
    let addFileSetting (groups : GroupCollection) =
        fileSettings.Add(groups.Item("key").Value, groups.Item("value").Value)
    do File.ReadAllLines(fileName)
       |> Seq.map(fun line -> settingRegex.Match(line))
       |> Seq.filter(fun mtch -> mtch.Success)
       |> Seq.map(fun mtch -> addFileSetting(mtch.Groups)) // Does not have the correct return type
       //|> ignore //#1 Does not init the dictionary
       //() //#2 Does not init the dictionary
    //The extra step will work
    member c.ReadFile =
        File.ReadAllLines(fileName)
        |> Seq.map(fun line -> settingRegex.Match(line))
        |> Seq.filter(fun mtch -> mtch.Success)
        |> Seq.map(fun mtch -> addFileSetting(mtch.Groups))
Use Seq.iter (which executes an action for each element and returns unit) instead of Seq.map (which transforms elements).
The code doesn't work with ignore because sequences are evaluated lazily: when you ignore the result, nothing ever enumerates the sequence, so none of the code runs. Read this article
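The difference can be seen in a few lines. A minimal sketch, assuming a hypothetical seen list and record function that stand in for the dictionary and addFileSetting from the question:

```fsharp
// Hypothetical stand-ins for the question's dictionary and addFileSetting.
let seen = System.Collections.Generic.List<string>()
let record (s: string) = seen.Add s

// Seq.map only describes the work; ignoring the result means the
// sequence is never enumerated, so record is never called.
[ "a"; "b" ] |> Seq.map record |> ignore
let countAfterMap = seen.Count

// Seq.iter enumerates immediately and runs the action for every element.
[ "a"; "b" ] |> Seq.iter record
let countAfterIter = seen.Count

printfn "after map: %d, after iter: %d" countAfterMap countAfterIter
// prints "after map: 0, after iter: 2"
```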

Resources