F#: Count how many times a substring occurs within a string

How can one count how many times a substring occurs within a string?
I mean, if you have the string "one, two, three, one, one, two", how could you count that "one" is present 3 times?
I thought String.Contains would be able to do the job, but that only checks whether the substring is present at all. String.forall works on chars and is therefore not an option either.
So I am really at a complete halt here. Can someone enlighten me?

You can use Regex.Escape to turn the string you're searching for into a regex, then use regex functions:
open System.Text.RegularExpressions
let countMatches wordToMatch (input : string) =
    Regex.Matches(input, Regex.Escape wordToMatch).Count
Test:
countMatches "one" "one, two, three, one, one, two"
// Output: 3

Here's a simple implementation that walks through the string, using String.IndexOf to skip through to the next occurrence of the substring, and counts up how many times it succeeds.
let substringCount (needle : string) (haystack : string) =
    let rec loop count (index : int) =
        if index >= String.length haystack then count
        else
            match haystack.IndexOf(needle, index) with
            | -1 -> count
            | idx -> loop (count + 1) (idx + 1)
    if String.length needle = 0 then 0 else loop 0 0
Bear in mind, this counts overlapping occurrences, e.g., substringCount "aa" "aaaa" = 3. If you want non-overlapping, simply replace idx + 1 with idx + String.length needle.
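To make the overlapping versus non-overlapping distinction concrete, here is a minimal Python sketch of the same "jump to the next occurrence" idea. The function name and the `overlapping` flag are illustrative assumptions, not part of the answer above; `str.find` plays the role of `String.IndexOf`.

```python
def substring_count(needle: str, haystack: str, overlapping: bool = True) -> int:
    """Count occurrences of needle in haystack by repeatedly calling find()."""
    if not needle:
        return 0  # same convention as the F# version: an empty needle counts as 0
    # Advance by 1 to allow overlaps, or by the needle length to forbid them.
    step = 1 if overlapping else len(needle)
    count = 0
    index = haystack.find(needle)
    while index != -1:
        count += 1
        index = haystack.find(needle, index + step)
    return count


print(substring_count("aa", "aaaa"))                     # overlapping
print(substring_count("aa", "aaaa", overlapping=False))  # non-overlapping
```

The only difference between the two modes is how far the search start moves after each hit, which mirrors the `idx + 1` versus `idx + String.length needle` change described above.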

Create a sequence of tails of the string to search in, that is, all substring slices anchored at its end. Then use Seq.forall2 to test the needle against the beginning of each tail and count the matches. It's just golfier than (fun s -> s.StartsWith needle). Note that Seq.forall2 ignores the extra elements of the longer sequence, so only tails at least as long as the needle are generated; otherwise a short tail such as "ab" would wrongly match the needle "aba".
let count needle haystack =
    [ for i in 0..String.length haystack - String.length needle -> haystack.[i..] ]
    |> Seq.filter (Seq.forall2 (=) needle)
    |> Seq.length
count "aba" "abacababac"
// val it : int = 3

A fellow student of mine came up with the simplest solution I have seen so far. Note that it counts non-overlapping occurrences only.
let countNeedle (haystack : string) (needle : string) =
    match needle with
    | "" -> 0
    | _ -> (haystack.Length - haystack.Replace(needle, "").Length) / needle.Length
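The arithmetic behind the Replace trick may be easier to see with numbers. The following is a small Python sketch under the assumption that `str.replace` removes matches left to right without overlap, as .NET's `String.Replace` does; the function name is made up for illustration.

```python
def count_by_replace(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences from the length lost after removal."""
    if not needle:
        return 0  # avoid dividing by zero for an empty needle
    removed = len(haystack) - len(haystack.replace(needle, ""))
    # Every removed occurrence shortens the string by exactly len(needle).
    return removed // len(needle)


text = "one, two, three, one, one, two"
# 30 characters before, 21 after removing "one", so (30 - 21) / 3 occurrences.
print(len(text), len(text.replace("one", "")), count_by_replace(text, "one"))
```

Because removal is non-overlapping, "aba" is counted once in "ababa" by this method, whereas the index-walking approaches above can count it twice.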

// This approach assumes the data is comma-delimited.
let data = "one, two, three, one, one, two"
let dataArray = data.Split([|','|]) |> Array.map (fun x -> x.Trim())
let countSubstrings searchTerm = dataArray |> Array.filter (fun x -> x = searchTerm) |> Array.length
let countOnes = countSubstrings "one"
let data' = "onetwothreeoneonetwoababa"
// This recursive approach makes no assumptions about a delimiter,
// and it will count overlapping occurrences (e.g., "aba" twice in "ababa").
// This is similar to Jake Lishman's answer.
let rec countSubstringFromI s i what =
    let len = String.length what
    if i + len - 1 >= String.length s then 0
    else (if s.Substring(i, len) = what then 1 else 0) + countSubstringFromI s (i + 1) what
let countSubStrings' = countSubstringFromI data' 0 "one"

Related

How to convert the first char to lower case

I am trying to play with strings. I have strings like "Hello.Word" or "stackOver.Flow"
and I want to convert the first char of each part to lower case: "hello.word" and "stackOver.flow".
For snake_case it is easy: we only need to change each upper-case char to lower case and add '_',
but for camelCase (with the first char in lower case) I don't know how to do this.
open System
let convertToSnakeCase (value:string) =
    String [|
        Char.ToLower value.[0]
        for ch in value.[1..] do
            if Char.IsUpper ch then '_'
            Char.ToLower ch |]
Who can help?
module Identifier =
    open System
    let changeCase (str : string) =
        if String.IsNullOrEmpty(str) then str
        else
            let isUpper = Char.IsUpper
            let n = str.Length
            let builder = new System.Text.StringBuilder()
            let append (s:string) = builder.Append(s) |> ignore
            let rec loop i j =
                let k =
                    if i = n || (isUpper str.[i] && (not (isUpper str.[i - 1])
                                 || ((i + 1) <> n && not (isUpper str.[i + 1]))))
                    then
                        if j = 0 then
                            append (str.Substring(j, i - j).ToLower())
                        elif (i - j) > 2 then
                            append (str.Substring(j, 1))
                            append (str.Substring(j + 1, i - j - 1).ToLower())
                        else
                            append (str.Substring(j, i - j))
                        i
                    else
                        j
                if i = n then builder.ToString()
                else loop (i + 1) k
            loop 1 0
    type System.String with
        member x.ToCamelCase() = changeCase x
    printfn "%s" ("StackOver.Flow".ToCamelCase()) //stackOver.Flow
    //need stackOver.flow
I suspect there are much more elegant and concise solutions. I sense you are learning functional programming, so I think it's best to do stuff like this with a recursive function rather than some magic library function. I notice that in your question you ARE using a recursive function, but also an index into an array. Lists and recursive functions work together much more easily than arrays do, so if you use recursion the solution is usually simpler with a list.
I'd also avoid using a string builder. Assuming you are learning FP, string builders are imperative, and whilst they obviously work, they won't help you get your head around using immutable data.
The key then is to use pattern matching to match the scenario that should trigger the upper/lower case logic, as it depends on 2 consecutive characters.
I THINK you want this to happen for the 1st char, and after a '.'?
(I've inserted a '.' as the 1st char so that the recursive function only has to process the '.' scenario, rather than treating the first char as a special case.)
open System
let convertToCamelCase (value : string) =
    let rec convertListToCamelCase (value : char list) =
        match value with
        | [] -> []
        | '.' :: second :: rest ->
            '.' :: convertListToCamelCase (Char.ToLower second :: rest)
        | c :: rest ->
            c :: convertListToCamelCase rest
    // put a '.' on the front to simplify the logic (and take it off after)
    let convertAsList = convertListToCamelCase ('.' :: (value.ToCharArray() |> Array.toList))
    String ((convertAsList |> List.toArray).[1..])
The piece to worry about is the recursive piece; the rest of it is just flipping an array to a list and back again.
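The rule the recursion implements (lower-case the first character and any character that directly follows a '.') can also be written as a single pass with one flag. This is a rough Python sketch of that rule only; the name `to_camel_case` and the flag are illustrative assumptions, and it is not a translation of the F# list code.

```python
def to_camel_case(value: str) -> str:
    """Lower-case the first character and every character right after a '.'."""
    out = []
    lower_next = True  # plays the role of the '.' prepended in the F# version
    for ch in value:
        out.append(ch.lower() if lower_next else ch)
        lower_next = (ch == ".")
    return "".join(out)


print(to_camel_case("Hello.Word"))
print(to_camel_case("StackOver.Flow"))
```

Starting with the flag set is the same trick as prepending a '.' to the input: the first character then needs no special case.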

How to split a string into sub strings of n length?

How would I split a string into sub-arrays of length n in MATLAB?
eg.
Input: "ABCDEFGHIJKL", with sub arrays of length 3
Output: {ABC}, {DEF}, {GHI}, {JKL}
If the string length is not a multiple of n you probably need a loop or arrayfun:
x = 'ABCDEFGHIJK'; % length 11
n = 3;
result = arrayfun(@(k) x(k:min(k+n-1, end)), 1:n:numel(x), 'UniformOutput', false)
Alternatively, accumarray can be used as well:
x = 'ABCDEFGHIJK';
n = 3;
result = accumarray(floor((0:numel(x)-1).'/n)+1, x, [], @(t) {t.'}).';
Either of the above gives, in this example,
result =
1×4 cell array
{'ABC'} {'DEF'} {'GHI'} {'JK'}
A regular expression can do the job here:
str = 'abcdefgh'
exp = '.{1,3}' % match groups of 3 chars; if fewer than 3 chars are left, take the rest
res = regexp(str,exp,'match')
which gives:
res =
1×3 cell array
{'abc'} {'def'} {'gh'}
If you only want to match groups of exactly 3 chars:
exp = '.{3}' % this will output {'abc'} {'def'} but not {'gh'}
This should do it, provided the string length is a multiple of 3 (reshape errors otherwise) :)
string = cellstr(reshape(string, 3, [])')
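For comparison, the same chunking logic (step through the string in strides of n and let the last piece be shorter) looks like this in Python. It is only a sketch of the idea behind the arrayfun and regexp answers, not MATLAB code, and the function name is an assumption.

```python
def chunks(text: str, n: int) -> list[str]:
    """Split text into pieces of length n; the last piece may be shorter."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slicing past the end is safe in Python, much like min(k+n-1, end) in MATLAB.
    return [text[i:i + n] for i in range(0, len(text), n)]


print(chunks("ABCDEFGHIJKL", 3))
print(chunks("ABCDEFGHIJK", 3))
```

The second call shows the case the reshape answer cannot handle: an 11-character input leaves a final chunk of 2 characters.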

How to remove the first n characters from a string in Elixir?

I have a list of strings. Each of those strings starts with n characters I want to get rid of.
I can't use something like "123" <> new_string = old_string because the characters can be anything.
So I'd like to do something like this:
my_list |> Enum.map(fn(str) ->
  # Remove the n leading characters from str
end)
Do you know how I could achieve this?
You can use String.slice/2 to remove the first N graphemes of a string, and binary_part/3 or pattern matching to remove the first N bytes of a string.
Setup:
iex(1)> a = "abc"
"abc"
iex(2)> b = "πr²"
"πr²"
Removing the first 2 graphemes of a string:
iex(3)> String.slice(a, 2..-1)
"c"
iex(4)> String.slice(b, 2..-1)
"²"
Removing the first 2 bytes of a string:
iex(5)> binary_part(a, 2, byte_size(a) - 2)
"c"
iex(6)> binary_part(b, 2, byte_size(b) - 2)
"r²"
iex(7)> remove = 2
2
iex(8)> <<_::binary-size(remove), rest::binary>> = a; rest
"c"
iex(9)> <<_::binary-size(remove), rest::binary>> = b; rest
"r²"
Another alternative:
defmodule StringExtensions do
  def remove_first_n_chars(s, n) do
    {_, new_string} = s |> String.codepoints() |> Enum.split(n)
    new_string |> Enum.join()
  end
end
Which would then be used like so:
l = ["abcdefg","hijklmno","pqrstuv"]
l2 = l |> Enum.map(fn str -> StringExtensions.remove_first_n_chars(str,2) end) # l2 -> ["cdefg", "jklmno", "rstuv"]
Just wanted to offer a potential alternative, FWIW.
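The difference between dropping characters and dropping bytes is not specific to Elixir. As a cross-check, here is a Python sketch of both operations on the same "πr²" example. The helper names are assumptions; note that Python slices by code point rather than by grapheme, which happens to give the same result for this input.

```python
def drop_code_points(text: str, n: int) -> str:
    """Drop the first n code points (Python str indexing is per code point)."""
    return text[n:]


def drop_bytes(text: str, n: int) -> str:
    """Drop the first n bytes of the UTF-8 encoding.

    Raises UnicodeDecodeError if the cut lands in the middle of a character.
    """
    return text.encode("utf-8")[n:].decode("utf-8")


print(drop_code_points("πr²", 2))  # two characters removed: "π" and "r"
print(drop_bytes("πr²", 2))        # two bytes removed: only "π", which is 2 bytes
```

This matches the iex session above: the character-based version leaves "²", while the byte-based version leaves "r²" because "π" alone occupies two bytes in UTF-8.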

String lexicographical permutation and inversion

Consider the following function on a string:
int F(string S)
{
    int N = S.size();
    int T = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            if (S[i] > S[j])
                T++;
    return T;
}
A string S0 of length N with all pairwise distinct characters has a total of N! unique permutations.
For example "bac" has the following 6 permutations:
bac
abc
cba
bca
acb
cab
Consider these N! strings in lexicographical order:
abc
acb
bac
bca
cab
cba
Now consider the application of F to each of these strings:
F("abc") = 0
F("acb") = 1
F("bac") = 1
F("bca") = 2
F("cab") = 2
F("cba") = 3
Given some string S1 from this set of permutations, we want to find the next string S2 in the set that has the following relationship to S1:
F(S2) == F(S1) + 1
For example, if S1 == "acb" (F = 1) then S2 == "bca" (F = 1 + 1 = 2).
One way to do this would be to start one past S1 and iterate through the list of permutations looking for F(S) = F(S1)+1. This is unfortunately O(N!).
By what O(N) function on S1 can we calculate S2 directly?
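For reference, the brute-force baseline described in the question can be sketched in Python as follows. This is only the O(N!) scan the question wants to avoid, useful for checking faster candidates on small inputs; the function names are illustrative and distinct characters are assumed, as in the question.

```python
from itertools import permutations


def F(s: str) -> int:
    """Number of pairs i < j with s[i] > s[j], as in the C++ function above."""
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])


def next_with_one_more(s1: str):
    """Scan the permutations in lexicographic order for the first one after s1
    whose F value is exactly F(s1) + 1. Returns None if there is none."""
    target = F(s1) + 1
    for p in permutations(sorted(s1)):  # sorted input yields lexicographic order
        candidate = "".join(p)
        if candidate > s1 and F(candidate) == target:
            return candidate
    return None


print([F("".join(p)) for p in permutations("abc")])
print(next_with_one_more("acb"))
```

Any proposed O(N) or O(N log N) method can be compared against `next_with_one_more` on all permutations of a short alphabet.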
Suppose the length of S1 is n. The biggest possible value of F(S1) is n(n-1)/2. If F(S1) = n(n-1)/2, S1 is the last string and there is no next one for it. If F(S1) < n(n-1)/2, there is at least one char x that is bigger than the char y right before it; find such an x with the lowest index and swap x and y. Let's see it by example:
S1 == "acb" (F = 1), 1 < 3, so there is a char x that is bigger than the char y right before it. Here the x with the smallest index is c, and on the first try you swap it with a (which is smaller than x, so the algorithm finishes here) ==> S2 = "cab", F(S2) = 2.
Now let's test it with S2, cab: x=b, y=a, ==> S3 = "cba".
Finding x is not hard: iterate over the input and keep a variable, call it min. While the currently visited character is smaller than min, set min to that character and visit the next character. The first time you visit a character that is bigger than min, stop the iteration; this is x:
This is a sketch in C#:
string NextString(string input)
{
    var min = input[0];
    int i = 1;
    while (i < input.Length && input[i] < min)
    {
        min = input[i];
        i++;
    }
    if (i == input.Length) return "There isn't next item";
    char x = input[i], y = input[i - 1];
    // Swap the adjacent pair: keep everything before y, emit x then y, keep the rest.
    return input.Substring(0, i - 1) + x + y + input.Substring(i + 1);
}
Here's the outline of an algorithm for a solution to your problem.
I'll assume that you have a function to directly return the n-th permutation (given n) and its inverse, ie a function to return n given a permutation. Let these be perm(n) and perm'(n) respectively.
If I've figured it correctly, when you have a 4-letter string to permute the function F goes like this:
F("abcd") = 0
F("abdc") = 1
F(perm(3)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(perm(7)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(perm(13)) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(perm(19)) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(...) = 5
F(perm(24)) = 6
In words, when you go from 3 letters to 4 you get 4 copies of the table of values of F, adding (0,1,2,3) to the (1st,2nd,3rd,4th) copy respectively. In the 2nd case, for example, you already have one derangement by putting the 2nd letter in the 1st place; this simply gets added to the other derangements in the same pattern as would be true for the original 3-letter strings.
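The "four shifted copies" pattern in the table can be checked mechanically. The sketch below recomputes F for all permutations of 3 and 4 letters in lexicographic order and compares the blocks; the helper names are assumptions made for this illustration.

```python
from itertools import permutations


def inversions(seq) -> int:
    """Count pairs i < j with seq[i] > seq[j] (the function F from the question)."""
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])


def f_table(letters: str) -> list[int]:
    """F values of all permutations of letters, in lexicographic order."""
    return [inversions(p) for p in permutations(sorted(letters))]


table3 = f_table("abc")    # values for perm(1)..perm(6) of 3 letters
table4 = f_table("abcd")   # values for perm(1)..perm(24) of 4 letters
block = len(table3)
for k in range(4):
    # The k-th block of 6 entries should be the 3-letter table shifted by k.
    print(k, table4[k * block:(k + 1) * block])
```

Each block corresponds to one choice of first letter: putting the k-th smallest letter first contributes k inversions, and the remaining three letters contribute the 3-letter table.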
From this outline it shouldn't be too difficult (but I haven't got time right now) to write the function F. Strictly speaking the inverse of F isn't a function, as it would be multi-valued, but given n and F(n) there are only a few cases for finding m such that F(m) == F(n)+1. These cases are:
n == N!, where N is the number of letters in the string: there is no next permutation;
F(n+1) < F(n): the sought-for solution is perm(n+(N-1)!);
F(n+1) == F(n): the solution is perm(n+2);
F(n+1) > F(n): the solution is perm(n+1).
I suspect that some of this might only work for 4 letter strings, that some of these terms will have to be adjusted for K-letter permutations.
This is not O(n), but at least it is O(n²) (where n is the number of elements in the permutation, in your example 3).
First, notice that whenever you place a character in your string, you already know how much of an increase in F that's going to mean -- it's however many characters smaller than that one that haven't been added to the string yet.
This gives us another algorithm to calculate F(S):
used = set()
def get_inversions(S1):
    inv = 0
    for index, ch in enumerate(S1):
        character = ord(ch)-ord('a')
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv
This is not much better than the original version, but it is useful when inverting F. You want to know the first string that is lexicographically smaller -- therefore, it makes sense to copy your original string and only change it whenever mandatory. When such changes are required, we should also change the string by the least amount possible.
To do so, let's use the information that the biggest value of F for a string with n letters is n(n-1)/2. Whenever the number of required inversions would be bigger than this amount if we didn't change the original string, this means we must swap a letter at that point. Code in Python:
used = set()
def get_inversions(S1):
    inv = 0
    for index, ch in enumerate(S1):
        character = ord(ch)-ord('a')
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv
def f_recursive(n, S1, inv, ign):
    if n == 0: return ""
    # inversions that must come from this position: what the remaining
    # n-1 letters cannot supply on their own (their maximum is (n-1)(n-2)/2)
    delta = inv - (n-1)*(n-2)//2
    if ign:
        cnt = 0
        ch = 0
    else:
        ch = ord(S1[len(S1)-n])-ord('a')
        cnt = sum(1 for x in range(ch) if x not in used)
    for letter in range(ch, len(S1)):
        if letter not in used:
            if cnt < delta:
                cnt += 1
                continue
            used.add(letter)
            if letter != ch: ign = True
            return chr(letter+ord('a'))+f_recursive(n-1, S1, inv-cnt, ign)
def F_inv(S1):
    used.clear()
    inv = get_inversions(S1)
    used.clear()
    return f_recursive(len(S1), S1, inv+1, False)
print(F_inv("acb"))  # bca
It can also be made to run in O(n log n) by replacing the innermost loop with a data structure such as a binary indexed tree.
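As an illustration of the binary indexed tree idea, here is a minimal Python sketch that counts inversions in O(n log n). It only covers computing F, not the inverse search, and all names are assumptions made for this example rather than part of the answer's code.

```python
def count_inversions(s: str) -> int:
    """Count pairs i < j with s[i] > s[j] using a Fenwick (binary indexed) tree."""
    # Map each distinct character to a rank 1..size in sorted order.
    ranks = {ch: r for r, ch in enumerate(sorted(set(s)), start=1)}
    size = len(ranks)
    tree = [0] * (size + 1)

    def add(i: int) -> None:
        # Record one occurrence of rank i.
        while i <= size:
            tree[i] += 1
            i += i & -i

    def prefix(i: int) -> int:
        # Number of recorded ranks that are <= i.
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    inv = 0
    for seen, ch in enumerate(s):
        r = ranks[ch]
        # Earlier characters strictly greater than ch each form one inversion.
        inv += seen - prefix(r)
        add(r)
    return inv


print(count_inversions("bca"))
```

Each character costs one prefix query and one update, both logarithmic in the alphabet size, instead of the linear scan over `used` in the code above.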
Did you try to swap two neighboring characters in the string? It seems that this can help to solve the problem. If you swap adjacent characters S[i] and S[i+1], where S[i] < S[i+1], then F(S) increases by exactly one, because no other pair of indices changes its relative order.
If I'm not mistaken, F calculates the number of inversions of the permutation.
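The neighbour-swap observation is easy to verify exhaustively on a small alphabet. The following Python sketch checks it for every permutation of "abcd"; the helper names are assumptions for this illustration.

```python
from itertools import permutations


def inversions(s: str) -> int:
    """Number of pairs i < j with s[i] > s[j]."""
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])


def swap_adjacent(s: str, i: int) -> str:
    """Return s with the characters at positions i and i + 1 exchanged."""
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


checked = 0
for p in permutations("abcd"):
    s = "".join(p)
    for i in range(len(s) - 1):
        if s[i] < s[i + 1]:
            # Swapping an ascending adjacent pair adds exactly one inversion.
            assert inversions(swap_adjacent(s, i)) == inversions(s) + 1
            checked += 1
print(checked, "adjacent swaps checked")
```

The adjacency matters: swapping the non-adjacent 'a' and 'c' in "abc" gives "cba" and raises the count from 0 to 3, because the characters in between change their order relative to both swapped ones.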

How can I extract the middle part of a string in FSharp?

I want to extract the middle part of a string using FSharp if it is quoted, something like this:
let middle =
    match original with
    | "\"" + mid + "\"" -> mid
    | all -> all
But it doesn't work because of the infix operator + in pattern expression. How can I extract this?
I don't think there is any direct support for this, but you can certainly write an active pattern. Active patterns allow you to implement your own code that will run as part of the pattern matching and you can extract & return some part of the value.
The following is a pattern that takes two parameters (prefix and postfix string) and succeeds if the given input starts/ends with the specified strings. The pattern is not complete (can fail), so we'll use the |Name|_| syntax and it will need to return option value:
let (|Middle|_|) (prefix:string) (postfix:string) (input:string) =
    // Check if the string starts with 'prefix', ends with 'postfix' and
    // is at least as long as the two together (so they do not overlap)
    if input.StartsWith(prefix) && input.EndsWith(postfix) &&
       input.Length >= (prefix.Length + postfix.Length) then
        // Strip the prefix/postfix and return 'Some' to indicate success
        let len = input.Length - prefix.Length - postfix.Length
        Some(input.Substring(prefix.Length, len))
    else None // Return 'None' - string doesn't match the pattern
Now we can use Middle in pattern matching (e.g. when using match):
match "[aaa]" with
| Middle "[" "]" mid -> mid
| all -> all
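The check that the active pattern performs (prefix present, suffix present, and the two not overlapping) can be sketched outside F# as well. This Python version returns None where the F# pattern returns None and the middle string where it returns Some; the function name is an assumption for illustration.

```python
from typing import Optional


def middle(text: str, prefix: str, suffix: str) -> Optional[str]:
    """Return the part between prefix and suffix, or None if text does not match."""
    long_enough = len(text) >= len(prefix) + len(suffix)  # prefix and suffix must not overlap
    if long_enough and text.startswith(prefix) and text.endswith(suffix):
        return text[len(prefix):len(text) - len(suffix)]
    return None


print(middle("[aaa]", "[", "]"))
print(middle('"', '"', '"'))  # a lone quote is both prefix and suffix, so no match
```

The length test is what stops a single quote character from matching as both the opening and the closing delimiter.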
Parameterized active patterns to the rescue!
let (|HasPrefixSuffix|_|) (pre:string, suf:string) (s:string) =
    if s.StartsWith(pre) then
        let rest = s.Substring(pre.Length)
        if rest.EndsWith(suf) then
            Some(rest.Substring(0, rest.Length - suf.Length))
        else
            None
    else
        None
let Test s =
    match s with
    | HasPrefixSuffix("\"","\"") inside ->
        printfn "quoted, inside is: %s" inside
    | _ -> printfn "not quoted: %s" s
Test "\"Wow!\""
Test "boring"
… or just use a plain old regular expression:
let Middle input =
    let capture = Regex.Match(input, "\"([^\"]+)\"")
    match capture.Groups.Count with
    | 2 -> capture.Groups.[1].Value
    | _ -> input
Patterns have a limited grammar - you can't just use any expression. In this case, I'd just use an if/then/else:
let middle (s:string) =
    // check the length first so that an empty string is never indexed
    if s.Length >= 2 && s.[0] = '"' && s.[s.Length - 1] = '"' then
        s.Substring(1, s.Length - 2)
    else s
If grabbing the middle of a string with statically known beginnings and endings is something that you'll do a lot, then you can always use an active pattern as Tomas suggests.
Not sure how efficient this is:
open System
let GetQuote (s:String) (q:char) =
    s
    |> Seq.skip ((s |> Seq.findIndex (fun c -> c = q))+1)
    |> Seq.takeWhile (fun c -> c <> q)
    |> Seq.fold (fun acc c -> String.Format("{0}{1}", acc, c)) ""
Or there's this with Substring in place of the fold:
let GetQuote2 (s:String) (q:char) =
    let isQuote = (fun c -> c = q)
    // index of the first character after the opening quote
    let a = (s |> Seq.findIndex isQuote)+1
    // length of the quoted text: offset of the closing quote relative to a
    let b = s |> Seq.skip a |> Seq.findIndex isQuote
    s.Substring(a, b)
These will get the first instance of the quoted text anywhere in the string, e.g. with q = '|', "Hello |World|" -> "World"

Resources