Here is the code:
val a = "abcabca"
a.groupBy((c: Char) => a.count( (d:Char) => d == c))
Here is the result I want:
scala.collection.immutable.Map[Int,String] = Map(2 -> b, 2 -> c, 3 -> a)
But the result I get is:
scala.collection.immutable.Map[Int,String] = Map(2 -> bcbc, 3 -> aaa)
Why?
Thank you.
Write an expression like
"abcabca".groupBy(identity).collect{
case (k,v) => (k,v.length)
}
which will give output as
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 2, a -> 3, c -> 2)
Let's dissect your initial attempt:
a.groupBy((c: Char) => a.count( (d:Char) => d == c))
So what are you grouping by? The result of a.count(...), so the keys of your Map will be Ints. For the char a we get 3; for the chars b and c we get 2.
Now the original String is traversed and the results are accumulated, char by char.
So after traversing the first "ab", the current state is "3 -> a, 2 -> b". (Note that .count() is called for each char in the string, which makes this a wasteful O(n²) algorithm, but anyway.)
The string is traversed progressively, and at the end the accumulated result is shown. As it turns out, the three "a"s have been sent under the key 3, and the b's and c's under the key 2, in the order the string was traversed, which is left to right.
Now, a usual groupBy on a List returns something like Map[T, List[T]], so you may have expected a List[Char] somewhere. That doesn't happen here (because the Repr for String is String), so your chars are effectively recombined into a String and given to you as such.
Hence your final result!
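To make the traversal concrete, here is a rough Python analogue (illustration only: a dict stands in for Scala's Map, and string concatenation mimics the String Repr). It records the state after each character:

```python
a = "abcabca"
groups = {}   # key (a count) -> String of chars, like Map[Int, String]
states = []   # snapshot of groups after each character

for ch in a:                                # left-to-right traversal, as groupBy does
    key = a.count(ch)                       # the question's discriminator, O(n) per char
    groups[key] = groups.get(key, "") + ch  # append ch to that key's String
    states.append(dict(groups))

print(states[1])  # {3: 'a', 2: 'b'}  (state after "ab")
print(groups)     # {3: 'aaa', 2: 'bcbc'}
```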
Your question title reads "Scala count chars in a string logical error", but you are using a Map and you want the counts as keys. Duplicate keys are not allowed in a Map, so equal keys collapse into a single entry in the resulting Map. What you want may be a Seq of (count, char) tuples, such as a List[(Int, Char)]. Try this:
val x = "abcabca"
x.groupBy(identity).mapValues(_.size).toList.map{case (x,y)=>(y,x)}
In the Scala REPL:
scala> x.groupBy(identity).mapValues(_.size).toList.map{case (x,y)=>(y,x)}
res13: List[(Int, Char)] = List((2,b), (3,a), (2,c))
The above gives the counts and their respective chars as a list of tuples. This is probably what you really wanted.
If you try converting this to a Map:
scala> x.groupBy(identity).mapValues(_.size).toList.map{case (x,y)=>(y,x)}.toMap
res14: scala.collection.immutable.Map[Int,Char] = Map(2 -> c, 3 -> a)
Obviously, this is not what you want.
Even more concisely, use:
x.distinct.map(v=>(x.filter(_==v).size,v))
scala> x.distinct.map(v=>(x.filter(_==v).size,v))
res19: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((3,a), (2,b), (2,c))
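The key-collision problem is not specific to Scala. As a rough Python illustration (collections.Counter stands in for groupBy(identity).mapValues(_.size)), a list of (count, char) pairs keeps every entry, while turning it into a dict lets a later pair overwrite an earlier one with the same key:

```python
from collections import Counter

x = "abcabca"
pairs = [(n, ch) for ch, n in Counter(x).items()]  # (count, char) tuples
as_dict = dict(pairs)  # duplicate key 2: the later (2, 'c') replaces (2, 'b')

print(pairs)    # [(3, 'a'), (2, 'b'), (2, 'c')]
print(as_dict)  # {3: 'a', 2: 'c'}
```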
The problem with your approach is that you are mapping counts to characters. That is:
In the case of
val str = "abcabca"
while traversing the string str, a has count 3, b has count 2 and c has count 2. When the map is created with groupBy, all characters that share the same key are put into that key's value:
Map(3 -> aaa, 2 -> bcbc)
That's the reason you are getting such output from your program.
As you can see in the definition of the groupBy function:
def groupBy[K](f: (A) ⇒ K): immutable.Map[K, Repr]
Partitions this traversable collection into a map of traversable collections according to some discriminator function.
Note: this method is not re-implemented by views. This means when applied to a view it will always force the view and return a new traversable collection.
K: the type of keys returned by the discriminator function.
f: the discriminator function.
returns: a map from keys to traversable collections such that the following invariant holds:
(xs groupBy f)(k) = xs filter (x => f(x) == k)
That is, every key k is bound to a traversable collection of those elements x for which f(x) equals k.
groupBy returns a Map that satisfies the following invariant:
(xs groupBy f)(k) = xs filter (x => f(x) == k)
This means that each key is bound to the collection of elements for which the discriminator returns that key.
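The invariant can be checked directly. A small Python sketch, where the hypothetical helper group_by returns a dict of lists (standing in for Map[K, Repr]) and the discriminator is the question's count function:

```python
from collections import defaultdict

def group_by(xs, f):
    """Hypothetical helper: map each key f(x) to the elements x with that key, in order."""
    groups = defaultdict(list)
    for x in xs:
        groups[f(x)].append(x)
    return dict(groups)

xs = "abcabca"
f = xs.count      # f('a') == 3, f('b') == f('c') == 2
g = group_by(xs, f)

# (xs groupBy f)(k) == xs filter (x => f(x) == k), for every key k
invariant_holds = all(g[k] == [x for x in xs if f(x) == k] for k in g)
print(invariant_holds)  # True
```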
In SML, how can I count the number of appearances of chars in a string using recursion?
The output should be in the form (char, #AppearancesOfChar).
What I managed to do is
fun frequency(x) = if x = [] then [] else [(hd x,1)] @ frequency(tl x)
which returns tuples of the form (char,1). I can also eliminate duplicates in this list, so what I fail to do now is to write a function like
fun count(s: string, l: (char * int) list)
which 'iterates' through the string, incrementing the particular tuple component. How can I do this recursively? Sorry for the noob question, but I am new to functional programming; I hope the question is at least understandable :)
I'd break the problem into two: Increasing the frequency of a single character, and iterating over the characters in a string and inserting each of them. Increasing the frequency depends on whether you have already seen the character before.
fun increaseFrequency (c, []) = [(c, 1)]
  | increaseFrequency (c, ((c1, count)::freqs)) =
      if c = c1
      then (c1, count+1) :: freqs
      else (c1, count) :: increaseFrequency (c, freqs)
This provides a function with the following type declaration:
val increaseFrequency = fn : ''a * (''a * int) list -> (''a * int) list
So given a character and a list of frequencies, it returns an updated list of frequencies where either the character has been inserted with frequency 1, or its existing frequency has been increased by 1, by performing a linear search through each tuple until either the right one is found or the end of the list is met. All other character frequencies are preserved.
The simplest way to iterate over the characters in a string is to explode it into a list of characters and insert each character into an accumulating list of frequencies that starts with the empty list:
fun frequencies s =
let fun freq [] freqs = freqs
| freq (c::cs) freqs = freq cs (increaseFrequency (c, freqs))
in freq (explode s) [] end
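For readers more at home in Python, here is a rough transliteration of the two SML functions above (lists of (char, count) tuples, the same linear search, and a loop standing in for the recursive freq helper). It is only a sketch of the same idea, not an SML solution:

```python
def increase_frequency(c, freqs):
    """Return freqs with c's count bumped by 1, or (c, 1) appended if c is new."""
    if not freqs:
        return [(c, 1)]
    (c1, count), rest = freqs[0], freqs[1:]
    if c == c1:
        return [(c1, count + 1)] + rest
    return [(c1, count)] + increase_frequency(c, rest)

def frequencies(s):
    freqs = []
    for ch in s:  # plays the role of freq (explode s) []
        freqs = increase_frequency(ch, freqs)
    return freqs

print(frequencies("hello"))  # [('h', 1), ('e', 1), ('l', 2), ('o', 1)]
```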
But this isn't a very efficient way to iterate a string one character at a time. Alternatively, you can visit each character by indexing without converting to a list:
fun foldrs f e s =
    let val len = size s
        (* visits characters left to right, so despite its name it behaves like a left fold *)
        fun loop i e' = if i = len
                        then e'
                        else loop (i+1) (f (String.sub (s, i), e'))
    in loop 0 e end
fun frequencies s = foldrs increaseFrequency [] s
You might also consider using a more efficient representation of sets than lists to reduce the linear-time insertions.
all_nat x = [ls| sum ls == x]
I'd like to write a function that, given an integer x, returns all the lists whose elements sum to x, but I always get the error "not in scope: 'ls'" in both places it appears. I'm new to Haskell. What's the syntax error here?
The problem is that you need to define all used variables somewhere, but ls is undefined. Moreover, it can't be defined automatically, because the compiler doesn't know about the task: how should the list be generated? How long can it be? Are the terms positive or not, integral or not? Unfortunately, your definition of the problem is quite vague for modern non-AI languages.
Let's help the compiler. To solve such problems, it's often useful to involve some math and infer the algorithm inductively. For example, let's write an algorithm with ordered lists (where [2,1] and [1,2] are different solutions):
Start with a basis, where you know the output for some given input. For example, for 0 there is only an empty list of terms (if 0 could be a term, any number could be decomposed as a sum in infinitely many ways). So, let's define that:
allNats 0 = [[]] --One empty list
An inductive step. Assuming we can decompose a number n, we can decompose any number n+k for any positive k, by adding k as a term to all decompositions of n. In other words: for numbers greater than 0, we can take any number k from 1 to n, and make it the first term of all decompositions of (n-k):
allNats n = [ k:rest                      --Add k as a head to the rest, where
            | k <- [1 .. n]               --k is taken from 1 to n, and
            , rest <- allNats (n - k)]    --rest is taken from solutions for (n - k)
That's all! Let's test it:
ghci> allNats 4
[[1,1,1,1],[1,1,2],[1,2,1],[1,3],[2,1,1],[2,2],[3,1],[4]]
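The same base case and inductive step translate directly into Python (shown only as an illustration; all_nats is a hypothetical name mirroring allNats):

```python
def all_nats(n):
    """All ordered lists of positive integers summing to n (compositions of n)."""
    if n == 0:
        return [[]]  # base case: one empty list
    # inductive step: pick the first term k, then any decomposition of n - k
    return [[k] + rest for k in range(1, n + 1) for rest in all_nats(n - k)]

print(all_nats(4))
# [[1, 1, 1, 1], [1, 1, 2], [1, 2, 1], [1, 3], [2, 1, 1], [2, 2], [3, 1], [4]]
```

Since every n has 2^(n-1) compositions, the output grows exponentially, so this is only practical for small n.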
Let's break this up into two parts. If I've understood your question correctly, the first step is to generate all possible (sub)lists from a list. There's a function to do this, called subsequences.
The second step is to evaluate the sum of each subsequence, and keep the subsequences with the sum you want. So your list comprehension looks like this:
import Data.List (subsequences)
all_nat x = [ls | ls <- subsequences [1..x], sum ls == x]
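Note that subsequences [1..x] uses each number at most once and ignores order, so it returns different lists than the ordered decomposition in the previous answer (for x = 4 it gives only [1,3] and [4]). A rough Python analogue using itertools.combinations, shown for illustration:

```python
from itertools import combinations

def all_nat(x):
    """Subsets of 1..x (each number used at most once) whose sum is x."""
    nums = range(1, x + 1)
    return [list(c)
            for r in range(x + 1)  # subset sizes 0..x; size 0 only matters for x == 0
            for c in combinations(nums, r)
            if sum(c) == x]

print(all_nat(6))  # [[6], [1, 5], [2, 4], [1, 2, 3]]
```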
What about this?
getAllSums x = [(l,r) | l <- partial_nat, r <- partial_nat, l + r == x]
  where partial_nat = [1..x]
Suppose we are given a string S, and a list of some other strings L.
How can we know whether S is one of the possible concatenations of L?
For example:
S = "abcdabce"
L = ["abcd", "a", "bc", "e"]
S is "abcd" + "a" + "bc" + "e", so S is a concatenation of L, whereas "ababcecd" is not.
In order to solve this question, I tried to use DFS/backtracking. The pseudo code is as follows:
boolean isConcatenation(S, L) {
    if (L.length == 1 && S == L[0]) return true;
    for (String s: L) {
        if (S.startsWith(s)) {
            markAsVisited(s);
            if (isConcatenation(S.exclude(s), L.exclude(s)))
                return true;
            markAsUnvisited(s);
        }
    }
    return false;
}
However, DFS/backtracking is not an efficient solution. I am curious what the fastest algorithm for this question is, or whether there is any other algorithm that solves it faster. I hope there are algorithms like KMP that can solve it in O(n) time.
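For reference, a runnable version of the backtracking idea in Python. Here "marking as visited" is done by passing a copy of L without the used string, which assumes each string in L must be used exactly once:

```python
def is_concatenation(s, parts):
    """Can s be split into all strings of parts, each used exactly once, in some order?"""
    if not parts:
        return s == ""  # every part used: s must be fully consumed
    for i, p in enumerate(parts):
        if s.startswith(p):
            rest = parts[:i] + parts[i + 1:]  # "mark p as visited" by removing it
            if is_concatenation(s[len(p):], rest):
                return True
            # falling through here is the "unmark and try the next string" step
    return False

L = ["abcd", "a", "bc", "e"]
print(is_concatenation("abcdabce", L))  # True
print(is_concatenation("ababcecd", L))  # False
```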
In Python:
>>> import itertools
>>> yes = 'abcdabce'
>>> no = 'ababcecd'
>>> L = ['abcd','a','bc','e']
>>> yes in [''.join(p) for p in itertools.permutations(L)]
True
>>> no in [''.join(p) for p in itertools.permutations(L)]
False
Edit: as pointed out, this is O(n!) in the length of L, so it is inappropriate for large L. But hey, development time was under 10 seconds.
You can instead build your own permutation generator, starting with the basic permutator:
def all_perms(elements):
    if len(elements) <= 1:
        yield elements
    else:
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                yield perm[:i] + elements[0:1] + perm[i:]
And then discard branches that you don't care about by tracking what the concatenation of the elements would be and only iterating if it adds up to your target string.
def all_perms(elements, conc=''):
...
for perm in all_perms(elements[1:], conc + elements[0]):
...
if target.startswith(''.join(conc)):
...
A dynamic programming approach would be to work left to right, building up an array A[x] where A[x] is true if the first x characters of the string form one of the possible concatenations of L. You can work out A[n] from the earlier entries A[0..n-1] by checking each possible string in the list: if the k characters of S ending at the nth character match a candidate string of length k, and A[n-k] is true, then you can set A[n] to true.
I note that you can use https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm to find the matches you need as input to the dynamic program - the matching costs will be linear in the size of the input string, the total size of all candidate strings, and the number of matches between the input string and candidate strings.
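A minimal Python sketch of the dynamic program, with a plain substring comparison in place of Aho-Corasick. As described, the DP lets a string from L be reused any number of times and does not require every string to be used, so it answers a relaxed version of the question; enforcing single use of each string would need extra state:

```python
def dp_concatenation(s, parts):
    """A[x] is True if s[:x] can be written as a concatenation of strings from parts.
    Note: strings may repeat and need not all be used (a relaxation of the question)."""
    n = len(s)
    A = [False] * (n + 1)
    A[0] = True  # the empty prefix is trivially a concatenation
    for x in range(1, n + 1):
        for p in parts:
            k = len(p)
            # s[:x] ends with p and the part before p is itself a valid prefix
            if 0 < k <= x and A[x - k] and s[x - k:x] == p:
                A[x] = True
                break
    return A[n]

L = ["abcd", "a", "bc", "e"]
print(dp_concatenation("abcdabce", L))  # True
print(dp_concatenation("aa", ["a"]))    # True, although "a" is used twice
```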
I would try the following:
find all positions of the L_i patterns in S
let n = length(S) + 1
create a graph with n nodes, numbered 0 to n-1
for each position i where some L_j matches: add a directed edge node_i --> node_{i+length(L_j)}
to enforce the permutation constraint, you have to add some more nodes/edges that exclude multiple use of the same pattern
now I can ask a new question: does there exist a directed path from node 0 to node n-1?
notes:
a node i (0 < i < n-1) with degree < 2 cannot lie on such a path, so it can be discarded
all nodes with in-degree 1 and out-degree 1 (d- = 1, d+ = 1) are part of the permutation
use breadth-first search or Dijkstra's algorithm to look for the solution
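The core of this idea, building the match graph and searching it breadth-first, can be sketched in Python as follows. This sketch omits the extra nodes needed for the permutation constraint, so, like the plain graph, it allows a string to be used more than once:

```python
from collections import deque

def has_path(s, parts):
    """Nodes 0..len(s); edge i -> i + len(p) wherever p occurs at position i.
    Returns True if node len(s) is reachable from node 0."""
    n = len(s)
    edges = {i: [] for i in range(n + 1)}
    for p in parts:
        if not p:
            continue  # an empty pattern would only add self-loops
        start = s.find(p)
        while start != -1:
            edges[start].append(start + len(p))
            start = s.find(p, start + 1)  # also catch overlapping occurrences
    seen, queue = {0}, deque([0])
    while queue:  # breadth-first search from node 0
        i = queue.popleft()
        if i == n:
            return True
        for j in edges[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False
```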
You can use the Trie data structure. First, construct a trie from strings in L.
Then, for the input string S, search for the S in the trie.
During the search, for every visited node that is the end of one of the words in L, start a new search on the trie (from the root) with the remaining (not yet matched) suffix of S. So we are using recursion. If you consume all characters of S in that process, then you know that S is a concatenation of some strings from L.
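A minimal Python sketch of the trie search, using nested dicts as trie nodes and a hypothetical END sentinel key to mark word ends. Like the description, it does not track which words have already been used:

```python
END = object()  # hypothetical sentinel key marking "a word of L ends here"

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def trie_concat(s, root, start=0):
    """True if s[start:] splits into words stored in the trie (words may repeat)."""
    if start == len(s):
        return True
    node = root
    for i in range(start, len(s)):
        node = node.get(s[i])
        if node is None:
            return False  # no word of L continues with this character
        if END in node and trie_concat(s, root, i + 1):
            return True   # a word ends here and the rest also splits
    return False

root = build_trie(["abcd", "a", "bc", "e"])
print(trie_concat("abcdabce", root))  # True
```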
I would suggest this solution:
Take an array of size 256 that stores the occurrence count of each character across all strings of L. Now compare it with the count of each character of S. If they are unequal, then we can confidently say that the strings of L cannot form S.
If the counts are the same, do the following: using the KMP algorithm, try to find each string of L in S simultaneously. If at any time there is a match, we remove that string from L and continue searching for the other strings in L. If at any time we don't find a match, we just print that S cannot be represented. If at the end L is empty, we conclude that S is indeed a concatenation of L.
Assuming that L is a set of unique strings.
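The first step, comparing character counts, is easy to sketch in Python with collections.Counter (standing in for the 256-entry array). It is a necessary but not sufficient check: the question's negative example "ababcecd" passes it too, because it uses exactly the same letters:

```python
from collections import Counter

def counts_match(s, parts):
    """Necessary condition: s and the concatenation of parts use the same multiset of chars."""
    return Counter(s) == Counter("".join(parts))

L = ["abcd", "a", "bc", "e"]
print(counts_match("abcdabce", L))  # True
print(counts_match("ababcecd", L))  # True, yet it is not a concatenation of L
print(counts_match("abcdabcf", L))  # False
```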
Two Haskell propositions:
There may be some counterexamples to this... just for fun, sort L with a custom comparison:
import Data.List (sortBy,isInfixOf)
h s l = (concat . sortBy weird $ l) == s where
  weird a b | isInfixOf (a ++ b) s = LT
            | isInfixOf (b ++ a) s = GT
            | otherwise = EQ
More boring...attempt to build S from L:
import Data.List (delete,isPrefixOf)
f s l = g s l [] where
  g str subs result
    | concat result == s = [result]
    | otherwise =
        if null str || null subs'
          then []
          else do sub <- subs'
                  g (drop (length sub) str) (delete sub subs) (result ++ [sub])
    where subs' = filter (flip isPrefixOf str) subs
Output:
*Main> f "abcdabce" ["abcd", "a", "bc", "e", "abc"]
[["abcd","a","bc","e"],["abcd","abc","e"]]
*Main> h "abcdabce" ["abcd", "a", "bc", "e", "abc"]
False
*Main> h "abcdabce" ["abcd", "a", "bc", "e"]
True
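Without backtracking, your algorithm needs on the order of N^2 prefix checks (N is the length of the list); when several strings match the same prefix, backtracking can make the worst case exponential. Let's see it in actual C++: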
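#include <string>
#include <vector>
#include <utility>
#include <iterator>
#include <algorithm>
#include <iostream>
using namespace std;
typedef pair<string::const_iterator, string::const_iterator> stringp;
typedef vector<string> strings;
// Backtracking: can the character range S be split into all strings of L, each used exactly once?
bool isConcatenation(stringp S, const strings& L) {
    for (strings::const_iterator p = L.begin(); p != L.end(); ++p) {
        // four-iterator mismatch (C++14) never reads past S.second
        auto M = mismatch(p->begin(), p->end(), S.first, S.second);
        if (M.first == p->end()) {               // *p is a prefix of S
            if (L.size() == 1)
                return M.second == S.second;     // the last string must end S exactly
            strings T(L.begin(), p);             // T = L without *p
            T.insert(T.end(), next(p), L.end());
            if (isConcatenation(make_pair(M.second, S.second), T))
                return true;
        }
    }
    return false;
}
int main() {
    const string S = "abcdabce";
    const strings L = {"abcd", "a", "bc", "e"};
    cout << boolalpha << isConcatenation(make_pair(S.cbegin(), S.cend()), L) << endl; // true
}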
Instead of looping over the entire vector, we could sort it and then reduce the search at each step to O(log N) in the optimum case, where all strings start with different chars. The worst case is not improved by this.
I would like to insert an extra character (or a new string) at a specific location in a string. For example, I want to insert d at the fourth location in abcefg to get abcdefg.
Now I am using:
old <- "abcefg"
n <- 4
paste(substr(old, 1, n-1), "d", substr(old, n, nchar(old)), sep = "")
I could write a one-line simple function for this task, but I am just curious if there is an existing function for that.
You can do this with regular expressions and gsub.
gsub('^([a-z]{3})([a-z]+)$', '\\1d\\2', old)
# [1] "abcdefg"
If you want to do this dynamically, you can create the expressions using paste:
letter <- 'd'
lhs <- paste0('^([a-z]{', n-1, '})([a-z]+)$')
rhs <- paste0('\\1', letter, '\\2')
gsub(lhs, rhs, old)
# [1] "abcdefg"
As per DWin's comment, you may want this to be more general:
gsub('^(.{3})(.*)$', '\\1d\\2', old)
This way any three characters will match rather than only lower case. DWin also suggests using sub instead of gsub. This way you don't have to worry about the ^ as much since sub will only match the first instance. But I like to be explicit in regular expressions and only move to more general ones as I understand them and find a need for more generality.
As Greg Snow noted, you can use another form of regular expression, a lookbehind:
sub( '(?<=.{3})', 'd', old, perl=TRUE )
You could also build my dynamic gsub above using sprintf rather than paste0:
lhs <- sprintf('^([a-z]{%d})([a-z]+)$', n-1)
or for his sub regular expression:
lhs <- sprintf('(?<=.{%d})',n-1)
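For comparison, the same three ideas (slicing, capture groups with backreferences, and a lookbehind) look like this with Python's re module. This is only an illustration outside R; n is treated as a 1-based position as in the question, and count=1 mimics sub replacing only the first match:

```python
import re

old, n = "abcefg", 4

# slicing, like paste(substr(...), "d", substr(...))
by_slice = old[:n - 1] + "d" + old[n - 1:]
# capture groups + backreferences, like gsub('^(.{3})(.*)$', '\\1d\\2', old)
by_groups = re.sub(r"^(.{%d})(.*)$" % (n - 1), r"\1d\2", old)
# zero-width lookbehind, like sub('(?<=.{3})', 'd', old, perl=TRUE)
by_lookbehind = re.sub(r"(?<=.{%d})" % (n - 1), "d", old, count=1)

print(by_slice, by_groups, by_lookbehind)  # abcdefg abcdefg abcdefg
```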
The stringi package to the rescue once again! This is the simplest and most elegant of the presented solutions.
The stri_sub function allows you to extract parts of a string and substitute parts of it, like this:
library(stringi)
x <- "abcde"
stri_sub(x, 1, 3) # from first to third character
# [1] "abc"
stri_sub(x, 1, 3) <- 1 # substitute from first to third character
x
# [1] "1de"
But if you do this:
x <- "abcde"
stri_sub(x, 3, 2) # from 3 to 2 so... zero ?
# [1] ""
stri_sub(x, 3, 2) <- 1 # substitute from 3 to 2 ... hmm
x
# [1] "ab1cde"
Then no characters are removed, but new ones are inserted. Isn't that cool? :)
@Justin's answer is the way I'd actually approach this because of its flexibility, but this could also be a fun approach.
You can treat the string as "fixed width format" and specify where you want to insert your character:
paste(read.fwf(textConnection(old),
               c(3, nchar(old)), as.is = TRUE),
collapse = "d")
Particularly nice is the output when using sapply, since you get to see the original string as the "name".
newold <- c("some", "random", "words", "strung", "together")
sapply(newold, function(x) paste(read.fwf(textConnection(x),
c(4, nchar(x)), as.is = TRUE),
collapse = "-WEE-"))
# some random words strung together
# "some-WEE-NA" "rand-WEE-om" "word-WEE-s" "stru-WEE-ng" "toge-WEE-ther"
Your original way of doing this (i.e. splitting the string at an index and pasting in the inserted text) could be made into a generic function like so:
split_str_by_index <- function(target, index) {
index <- sort(index)
substr(rep(target, length(index) + 1),
start = c(1, index),
stop = c(index -1, nchar(target)))
}
#Taken from https://stat.ethz.ch/pipermail/r-help/2006-March/101023.html
interleave <- function(v1,v2)
{
ord1 <- 2*(1:length(v1))-1
ord2 <- 2*(1:length(v2))
c(v1,v2)[order(c(ord1,ord2))]
}
insert_str <- function(target, insert, index) {
insert <- insert[order(index)]
index <- sort(index)
paste(interleave(split_str_by_index(target, index), insert), collapse="")
}
Example usage:
> insert_str("1234567890", c("a", "b", "c"), c(5, 9, 3))
[1] "12c34a5678b90"
This allows you to insert a vector of characters at the locations given by a vector of indexes. The split_str_by_index and interleave functions are also useful on their own.
Edit:
I revised the code to allow for indexes in any order. Before, indexes needed to be in ascending order.
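The same split-and-interleave idea, sketched in Python for comparison (insert_str here is a hypothetical translation that keeps the R function's 1-based indexes; a stable sort on the index reorders the inserts, as insert[order(index)] does):

```python
def insert_str(target, inserts, indexes):
    """Insert inserts[i] before the 1-based position indexes[i] of target."""
    pairs = sorted(zip(indexes, inserts), key=lambda p: p[0])  # stable sort by index
    pieces, prev = [], 1
    for idx, ins in pairs:
        pieces.append(target[prev - 1:idx - 1])  # chars from prev up to idx - 1
        pieces.append(ins)
        prev = idx
    pieces.append(target[prev - 1:])             # the remaining tail
    return "".join(pieces)

print(insert_str("1234567890", ["a", "b", "c"], [5, 9, 3]))  # 12c34a5678b90
```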
I've made a custom function called substr1 to deal with extracting, replacing and inserting chars in a string. Run this code at the start of every session. Feel free to try it out and let me know if it needs to be improved.
# extraction
substr1 <- function(x,y) {
z <- sapply(strsplit(as.character(x),''),function(w) paste(na.omit(w[y]),collapse=''))
dim(z) <- dim(x)
return(z) }
# substitution + insertion
`substr1<-` <- function(x,y,value) {
names(y) <- c(value,rep('',length(y)-length(value)))
z <- sapply(strsplit(as.character(x),''),function(w) {
v <- seq(w)
names(v) <- w
paste(names(sort(c(y,v[setdiff(v,y)]))),collapse='') })
dim(z) <- dim(x)
return(z) }
# demonstration
abc <- 'abc'
substr1(abc,1)
# "a"
substr1(abc,c(1,3))
# "ac"
substr1(abc,-1)
# "bc"
substr1(abc,1) <- 'A'
# "Abc"
substr1(abc,1.5) <- 'A'
# "aAbc"
substr1(abc,c(0.5,2,3)) <- c('A','B')
# "AaB"
It took me some time to understand the regular expression; afterwards I found my way with the numbers I had.
The end result was:
old <- "89580000"
gsub('^([0-9]{5})([0-9]+)$', '\\1-\\2', old)
similar to yours!
You can combine paste0 and substr; both are base R, so no package needs to be loaded.
Here is the exact code:
paste0(substr(old, 1,3), "d", substr(old,4,6))
In base you can use regmatches to insert a character at a specific location in a string.
old <- "abcefg"
n <- 4
regmatches(old, `attr<-`(n, "match.length", 0)) <- "d"
old
#[1] "abcdefg"
This could also be used with a regex to find the location to insert.
s <- "abcefg"
regmatches(s, regexpr("(?<=c)", s, perl=TRUE)) <- "d"
s
#[1] "abcdefg"
It also works for multiple matches, with individual replacements at different matches:
s <- "abcefg abcefg"
regmatches(s, gregexpr("(?<=c)", s, perl=TRUE)) <- list(1:2)
s
#[1] "abc1efg abc2efg"
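Python's re.sub offers a comparable trick: pass a function as the replacement and have it hand out one replacement per match. A small sketch (insert_at_matches is a hypothetical helper name; it assumes there are at least as many replacements as matches):

```python
import re

def insert_at_matches(s, pattern, replacements):
    """Replace the i-th match of pattern (typically zero-width) with replacements[i]."""
    it = iter(replacements)
    return re.sub(pattern, lambda m: next(it), s)

print(insert_at_matches("abcefg abcefg", r"(?<=c)", ["1", "2"]))  # abc1efg abc2efg
```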