How to count two characters in a string?

There are three parameters (String s, char c, char d)
How do I define a method so that it returns as an int, the number of times the char c occurs in the String s added to the number of times the char d occurs in the String s?

Depending on the language, there could already be an existing function that does this for you.
Otherwise, you'd need to treat the string like an array (if the language allows that directly, great; if not, convert it to an array first), run a loop, and use a counter. If you want to look for specific characters, you can pass them to a function along with the string or array.
Example (this is just pseudo-code, since I don't know what language you're using):
function countCharsInString(string s, char c1, char c2)
{
    int count = 0;
    for(int i = 0; i < s.Length; i++)
    {
        // count the position if it holds either character
        if(s[i] == c1 || s[i] == c2)
        {
            count++;
        }
    }
    return count;
}
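As a concrete illustration of the "existing function" route, here is a minimal Python sketch. Python is only an assumption, since the question does not name a language, and count_two_chars is a made-up name. Note one difference from the pseudo-code above: when c and d are the same character, this version counts it twice (matching the literal wording of the question), whereas the || test counts it once.

```python
def count_two_chars(s: str, c: str, d: str) -> int:
    """Return the occurrences of c in s plus the occurrences of d in s."""
    # str.count scans the string once per call, so this is two linear passes.
    return s.count(c) + s.count(d)


print(count_two_chars("banana", "a", "n"))  # prints 5
```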

Related

Longest Common Substring non-DP solution with O(m*n)

The definition of the problem is:
Given two strings, find the longest common substring.
Return the length of it.
I was solving this problem and I think I solved it with O(m*n) time complexity. However I don't know why when I look up the solution, it's all talking about the optimal solution being dynamic programming - http://www.geeksforgeeks.org/longest-common-substring/
Here's my solution, you can test it here: http://www.lintcode.com/en/problem/longest-common-substring/
int longestCommonSubstring(string &A, string &B) {
    int ans = 0;
    for (int i=0; i<A.length(); i++) {
        int counter = 0;
        int k = i;
        for (int j=0; j<B.length() && k <A.length(); j++) {
            if (A[k]!=B[j]) {
                counter = 0;
                k = i;
            } else {
                k++;
                counter++;
                ans = max(ans, counter);
            }
        }
    }
    return ans;
}
My idea is simple, start from the first position of string A and see what's the longest substring I can match with string B, then start from the second position of string A and see what's the longest substring I can match....
Is there something wrong with my solution? Or is it not O(m*n) complexity?
Good news: your algorithm is O(mn). Bad news: it doesn't work correctly.
Your inner loop is wrong: it's intended to find the longest initial substring of A[i:] in B, but it works like this:
j = 0
While j < len(B)
    Match as much of A[i:] against B[j:]. Call it s.
    Remember s if it's the longest so far found.
    j += len(s)
This fails to find the longest match. For example, when A = "XXY" and B = "XXXY" and i=0 it'll find "XX" as the longest match instead of the complete match "XXY".
Here's a runnable version of your code (lightly transcribed into C) that shows the faulty result:
#include <string.h>
#include <stdio.h>
int lcs(const char* A, const char* B) {
    int al = strlen(A);
    int bl = strlen(B);
    int ans = 0;
    for (int i=0; i<al; i++) {
        int counter = 0;
        int k = i;
        for (int j=0; j<bl && k<al; j++) {
            if (A[k]!=B[j]) {
                counter = 0;
                k = i;
            } else {
                k++;
                counter++;
                if (counter >= ans) ans = counter;
            }
        }
    }
    return ans;
}
int main(int argc, char**argv) {
    printf("%d\n", lcs("XXY", "XXXY"));
    return 0;
}
Running this program outputs "2".
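For comparison, here is a minimal Python sketch of the dynamic programming approach that the question links to. This is not the asker's algorithm, and the function name is a placeholder. It tracks, for every pair of positions, the length of the longest common suffix ending there, so a mismatch never discards a partial match that started at a different offset.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    best = 0
    # prev[j] is the length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended one character earlier in both strings.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(longest_common_substring_length("XXY", "XXXY"))  # prints 3
```

This runs in O(m*n) time and, because only two rows are kept, O(n) extra space.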
Your solution is O(nm), and if you compare its structure to the provided algorithm, it's essentially the same; however, yours does not memoize.
One advantage of the dynamic programming algorithm in the link is that, within the same complexity class, it can recall different substring lengths in O(1); otherwise, it looks good to me.
This kind of thing happens from time to time, because storing subproblem solutions does not always improve the running time (on the first call) and may leave you in the same complexity class. For example, try computing the nth Fibonacci number with a dynamic programming solution and compare that to a tail-recursive solution. Note that in this case, as in yours, once the array has been filled the first time, each successive call returns an answer faster.

Dynamic character generator; Generate all possible strings from a character set

I want to make a dynamic string generator that will generate all possible unique strings from a character set with a dynamic length.
I can do this very easily using for loops, but then the length is static rather than dynamic.
// Prints all possible strings with the length of 3
for a in allowedCharacters {
    for b in allowedCharacters {
        for c in allowedCharacters {
            println(a+b+c)
        }
    }
}
But when I try to make the length dynamic, so that I can just call generate(length: 5), I get confused.
I found this Stack Overflow question, but the accepted answer generates strings of lengths 1 through maxLength, and I want maxLength on every string.
As noted above, use recursion. Here is how it can be done with C#:
static IEnumerable<string> Generate(int length, char[] allowed_chars)
{
    if (length == 1)
    {
        foreach (char c in allowed_chars)
            yield return c.ToString();
    }
    else
    {
        var sub_strings = Generate(length - 1, allowed_chars);
        foreach (char c in allowed_chars)
        {
            foreach (string sub in sub_strings)
            {
                yield return c + sub;
            }
        }
    }
}
private static void Main(string[] args)
{
    string chars = "abc";
    List<string> result = Generate(3, chars.ToCharArray()).ToList();
}
Please note that the run time of this algorithm and the amount of data it returns is exponential as the length increases which means that if you have large lengths, you should expect the code to take a long time and to return a huge amount of data.
Translation of @YacoubMassad's C# code to Swift:
func generate(length: Int, allowedChars: [String]) -> [String] {
    if length == 1 {
        return allowedChars
    }
    else {
        let subStrings = generate(length - 1, allowedChars: allowedChars)
        var arr = [String]()
        for c in allowedChars {
            for sub in subStrings {
                arr.append(c + sub)
            }
        }
        return arr
    }
}
println(generate(3, allowedChars: ["a", "b", "c"]))
Prints:
aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc
While you can (obviously enough) use recursion to solve this problem, it is quite an inefficient way to do the job.
What you're really doing is just counting. In your example, with "a", "b" and "c" as the allowed characters, you're counting in base 3, and since you're allowing three character strings, they're three digit numbers.
An N-digit number in base M can represent M^N different possible values, going from 0 through M^N-1. So, for your case, that's limit=pow(3, 3)-1;. To generate all those values, you just count from 0 through the limit, and convert each number to base M, using the specified characters as the "digits". For example, in C++ the code can look like this:
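#include <cmath>
#include <iostream>
#include <string>
int main() {
    std::string letters = "abc";
    std::size_t base = letters.length();
    std::size_t digits = 3;
    // Number of strings to generate: base^digits.
    std::size_t limit = static_cast<std::size_t>(std::pow(base, digits));
    for (std::size_t i = 0; i < limit; i++) {
        std::size_t in = i;
        // Emit the digits of i in the given base, least significant first.
        for (std::size_t j = 0; j < digits; j++) {
            std::cout << letters[in % base];
            in /= base;
        }
        std::cout << "\t";
    }
}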
One minor note: as I've written it here, this produces the output in basically a little-endian format. That is, the "digit" that varies the fastest is on the left, and the one that changes the slowest is on the right.
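If the most significant "digit" should come first (so the output order matches the nested-loop version in the question), the digits can be reversed before joining. The following is a minimal Python sketch of the same counting idea; the name generate and its parameters are illustrative only, and it yields strings lazily because the number of results grows exponentially with the length.

```python
from typing import Iterator


def generate(length: int, allowed: str) -> Iterator[str]:
    base = len(allowed)
    for value in range(base ** length):
        digits = []
        remaining = value
        for _ in range(length):
            # Peel off the least significant base-`base` digit.
            remaining, digit = divmod(remaining, base)
            digits.append(allowed[digit])
        # Digits were collected least significant first, so reverse them.
        yield "".join(reversed(digits))


print(list(generate(2, "ab")))  # prints ['aa', 'ab', 'ba', 'bb']
```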

Determine number of char movement to get word

Suppose you are given a word
"sunflower"
You can perform only one operation type on it, pick a character and move it to the front.
So for instance if you picked 'f', the word would be "fsunlower".
You can have a series of these operations.
fsunlower (moved f to front)
wfsunloer (moved w to front)
fwsunloer (moved f to front again)
The problem is to get the minimum number of operations required, given the derived word and the original word. So if input strings are "fwsunloer", "sunflower", the output would be 3.
This problem is equivalent to: given strings A and B, find the longest suffix of A that is a subsequence of B. If we know which n characters need to be moved, we need only n steps. So what we need to find is the maximum number of characters that don't need to be moved, which is the length of that longest suffix of A.
So for the given example, the longest suffix is sunlor
Java code:
public static void main(String[] args) {
    System.out.println(minOp("ewfsunlor", "sunflower"));
}
public static int minOp(String A, String B) {
    int n = A.length() - 1; // start from the end of String A
    int pos = B.length();
    int result = 0;
    while (n >= 0) {
        int nxt = -1;
        for (int i = pos - 1; i >= 0; i--) {
            if (B.charAt(i) == A.charAt(n)) {
                nxt = i;
                break;
            }
        }
        if (nxt == -1) {
            break;
        }
        result++;
        pos = nxt;
        n--;
    }
    return B.length() - result;
}
Result:
3
Time complexity is O(n), where n is the length of String A.
Note: this algorithm assumes that A and B contain the same set of characters. Otherwise, you need to check for that before using the function.
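The precondition in the note can be folded into the function itself. Below is a minimal Python sketch of the same suffix-matching idea with that check added; the name min_moves_to_front and the choice to raise ValueError are assumptions made for illustration, not part of the answer above.

```python
from collections import Counter


def min_moves_to_front(derived: str, original: str) -> int:
    # Both strings must be made of the same characters with the same counts.
    if Counter(derived) != Counter(original):
        raise ValueError("strings must contain the same characters")
    i = len(derived) - 1  # next position of the suffix of `derived` to match
    kept = 0              # characters that never have to be moved
    for ch in reversed(original):
        if i >= 0 and derived[i] == ch:
            kept += 1
            i -= 1
    return len(original) - kept


print(min_moves_to_front("ewfsunlor", "sunflower"))  # prints 3
```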

Remove single character occurrence from String

I want an algorithm to remove all occurrences of a given character from a string in O(n) complexity or lower? (It should be INPLACE editing original string only)
eg.
String="aadecabaaab";
removeCharacter='a'
Output:"decbb"
Enjoy algo:
j = 0
for i in length(a):
    if a[i] != symbol:
        a[j] = a[i]
        j = j + 1
finalize:
    length(a) = j
You can't do it in place with a String because it's immutable, but here's an O(n) algorithm to do it in place with a char[]:
char[] chars = "aadecabaaab".toCharArray();
char removeCharacter = 'a';
int next = 0;
for (int cur = 0; cur < chars.length; ++cur) {
    if (chars[cur] != removeCharacter) {
        chars[next++] = chars[cur];
    }
}
// chars[0] through chars[4] will have {d, e, c, b, b} and next will be 5
System.out.println(new String(chars, 0, next));
Strictly speaking, you can't remove anything from a String because the String class is immutable. But you can construct another String that has all characters from the original String except for the "character to remove".
Create a StringBuilder. Loop through all characters in the original String. If the current character is not the character to remove, then append it to the StringBuilder. After the loop ends, convert the StringBuilder to a String.
Yep. In linear time, iterate over the String and check each character with .charAt(): if it is removeCharacter, don't copy it to the new String; otherwise, copy it. That's it.
This probably shouldn't have the "java" tag since in Java, a String is immutable and you can't edit it in place. For a more general case, if you have an array of characters (in any programming language) and you want to modify the array "in place" without creating another array, it's easy enough to do with two indexes. One goes through every character in the array, and the other starts at the beginning and is incremented only when you see a character that isn't removeCharacter. Since I assume this is a homework assignment, I'll leave it at that and let you figure out the details.
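For readers who want to see the two-index idea spelled out, here is a minimal Python sketch that operates on a mutable list of characters (Python strings are immutable too, so a list stands in for the character array). The function name remove_in_place is made up for this example.

```python
def remove_in_place(chars: list, target: str) -> None:
    write = 0  # next slot to fill with a character we keep
    for read in range(len(chars)):
        if chars[read] != target:
            chars[write] = chars[read]
            write += 1
    # Everything from `write` onward is leftover; truncate the list.
    del chars[write:]


chars = list("aadecabaaab")
remove_in_place(chars, "a")
print("".join(chars))  # prints decbb
```

Each element is read once and written at most once, so the pass is O(n) with O(1) extra space.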
import java.util.*;
import java.io.*;
public class removeA{
    public static void main(String[] args){
        String text = "This is a test string! Wow abcdefg.";
        System.out.println(text.replaceAll("a",""));
    }
}
Use a lookup table to hold the characters you want to remove. With std::map (a balanced tree rather than a true hash table), each lookup is O(log k) in the number of characters to remove.
std::string toRemove = "ad";
std::map<char, int> table;
size_t maxR = toRemove.size();
for (size_t n = 0; n < maxR; ++n)
{
    table[toRemove[n]] = 0;
}
Then parse the whole string and remove when you get a hit (thestring is an array):
size_t counter = 0;
while(thestring[counter] != 0)
{
    std::map<char,int>::iterator iter = table.find(thestring[counter]);
    if (iter == table.end()) // we found a valid character!
    {
        ++counter;
    }
    else
    {
        // move the data - dont increment counter
        // max is assumed to be the length of thestring (excluding the terminator);
        // the source and destination overlap, so use memmove rather than memcpy
        memmove(&thestring[counter], &thestring[counter+1], max-counter);
    }
}
EDIT: I hope this is not a technical test or something like that. =S

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: is there such a thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms which can check n strings against each other to see where some are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
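The canonical-position idea can be sketched in a few lines of Python. This is a simple illustration rather than an optimized implementation: it compares all n rotations, so it is O(n^2) in the worst case (Booth's algorithm finds the least rotation in O(n), but is not shown here). The function names and the choice of SHA-256 as the "normal" hash are assumptions.

```python
import hashlib


def canonical_rotation(s: str) -> str:
    if not s:
        return s
    doubled = s + s  # every rotation of s is a length-n window of s + s
    return min(doubled[i:i + len(s)] for i in range(len(s)))


def rotation_hash(s: str) -> str:
    # Hash the canonical form, so all rotations of s hash identically.
    return hashlib.sha256(canonical_rotation(s).encode("utf-8")).hexdigest()


print(canonical_rotation("abaac"))  # prints aacab
```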
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise as duplicate pairs of letters are encountered and the resulting XOR is 0. Here is a modification that I believe fixes the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
    int H = 0;
    if (s.Length > 0)
    {
        //any arbitrary coprime numbers
        int a = s.Length, b = s.Length + 1;
        //an array of Euclid-Fermat sequences to generate additional coprimes for each duplicate character occurrence
        //(0x100 entries, so that indices 0x00 through 0xFF are all valid)
        int[] c = new int[0x100];
        for (int i = 1; i < c.Length; i++)
        {
            c[i] = i + 1;
        }
        Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
        Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
        //for i=0 we need to wrap around to the last character
        H = NextPair(s[s.Length - 1], s[0]);
        //for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= NextPair(s[i - 1], s[i]);
        }
    }
    return H;
}
static void Main(string[] args)
{
    Console.WriteLine("{0:X8}", Hash("abcdef"));
    Console.WriteLine("{0:X8}", Hash("bcdefa"));
    Console.WriteLine("{0:X8}", Hash("cdefab"));
    Console.WriteLine("{0:X8}", Hash("cdfeab"));
    Console.WriteLine("{0:X8}", Hash("a0a0"));
    Console.WriteLine("{0:X8}", Hash("1010"));
    Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
    Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First Version (which isn't complete): Use XOR which is commutative (order doesn't matter) and another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
    //any arbitrary coprime numbers
    const int a = 7, b = 13;
    int H = 0;
    if (s.Length > 0)
    {
        //for i=0 we need to wrap around to the last character
        H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
        //for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
        }
    }
    return H;
}
static void Main(string[] args)
{
    Console.WriteLine(Hash("abcdef".ToCharArray()));
    Console.WriteLine(Hash("bcdefa".ToCharArray()));
    Console.WriteLine(Hash("cdefab".ToCharArray()));
    Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
I am sure that you could find a function that can generate the same hash regardless of character position in the input, however, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions will occur for all hash algorithms, so I mean, how do you minimize this risk.)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using Linq
public string ToCanonicalOrder(string input)
{
    char first = input.OrderBy(x => x).First();
    string doubledForRotation = input + input;
    string canonicalOrder
        = (-1)
        .GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
        .Skip(1) // the -1
        .TakeWhile(x => x < input.Length)
        .Select(x => doubledForRotation.Substring(x, input.Length))
        .OrderBy(x => x)
        .First();
    return canonicalOrder;
}
assuming generic generator extension method:
public static class TExtensions
{
    public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
    {
        var current = initial;
        while (true)
        {
            yield return current;
            current = next(current);
        }
    }
}
sample usage:
var sequences = new[]
{
    "abcdef", "bcdefa", "cdefab",
    "defabc", "efabcd", "fabcde",
    "abaac", "cabcab"
};
foreach (string sequence in sequences)
{
    Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hash functions of all circular shifts of your input into one meta-hash which does not depend on the order of the inputs.
More formally, consider
for(int i=0; i<string.length; i++) {
result^=string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
As a concrete example, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, taking the hash of any of these rotations only changes the order in which the XOR terms are evaluated, which doesn't change the result.
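A minimal Python sketch of this meta-hash follows. The use of zlib.crc32 as the underlying hash and the name xor_rotation_hash are assumptions chosen so that the result is stable across runs. One caveat worth knowing: with XOR, a string made of an even number of repeats of the same block (such as "abab") produces every rotation an even number of times, so the terms cancel and the result is 0; a commutative operation such as addition avoids that particular cancellation.

```python
import zlib


def xor_rotation_hash(s: str) -> int:
    result = 0
    for i in range(len(s)):
        rotated = s[i:] + s[:i]  # rotate left by i characters
        result ^= zlib.crc32(rotated.encode("utf-8"))
    return result


print(xor_rotation_hash("abcd") == xor_rotation_hash("cdab"))  # prints True
```

Building every rotation costs O(n^2) time for a string of length n, so this is a demonstration of the idea rather than a scalable implementation.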
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
If you can represent the string as a matrix of associations so abcdef would look like
  a b c d e f
a   x
b     x
c       x
d         x
e           x
f x
But so would any combination of those associations. It would be trivial to compare those matrices.
Another quicker trick would be to rotate the string so that the "first" letter is first. Then if you have the same starting point, the same strings will be identical.
Here is some Ruby code:
def normalize_string(string)
  myarray = string.split(//) # split into an array of characters
  index = myarray.index(myarray.min) # find the index of the minimum element
  index.times do
    myarray.push(myarray.shift) # move stuff from the front to the back
  end
  return myarray.join
end
p normalize_string('abcdef').eql?(normalize_string('defabc')) # should print true
Maybe use a rolling hash for each offset (Rabin-Karp style) and return the minimum hash value? There could be collisions, though.
