Time complexity of a string compression algorithm - string

I tried to do an exercice from cracking the code interview book about compressing a string :
Implement a method to perform basic string compression using the counts of
repeated characters. For example, the string aabcccccaaa would become
a2blc5a3. If the "compressed" string would not become smaller than the original
string, your method should return the original string.
The first proposition given by the author is the following :
public static String compressBad(String str) {
int size = countCompression(str);
if (size >= str.length()) {
return str;
}
String mystr = "";
char last = str.charAt(0);
int count = 1;
for (int i = 1; i < str.length(); i++) {
if (str.charAt(i) == last) {
count++;
} else {
mystr += last + "" + count;
last = str.charAt(i);
count = 1;
}
}
return mystr + last + count;
}
About the time complexity of thie algorithm, she said that :
The runtime is 0(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in 0(n^2) time
I have two main questions :
1) Why the time complexity is O(p+k^2), it should be only O(p) (where p is the size of the string) no? Because we do only p iterations not p + k^2.
2) Why the time complexity of a string concatenation is 0(n^2)???
I thought that if we have a string3 = string1 + string2 then we have a complexity of size(string1) + size(string2), because we have to create a copy of the two strings before adding them to a new string (create a string is O(1)). So, we will have an addition not a multiplication, no? It isn't the same thing for an array (if we used a char array for example) ?
Can you clarify these points please? I didn't understand how we calculate the complexity...

String concatenation is O(n) but in this case it's being concatenated K times.
Every time you find a sequence you have to copy the entire string + what you found. for example if you had four total sequences in the original string
the cost to get the final string will be:
(k-3)+ //first sequence
(k-3)+(k-2) + //copying previous string and adding new sequence
((k-3)+(+k-2)+k(-1) + //copying previous string and adding new sequence
((k-3)+(k-2)+(k-1)+k) //copying previous string and adding new sequence
Thus complexity is O(p + K^2)

Related

Char Array Returning Integers

I've been working through this exercise, and my output is not what I expect.
(Check substrings) You can check whether a string is a substring of another string
by using the indexOf method in the String class. Write your own method for
this function. Write a program that prompts the user to enter two strings, and
checks whether the first string is a substring of the second.
** My code compromises with the problem's specifications in two ways: it can only display matching substrings to 3 letters, and it cannot work on string literals with less than 4 letters. I mistakenly began writing the program without using the suggested method, indexOf. My program's objective (although it shouldn't entirely deviate from the assignment's objective) is to design a program that determines whether two strings share at least three consecutive letters.
The program's primary error is that it generates numbers instead of char characters. I've run through several, unsuccessful ideas to discover what the logical error is. I first tried to idenfity whether the char characters (which, from my understanding, are underwritten in unicode) were converted to integers, considering that the outputted numbers are also three letters long. Without consulting a reference, I know this isn't true. A comparison between java and javac outputted permutation of 312, and a comparison between abab and ababbab ouputted combinations of 219. j should be > b. My next thought was that the ouputs were indexes of the arrays I used. Once again, this isn't true. A comparison between java and javac would ouput 0, if my reasoning were true.
public class Substring {
public static char [] array;
public static char [] array2;
public static void main (String[]args){
java.util.Scanner input = new java.util.Scanner (System.in);
System.out.println("Enter your two strings here, the longer one preceding the shorter one");
String container1 = input.next();
String container2 = input.next();
char [] placeholder = container1.toCharArray();
char [] placeholder2 = container2.toCharArray();
array = placeholder;
array2 = placeholder2;
for (int i = 0; i < placeholder2.length; i++){
for (int j = 0; j < placeholder.length; j ++){
if (array[j] == array2[i]) matcher(j,i);
}
}
}
public static void matcher(int higher, int lower){
if ((higher < array.length - 2) && (lower < array2.length - 2))
if (( array[higher+1] == array2[lower+1]) && (array[higher+2] == array2[lower+2]))
System.out.println(array[higher] + array[higher+1] + array[higher+2] );
}
}
The + operator promotes shorts, chars, and bytes operands to ints, so
array[higher] + array[higher+1] + array[higher+2]
has type int, not type char which means that
System.out.println(...)
binds to
System.out.println(int)
which displays its argument as a decimal number, instead of binding to
System.out.println(char)
which outputs the given character using the PrintStream's encoding.

Remove all the occurences of substrings from a string

Given a string S and a set of n substrings. Remove every instance of those n substrings from S so that S is of the minimum length and output this minimum length.
Example 1
S = ccdaabcdbb
n = 2
substrings = ab, cd
Output
2
Explanation:
ccdaabcdbb -> ccdacdbb -> cabb -> cb (length=2)
Example 2
S = abcd
n = 2
substrings = ab,bcd
Output
1
How do I solve this problem ?
A simple Brute-force search algorithm is:
For each substring, try all possible ways to remove it from the string, then recurse.
In Pseudocode:
def min_final_length (input, substrings):
best = len(input)
for substr in substrings:
beg = 0
// find all occurrences of substr in input and recurse
while (found = find_substring(input, substr, from=beg)):
input_without_substr = input[0:found]+input[found+len(substr):len(input)]
best = min(best, min_final_length(input_without_substr,substrings))
beg = found+1
return best
Let complexity be F(S,n,l) where S is the length of the input string, n is the cardinality of the set substrings and l is the "characteristic length" of substrings. Then
F(S,n,l) ~ n * ( S * l + F(S-l,n,l) )
Looks like it is at most O(S^2*n*l).
The following solution would have an complexity of O(m * n) where m = len(S) and n is the number of substring
def foo(S, sub):
i = 0
while i < len(S):
for e in sub:
if S[i:].startswith(e):
S = S[:i] + S[i+len(e):]
i -= 1
break
else: i += 1
return S, i
If you are for raw performance and your string is very large, you can do better than brute force. Use a suffix trie (E.g, Ukkonnen trie) to store your string. Then find each substring (which us done in O(m) time, m being substring length), and store the offsets to the substrings and length in an array.
Then use the offsets and length info to actually remove the substrings by filling these areas with \0 (in C) or another placeholder character. By counting all non-Null characters you will get the minimal length of the string.
This will als handle overlapping substring, e.g. say your string is "abcd", and you have two substrings "ab" and "abcd".
I solved it using trie+dp.
First insert your substrings in a trie. Then define the state of the dp is some string, walk through that string and consider each i (for i =0 .. s.length()) as the start of some substring. let j=i and increment j as long as you have a suffix in the trie (which will definitely land you to at least one substring and may be more if you have common suffix between some substring, for example "abce" and "abdd"), whenever you encounter an end of some substring, go solve the new sub-problem and find the minimum between all substring reductions.
Here is my code for it. Don't worry about the length of the code. Just read the solve function and forget about the path, I included it to print the string formed.
struct node{
node* c[26];
bool str_end;
node(){
for(int i= 0;i<26;i++){
c[i]=NULL;
}
str_end= false;
}
};
class Trie{
public:
node* root;
Trie(){
root = new node();
}
~Trie(){
delete root;
}
};
class Solution{
public:
typedef pair<int,int>ii;
string get_str(string& s,map<string,ii>&path){
if(!path.count(s)){
return s;
}
int i= path[s].first;
int j= path[s].second;
string new_str =(s.substr(0,i)+s.substr(j+1));
return get_str(new_str,path);
}
int solve(string& s,Trie* &t, map<string,int>&dp,map<string,ii>&path){
if(dp.count(s)){
return dp[s];
}
int mn= (int)s.length();
for(int i =0;i<s.length();i++){
string left = s.substr(0,i);
node* cur = t->root->c[s[i]-97];
int j=i;
while(j<s.length()&&cur!=NULL){
if(cur->str_end){
string new_str =left+s.substr(j+1);
int ret= solve(new_str,t,dp,path);
if(ret<mn){
path[s]={i,j};
}
}
cur = cur->c[s[++j]-97];
}
}
return dp[s]=mn;
}
string removeSubstrings(vector<string>& substrs, string s){
map<string,ii>path;
map<string,int>dp;
Trie*t = new Trie();
for(int i =0;i<substrs.size();i++){
node* cur = t->root;
for(int j=0;j<substrs[i].length();j++){
if(cur->c[substrs[i][j]-97]==NULL){
cur->c[substrs[i][j]-97]= new node();
}
cur = cur->c[substrs[i][j]-97];
if(j==substrs[i].length()-1){
cur->str_end= true;
}
}
}
solve(s,t,dp,path);
return get_str(s, path);
}
};
int main(){
vector<string>substrs;
substrs.push_back("ab");
substrs.push_back("cd");
Solution s;
cout << s.removeSubstrings(substrs,"ccdaabcdbb")<<endl;
return 0;
}

dart efficient string processing techniques?

I strings in the format of name:key:dataLength:data and these strings can often be chained together. for example "aNum:n:4:9879aBool:b:1:taString:s:2:Hi" this would map to an object something like:
{
aNum: 9879,
aBool: true,
aString: "Hi"
}
I have a method for parsing a string in this format but I'm not sure whether it's use of substring is the most efficient way of pprocessing the string, is there a more efficient way of processing strings in this fashion (repeatedly chopping off the front section):
Map<string, dynamic> fromString(String s){
Map<String, dynamic> _internal = new Map();
int start = 0;
while(start < s.length){
int end;
List<String> parts = new List<String>(); //0 is name, 1 is key, 2 is data length, 3 is data
for(var i = 0; i < 4; i++){
end = i < 3 ? s.indexOf(':') : num.parse(parts[2]);
parts[i] = s.substring(start, end);
start = i < 3 ? end + 1 : end;
}
var tranType = _tranTypesByKey[parts[1]]; //this is just a map to an object which has a function that can convert the data section of the string into an object
_internal[parts[0]] = tranType._fromStr(parts[3]);
}
return _internal;
}
I would try s.split(':') and process the resulting list.
If you do a lot of such operations you should consider creating benchmarks tests, try different techniques and compare them.
If you would still need this line
s = i < 3 ? s.substring(idx + 1) : s.substring(idx);
I would avoid creating a new substring in each iteration but instead just keep track of the next position.
You have to decide how important performance is relative to readability and maintainability of the code.
That said, you should not be cutting off the head of the string repeatedly. That is guaranteed to be inefficient - it'll take time that is quadratic in the number of records in your string, just creating those tail strings.
For parsing each field, you can avoid doing substrings on the length and type fields. For the length field, you can build the number yourself:
int index = ...;
// index points to first digit of length.
int length = 0;
int charCode = source.codeUnitAt(index++);
while (charCode != CHAR_COLON) {
length = 10 * length + charCode - 0x30;
charCode = source.codeUnitAt(index++);
}
// index points to the first character of content.
Since lengths are usually small integers (less than 2<<31), this is likely to be more efficient than creating a substring and calling int.parse.
The type field is a single ASCII character, so you could use codeUnitAt to get its ASCII value instead of creating a single-character string (and then your content interpretation lookup will need to switch on character code instead of character string).
For parsing content, you could pass the source string, start index and length instead of creating a substring. Then the boolean parser can also just read the code unit instead of the singleton character string, the string parser can just make the substring, and the number parser will likely have to make a substring too and call double.parse.
It would be convenient if Dart had a double.parseSubstring(source, [int from = 0, int to]) that could parse a substring as a double without creating the substring.

Remove single character occurrence from String

I want an algorithm to remove all occurrences of a given character from a string in O(n) complexity or lower? (It should be INPLACE editing original string only)
eg.
String="aadecabaaab";
removeCharacter='a'
Output:"decbb"
Enjoy algo:
j = 0
for i in length(a):
if a[i] != symbol:
a[j] = a[i]
j = j + 1
finalize:
length(a) = j
You can't do it in place with a String because it's immutable, but here's an O(n) algorithm to do it in place with a char[]:
char[] chars = "aadecabaaab".toCharArray();
char removeCharacter = 'a';
int next = 0;
for (int cur = 0; cur < chars.length; ++cur) {
if (chars[cur] != removeCharacter) {
chars[next++] = chars[cur];
}
}
// chars[0] through chars[4] will have {d, e, c, b, b} and next will be 5
System.out.println(new String(chars, 0, next));
Strictly speaking, you can't remove anything from a String because the String class is immutable. But you can construct another String that has all characters from the original String except for the "character to remove".
Create a StringBuilder. Loop through all characters in the original String. If the current character is not the character to remove, then append it to the StringBuilder. After the loop ends, convert the StringBuilder to a String.
Yep. In a linear time, iterate over String, check using .charAt() if this is a removeCharacter, don't copy it to new String. If no, copy. That's it.
This probably shouldn't have the "java" tag since in Java, a String is immutable and you can't edit it in place. For a more general case, if you have an array of characters (in any programming language) and you want to modify the array "in place" without creating another array, it's easy enough to do with two indexes. One goes through every character in the array, and the other starts at the beginning and is incremented only when you see a character that isn't removeCharacter. Since I assume this is a homework assignment, I'll leave it at that and let you figure out the details.
import java.util.*;
import java.io.*;
public class removeA{
public static void main(String[] args){
String text = "This is a test string! Wow abcdefg.";
System.out.println(text.replaceAll("a",""));
}
}
Use a hash table to hold the data you want to remove. log N complexity.
std::string toRemove = "ad";
std::map<char, int> table;
size_t maxR = toRemove.size();
for (size_t n = 0; n < maxR; ++n)
{
table[toRemove[n]] = 0;
}
Then parse the whole string and remove when you get a hit (thestring is an array):
size_t counter = 0;
while(thestring[counter] != 0)
{
std::map<char,int>::iterator iter = table.find(thestring[counter]);
if (iter == table.end()) // we found a valid character!
{
++counter;
}
else
{
// move the data - dont increment counter
memcpy(&thestring[counter], &thestring[counter+1], max-counter);
// dont increment counter
}
}
EDIT: I hope this is not a technical test or something like that. =S

Sorting a string using another sorting order string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I saw this in an interview question ,
Given a sorting order string, you are asked to sort the input string based on the given sorting order string.
for example if the sorting order string is dfbcae
and the Input string is abcdeeabc
the output should be dbbccaaee.
any ideas on how to do this , in an efficient way ?
The Counting Sort option is pretty cool, and fast when the string to be sorted is long compared to the sort order string.
create an array where each index corresponds to a letter in the alphabet, this is the count array
for each letter in the sort target, increment the index in the count array which corresponds to that letter
for each letter in the sort order string
add that letter to the end of the output string a number of times equal to it's count in the count array
Algorithmic complexity is O(n) where n is the length of the string to be sorted. As the Wikipedia article explains we're able to beat the lower bound on standard comparison based sorting because this isn't a comparison based sort.
Here's some pseudocode.
char[26] countArray;
foreach(char c in sortTarget)
{
countArray[c - 'a']++;
}
int head = 0;
foreach(char c in sortOrder)
{
while(countArray[c - 'a'] > 0)
{
sortTarget[head] = c;
head++;
countArray[c - 'a']--;
}
}
Note: this implementation requires that both strings contain only lowercase characters.
Here's a nice easy to understand algorithm that has decent algorithmic complexity.
For each character in the sort order string
scan string to be sorted, starting at first non-ordered character (you can keep track of this character with an index or pointer)
when you find an occurrence of the specified character, swap it with the first non-ordered character
increment the index for the first non-ordered character
This is O(n*m), where n is the length of the string to be sorted and m is the length of the sort order string. We're able to beat the lower bound on comparison based sorting because this algorithm doesn't really use comparisons. Like Counting Sort it relies on the fact that you have a predefined finite external ordering set.
Here's some psuedocode:
int head = 0;
foreach(char c in sortOrder)
{
for(int i = head; i < sortTarget.length; i++)
{
if(sortTarget[i] == c)
{
// swap i with head
char temp = sortTarget[head];
sortTarget[head] = sortTarget[i];
sortTarget[i] = temp;
head++;
}
}
}
In Python, you can just create an index and use that in a comparison expression:
order = 'dfbcae'
input = 'abcdeeabc'
index = dict([ (y,x) for (x,y) in enumerate(order) ])
output = sorted(input, cmp=lambda x,y: index[x] - index[y])
print 'input=',''.join(input)
print 'output=',''.join(output)
gives this output:
input= abcdeeabc
output= dbbccaaee
Use binary search to find all the "split points" between different letters, then use the length of each segment directly. This will be asymptotically faster then naive counting sort, but will be harder to implement:
Use an array of size 26*2 to store the begin and end of each letter;
Inspect the middle element, see if it is different from the element left to it. If so, then this is the begin for the middle element and end for the element before it;
Throw away the segment with identical begin and end (if there are any), recursively apply this algorithm.
Since there are at most 25 "split"s, you won't have to do the search for more than 25 segemnts, and for each segment it is O(logn). Since this is constant * O(logn), the algorithm is O(nlogn).
And of course, just use counting sort will be easier to implement:
Use an array of size 26 to record the number of different letters;
Scan the input string;
Output the string in the given sorting order.
This is O(n), n being the length of the string.
Interview questions are generally about thought process and don't usually care too much about language features, but I couldn't resist posting a VB.Net 4.0 version anyway.
"Efficient" can mean two different things. The first is "what's the fastest way to make a computer execute a task" and the second is "what's the fastest that we can get a task done". They might sound the same but the first can mean micro-optimizations like int vs short, running timers to compare execution times and spending a week tweaking every millisecond out of an algorithm. The second definition is about how much human time would it take to create the code that does the task (hopefully in a reasonable amount of time). If code A runs 20 times faster than code B but code B took 1/20th of the time to write, depending on the granularity of the timer (1ms vs 20ms, 1 week vs 20 weeks), each version could be considered "efficient".
Dim input = "abcdeeabc"
Dim sort = "dfbcae"
Dim SortChars = sort.ToList()
Dim output = New String((From c In input.ToList() Select c Order By SortChars.IndexOf(c)).ToArray())
Trace.WriteLine(output)
Here is my solution to the question
import java.util.*;
import java.io.*;
class SortString
{
public static void main(String arg[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// System.out.println("Enter 1st String :");
// System.out.println("Enter 1st String :");
// String s1=br.readLine();
// System.out.println("Enter 2nd String :");
// String s2=br.readLine();
String s1="tracctor";
String s2="car";
String com="";
String uncom="";
for(int i=0;i<s2.length();i++)
{
if(s1.contains(""+s2.charAt(i)))
{
com=com+s2.charAt(i);
}
}
System.out.println("Com :"+com);
for(int i=0;i<s1.length();i++)
if(!com.contains(""+s1.charAt(i)))
uncom=uncom+s1.charAt(i);
System.out.println("Uncom "+uncom);
System.out.println("Combined "+(com+uncom));
HashMap<String,Integer> h1=new HashMap<String,Integer>();
for(int i=0;i<s1.length();i++)
{
String m=""+s1.charAt(i);
if(h1.containsKey(m))
{
int val=(int)h1.get(m);
val=val+1;
h1.put(m,val);
}
else
{
h1.put(m,new Integer(1));
}
}
StringBuilder x=new StringBuilder();
for(int i=0;i<com.length();i++)
{
if(h1.containsKey(""+com.charAt(i)))
{
int count=(int)h1.get(""+com.charAt(i));
while(count!=0)
{x.append(""+com.charAt(i));count--;}
}
}
x.append(uncom);
System.out.println("Sort "+x);
}
}
Here is my version which is O(n) in time. Instead of unordered_map, I could have just used a char array of constant size. i.,e. char char_count[256] (and done ++char_count[ch - 'a'] ) assuming the input strings has all ASCII small characters.
string SortOrder(const string& input, const string& sort_order) {
unordered_map<char, int> char_count;
for (auto ch : input) {
++char_count[ch];
}
string res = "";
for (auto ch : sort_order) {
unordered_map<char, int>::iterator it = char_count.find(ch);
if (it != char_count.end()) {
string s(it->second, it->first);
res += s;
}
}
return res;
}
private static String sort(String target, String reference) {
final Map<Character, Integer> referencesMap = new HashMap<Character, Integer>();
for (int i = 0; i < reference.length(); i++) {
char key = reference.charAt(i);
if (!referencesMap.containsKey(key)) {
referencesMap.put(key, i);
}
}
List<Character> chars = new ArrayList<Character>(target.length());
for (int i = 0; i < target.length(); i++) {
chars.add(target.charAt(i));
}
Collections.sort(chars, new Comparator<Character>() {
#Override
public int compare(Character o1, Character o2) {
return referencesMap.get(o1).compareTo(referencesMap.get(o2));
}
});
StringBuilder sb = new StringBuilder();
for (Character c : chars) {
sb.append(c);
}
return sb.toString();
}
In C# I would just use the IComparer Interface and leave it to Array.Sort
void Main()
{
// we defin the IComparer class to define Sort Order
var sortOrder = new SortOrder("dfbcae");
var testOrder = "abcdeeabc".ToCharArray();
// sort the array using Array.Sort
Array.Sort(testOrder, sortOrder);
Console.WriteLine(testOrder.ToString());
}
public class SortOrder : IComparer
{
string sortOrder;
public SortOrder(string sortOrder)
{
this.sortOrder = sortOrder;
}
public int Compare(object obj1, object obj2)
{
var obj1Index = sortOrder.IndexOf((char)obj1);
var obj2Index = sortOrder.IndexOf((char)obj2);
if(obj1Index == -1 || obj2Index == -1)
{
throw new Exception("character not found");
}
if(obj1Index > obj2Index)
{
return 1;
}
else if (obj1Index == obj2Index)
{
return 0;
}
else
{
return -1;
}
}
}

Resources