Sorting a string using another sorting order string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I saw this in an interview question.
Given a sorting order string, you are asked to sort the input string based on the given sorting order string.
For example, if the sorting order string is dfbcae
and the input string is abcdeeabc,
the output should be dbbccaaee.
Any ideas on how to do this in an efficient way?

The Counting Sort option is pretty cool, and fast when the string to be sorted is long compared to the sort order string.
create an array where each index corresponds to a letter in the alphabet, this is the count array
for each letter in the sort target, increment the index in the count array which corresponds to that letter
for each letter in the sort order string
add that letter to the end of the output string a number of times equal to its count in the count array
Algorithmic complexity is O(n + m), where n is the length of the string to be sorted and m is the length of the sort order string. As the Wikipedia article explains, we're able to beat the lower bound on standard comparison-based sorting because this isn't a comparison-based sort.
Here's some pseudocode.
int[26] countArray; // one counter per lowercase letter, all starting at 0
foreach(char c in sortTarget)
{
countArray[c - 'a']++;
}
int head = 0;
foreach(char c in sortOrder)
{
while(countArray[c - 'a'] > 0)
{
sortTarget[head] = c;
head++;
countArray[c - 'a']--;
}
}
Note: this implementation requires that both strings contain only lowercase characters.
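The counting approach above can be sketched in Python roughly as follows. The function name `sort_by_order` is made up for illustration, and placing characters that do not appear in the sort order string at the end is an assumption; the question does not say what should happen to them.

```python
from collections import Counter

def sort_by_order(target: str, order: str) -> str:
    """Counting-sort `target` using the character ranking given by `order`."""
    counts = Counter(target)            # one pass over the string to be sorted
    pieces = []
    for ch in order:                    # one pass over the sort order string
        # pop() so that a character repeated in `order` is only emitted once
        pieces.append(ch * counts.pop(ch, 0))
    # Assumption: characters missing from `order` go last, in first-seen order.
    for ch, n in counts.items():
        pieces.append(ch * n)
    return "".join(pieces)

print(sort_by_order("abcdeeabc", "dfbcae"))  # dbbccaaee
```

Because it counts with a dictionary rather than a 26-slot array, this sketch is not limited to lowercase letters.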

Here's a nice easy to understand algorithm that has decent algorithmic complexity.
For each character in the sort order string
scan string to be sorted, starting at first non-ordered character (you can keep track of this character with an index or pointer)
when you find an occurrence of the specified character, swap it with the first non-ordered character
increment the index for the first non-ordered character
This is O(n*m), where n is the length of the string to be sorted and m is the length of the sort order string. We're able to beat the lower bound on comparison based sorting because this algorithm doesn't really use comparisons. Like Counting Sort it relies on the fact that you have a predefined finite external ordering set.
Here's some pseudocode:
int head = 0;
foreach(char c in sortOrder)
{
for(int i = head; i < sortTarget.length; i++)
{
if(sortTarget[i] == c)
{
// swap i with head
char temp = sortTarget[head];
sortTarget[head] = sortTarget[i];
sortTarget[i] = temp;
head++;
}
}
}
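A runnable Python rendering of the pseudocode above, as a minimal sketch. It works on a list of characters because Python strings are immutable, and the name `sort_by_order_in_place` is hypothetical.

```python
def sort_by_order_in_place(chars: list, order: str) -> None:
    """Rearrange `chars` in place so it follows the ranking given by `order`."""
    head = 0  # index of the first character that has not been placed yet
    for c in order:
        # scan only the part of the list that is still unordered
        for i in range(head, len(chars)):
            if chars[i] == c:
                # move this occurrence to the front of the unordered part
                chars[head], chars[i] = chars[i], chars[head]
                head += 1

letters = list("abcdeeabc")
sort_by_order_in_place(letters, "dfbcae")
print("".join(letters))  # dbbccaaee
```

Characters that never appear in the sort order string are left after the ordered part, in whatever order the swaps leave them.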

In Python, you can just create an index and use it as the sort key:
order = 'dfbcae'
text = 'abcdeeabc'
index = {ch: pos for pos, ch in enumerate(order)}
output = sorted(text, key=lambda ch: index[ch])
print('input=', text)
print('output=', ''.join(output))
gives this output:
input= abcdeeabc
output= dbbccaaee

Use binary search to find all the "split points" between different letters, then use the length of each segment directly (this assumes equal letters already sit next to each other in the string). This will be asymptotically faster than naive counting sort, but will be harder to implement:
Use an array of size 26*2 to store the begin and end of each letter;
Inspect the middle element and see if it is different from the element to its left. If so, then this is the begin for the middle element and the end for the element before it;
Throw away the segments with identical begin and end (if there are any), then recursively apply this algorithm.
Since there are at most 25 "split"s, you won't have to do the search for more than 25 segments, and for each segment it is O(log n). Since this is a constant times O(log n), the search is O(log n).
And of course, just using counting sort will be easier to implement:
Use an array of size 26 to record the number of different letters;
Scan the input string;
Output the string in the given sorting order.
This is O(n), n being the length of the string.

Interview questions are generally about thought process and don't usually care too much about language features, but I couldn't resist posting a VB.Net 4.0 version anyway.
"Efficient" can mean two different things. The first is "what's the fastest way to make a computer execute a task" and the second is "what's the fastest that we can get a task done". They might sound the same but the first can mean micro-optimizations like int vs short, running timers to compare execution times and spending a week tweaking every millisecond out of an algorithm. The second definition is about how much human time would it take to create the code that does the task (hopefully in a reasonable amount of time). If code A runs 20 times faster than code B but code B took 1/20th of the time to write, depending on the granularity of the timer (1ms vs 20ms, 1 week vs 20 weeks), each version could be considered "efficient".
Dim input = "abcdeeabc"
Dim sort = "dfbcae"
Dim SortChars = sort.ToList()
Dim output = New String((From c In input.ToList() Select c Order By SortChars.IndexOf(c)).ToArray())
Trace.WriteLine(output)

Here is my solution to the question
import java.util.*;
import java.io.*;
class SortString
{
public static void main(String arg[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// System.out.println("Enter 1st String :");
// String s1=br.readLine();
// System.out.println("Enter 2nd String :");
// String s2=br.readLine();
String s1="tracctor";
String s2="car";
String com="";
String uncom="";
for(int i=0;i<s2.length();i++)
{
if(s1.contains(""+s2.charAt(i)))
{
com=com+s2.charAt(i);
}
}
System.out.println("Com :"+com);
for(int i=0;i<s1.length();i++)
if(!com.contains(""+s1.charAt(i)))
uncom=uncom+s1.charAt(i);
System.out.println("Uncom "+uncom);
System.out.println("Combined "+(com+uncom));
HashMap<String,Integer> h1=new HashMap<String,Integer>();
for(int i=0;i<s1.length();i++)
{
String m=""+s1.charAt(i);
if(h1.containsKey(m))
{
int val=(int)h1.get(m);
val=val+1;
h1.put(m,val);
}
else
{
h1.put(m,1); // autoboxing; the Integer(int) constructor is deprecated
}
}
StringBuilder x=new StringBuilder();
for(int i=0;i<com.length();i++)
{
if(h1.containsKey(""+com.charAt(i)))
{
int count=(int)h1.get(""+com.charAt(i));
while(count!=0)
{x.append(""+com.charAt(i));count--;}
}
}
x.append(uncom);
System.out.println("Sort "+x);
}
}

Here is my version, which is O(n) in time. Instead of unordered_map, I could have just used an int array of constant size, i.e. int char_count[26] (and done ++char_count[ch - 'a']), assuming the input strings contain only lowercase ASCII characters.
string SortOrder(const string& input, const string& sort_order) {
unordered_map<char, int> char_count;
for (auto ch : input) {
++char_count[ch];
}
string res = "";
for (auto ch : sort_order) {
unordered_map<char, int>::iterator it = char_count.find(ch);
if (it != char_count.end()) {
string s(it->second, it->first);
res += s;
}
}
return res;
}

private static String sort(String target, String reference) {
final Map<Character, Integer> referencesMap = new HashMap<Character, Integer>();
for (int i = 0; i < reference.length(); i++) {
char key = reference.charAt(i);
if (!referencesMap.containsKey(key)) {
referencesMap.put(key, i);
}
}
List<Character> chars = new ArrayList<Character>(target.length());
for (int i = 0; i < target.length(); i++) {
chars.add(target.charAt(i));
}
Collections.sort(chars, new Comparator<Character>() {
@Override
public int compare(Character o1, Character o2) {
return referencesMap.get(o1).compareTo(referencesMap.get(o2));
}
});
StringBuilder sb = new StringBuilder();
for (Character c : chars) {
sb.append(c);
}
return sb.toString();
}

In C# I would just use the IComparer Interface and leave it to Array.Sort
void Main()
{
// we define the IComparer class to define the sort order
var sortOrder = new SortOrder("dfbcae");
var testOrder = "abcdeeabc".ToCharArray();
// sort the array using Array.Sort
Array.Sort(testOrder, sortOrder);
Console.WriteLine(new string(testOrder)); // ToString() on a char[] would print the type name
}
public class SortOrder : IComparer
{
string sortOrder;
public SortOrder(string sortOrder)
{
this.sortOrder = sortOrder;
}
public int Compare(object obj1, object obj2)
{
var obj1Index = sortOrder.IndexOf((char)obj1);
var obj2Index = sortOrder.IndexOf((char)obj2);
if(obj1Index == -1 || obj2Index == -1)
{
throw new Exception("character not found");
}
if(obj1Index > obj2Index)
{
return 1;
}
else if (obj1Index == obj2Index)
{
return 0;
}
else
{
return -1;
}
}
}

Related

I need to create a function in Groovy that has a single integer as a parameter and returns the number of significant figures it contains

Long story short, I'm working in a system that only works with Groovy in its expression editor, and I need to create a function that returns the number of significant figures an integer has. I've found the following function on Stack Overflow for Java, however it doesn't seem like Groovy (or the system itself) likes the regex:
String myfloat = "0.0120";
String [] sig_figs = myfloat.split("(^0+(\\.?)0*|(~\\.)0+$|\\.)");
int sum = 0;
for (String fig : sig_figs)
{
sum += fig.length();
}
return sum;
I've since tried to convert it into a more Groovy-esque syntax to be compatible, and have produced the following:
def sum = 0;
def myint = toString(mynum);
def String[] sig_figs = myint.split(/[^0+(\\.?)0*|(~\\.)0+$|\\.]/);
for (int i = 0; i <= sig_figs.size();i++)
{
sum += sig_figs[i].length();
}
return(sum);
Note that 'mynum' is the parameter of the method
It should also be noted that this system has very little visibility in regards to what groovy functions are available in the system, so the solution likely needs to be as basic as possible
Any help would be greatly appreciated. Thanks!
I think this is the regex you need:
def num = '0.0120'
def splitted = num.split(/(^0+(\.?)0*|(~\.)0+$|\.)/)
def sf = splitted*.length().sum()
It's been a while since I've had to think about significant figures, so sorry if I have the wrong idea. But I've made two regular expressions that combined should count the number of significant figures (sorry I'm no regex wizard) in a string representing a decimal. It doesn't handle commas, you would have to strip those out.
This first regex matches all significant figures before the decimal point
([1-9]+\d*[1-9]|[1-9]+)
And this second regex matches all significant figures after the decimal point:
\.((\d*[1-9]+)+)?
If you add up the lengths of the first capture group (or 0 when no match) for both matches, then it should give you the number of significant figures.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class SigFigs {
private static final Pattern pattern1 = Pattern.compile("([1-9]+\\d*[1-9]|[1-9]+)");
private static final Pattern pattern2 = Pattern.compile("\\.((\\d*[1-9]+)+)?");
public static int getSignificantFigures(String number) {
int sigFigs = 0;
for (int i=0; i < 2; i++) {
Matcher matcher = (i == 0 ? pattern1 : pattern2).matcher(number);
if (matcher.find()) {
try {
String s = matcher.group(1);
if (s != null) sigFigs += s.length();
} catch (IndexOutOfBoundsException ignored) { }
}
}
return sigFigs;
}
public static void main(String[] args) {
System.out.println(getSignificantFigures("0305.44090")); // 7 sig. figs
}
}
Of course, using two matches is suboptimal (like I've said, I'm not crazy good at regex like some I could mention), but it's fairly robust and readable.

Dynamic character generator; Generate all possible strings from a character set

I want to make a dynamic string generator that will generate all possible unique strings from a character set with a dynamic length.
I can make this very easily using for loops, but then it's static and not of dynamic length.
// Prints all possible strings with the length of 3
for a in allowedCharacters {
for b in allowedCharacters {
for c in allowedCharacters {
println(a+b+c)
}
}
}
But when I want to make the length dynamic, so I can just call generate(length: 5), I get confused.
I found this Stack Overflow question, but the accepted answer generates strings of every length from 1 to maxLength, and I want maxLength on every string.
As noted above, use recursion. Here is how it can be done with C#:
static IEnumerable<string> Generate(int length, char[] allowed_chars)
{
if (length == 1)
{
foreach (char c in allowed_chars)
yield return c.ToString();
}
else
{
var sub_strings = Generate(length - 1, allowed_chars);
foreach (char c in allowed_chars)
{
foreach (string sub in sub_strings)
{
yield return c + sub;
}
}
}
}
private static void Main(string[] args)
{
string chars = "abc";
List<string> result = Generate(3, chars.ToCharArray()).ToList();
}
Please note that the run time of this algorithm and the amount of data it returns is exponential as the length increases which means that if you have large lengths, you should expect the code to take a long time and to return a huge amount of data.
Translation of @YacoubMassad's C# code to Swift:
func generate(length: Int, allowedChars: [String]) -> [String] {
if length == 1 {
return allowedChars
}
else {
let subStrings = generate(length: length - 1, allowedChars: allowedChars)
var arr = [String]()
for c in allowedChars {
for sub in subStrings {
arr.append(c + sub)
}
}
return arr
}
}
print(generate(length: 3, allowedChars: ["a", "b", "c"]).joined(separator: ", "))
Prints:
aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc
While you can (obviously enough) use recursion to solve this problem, it is quite an inefficient way to do the job.
What you're really doing is just counting. In your example, with "a", "b" and "c" as the allowed characters, you're counting in base 3, and since you're allowing three character strings, they're three digit numbers.
An N-digit number in base M can represent M^N different possible values, going from 0 through M^N-1. So, for your case, that's limit=pow(3, 3)-1;. To generate all those values, you just count from 0 through the limit, and convert each number to base M, using the specified characters as the "digits". For example, in C++ the code can look like this:
#include <cmath>
#include <string>
#include <iostream>
int main() {
std::string letters = "abc";
std::size_t base = letters.length();
std::size_t digits = 3;
int limit = pow(base, digits);
for (int i = 0; i < limit; i++) {
int in = i;
for (int j = 0; j < digits; j++) {
std::cout << letters[in%base];
in /= base;
}
std::cout << "\t";
}
}
One minor note: as I've written it here, this produces the output in basically a little-endian format. That is, the "digit" that varies the fastest is on the left, and the one that changes the slowest is on the right.
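For comparison, here is a minimal Python sketch of the same counting idea. The function name `generate` and its parameters are chosen for illustration only. Unlike the C++ version above, it reverses the digits so that the rightmost character varies fastest.

```python
from itertools import product

def generate(length: int, alphabet: str) -> list:
    """Return every string of exactly `length` characters drawn from `alphabet`."""
    base = len(alphabet)
    results = []
    for number in range(base ** length):    # count from 0 to base**length - 1
        rest = number
        digits = []
        for _ in range(length):             # peel off one base-`base` digit per position
            rest, digit = divmod(rest, base)
            digits.append(alphabet[digit])
        # digits were produced least significant first, so reverse them
        results.append("".join(reversed(digits)))
    return results

print(generate(2, "ab"))  # ['aa', 'ab', 'ba', 'bb']

# The standard library produces the same sequence directly:
assert generate(3, "abc") == ["".join(p) for p in product("abc", repeat=3)]
```

In everyday Python code `itertools.product(alphabet, repeat=length)` is probably the simpler choice, and it yields results lazily instead of building the whole list.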

Count the number of frequency for different characters in a string

I am currently trying to create a small program where the user enters a string in a text area, clicks a button, and the program counts the frequency of the different characters in the string and shows the result in another text area.
E.g. Step 1:- User enters:- aaabbbbbbcccdd
Step 2:- User clicks the button
Step 3:- a 3
b 6
c 3
d 2
This is what I've done so far....
public partial class Form1 : Form
{
Dictionary<string, int> dic = new Dictionary<string, int>();
string s = "";
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
s = textBox1.Text;
int count = 0;
for (int i = 0; i < s.Length; i++ )
{
textBox2.Text = Convert.ToString(s[i]);
if (dic.Equals(s[i]))
{
count++;
}
else
{
dic.Add(Convert.ToString(s[i]), count++);
}
}
}
}
}
Any ideas or help on how I can continue? Until now the program gives a run-time error when the same character occurs more than once!
Thank You
var lettersAndCounts = s.GroupBy(c=>c).Select(group => new {
Letter= group.Key,
Count = group.Count()
});
Instead of dic.Equals, use dic.ContainsKey. However, I would use this little LINQ query:
Dictionary<string, int> dict = textBox1.Text
.GroupBy(c => c)
.ToDictionary(g => g.Key.ToString(), g => g.Count());
You are attempting to compare the entire dictionary to a string, that doesn't tell you if there is a key in the dictionary that corresponds to the string. As the dictionary never is equal to the string, your code will always think that it should add a new item even if one already exists, and that is the cause of the runtime error.
Use the ContainsKey method to check if the string exists as a key in the dictionary.
Instead of using a variable count, you would want to increase the numbers in the dictionary, and initialise new items with a count of one:
string key = s[i].ToString();
textBox2.Text = key;
if (dic.ContainsKey(key)) {
dic[key]++;
} else {
dic.Add(key, 1);
}
I'm going to suggest a different and somewhat simpler approach for doing this. Assuming you are using English strings, you can create an array with capacity = 26. Then depending on the character you encounter you would increment the appropriate index in the array. For example, if the character is 'a' increment count at index 0, if the character is 'b' increment the count at index 1, etc...
Your implementation will look something like this:
int[] count = new int[26]; // C# zero-initialises the array
for (int i = 0; i < s.Length; i++)
{
char c = char.ToLower(s[i]);
if (c >= 'a' && c <= 'z') count[c - 'a']++; // skip anything that is not a-z
}
When this finishes you will have the number of 'a's in count[0] and the number of 'z's in count[25].
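The same fixed-size array idea, sketched in Python for illustration. The helper name `letter_counts` is hypothetical, and ignoring every character outside a-z is an assumption about what the form should do with digits, spaces and punctuation.

```python
def letter_counts(text: str) -> list:
    """Count occurrences of each letter a-z, ignoring case and other characters."""
    counts = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":                      # skip anything that is not a-z
            counts[ord(ch) - ord("a")] += 1       # 'a' -> index 0, ..., 'z' -> index 25
    return counts

counts = letter_counts("aaabbbbbbcccdd")
for offset, n in enumerate(counts):
    if n:
        print(chr(ord("a") + offset), n)
# a 3
# b 6
# c 3
# d 2
```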

dynamic programming for minimum cost of breaking the string

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.
Give a dynamic programming algorithm that, given the locations of m cuts in a string of length n, finds the minimum cost of breaking the string into m + 1 pieces.
This problem is from "Algorithms", chapter 6, 6.9.
Since there is no answer for this problem, this is what I thought.
Define OPT(i,j,n) as the minimum cost of breaking the string, with i the start index, j the end index of the string, and n the remaining number of cuts I can use.
Here is what I get:
OPT(i,j,n) = min {OPT(i,k,w) + OPT(k+1,j,n-w) + j-i} for i<=k<j and 0<=w<=n
Is it right or not? Please help, thx!
I think your recurrence relation can be improved. Here's what I came up with: define cost(i,j) to be the cost of cutting the string from index i to j. Then,
cost(i,j) = min {length of substring + cost(i,k) + cost(k,j) where i < k < j}
void s_cut()
{
int l,p;
int temp=0;
//ArrayList<Integer> al = new ArrayList<Integer>();
int al[];
Scanner s=new Scanner(System.in);
int table[][];
ArrayList<Integer> values[][];
int low=0,high=0;
int min=0;
l=s.nextInt();
p=s.nextInt();
System.out.println("The values are "+l+" "+p);
table= new int[l+1][l+1];
values= new ArrayList[l+1][l+1];
al= new int[p];
for(int i=0;i<p;i++)
{
al[i]=s.nextInt();
}
for(int i=0;i<=l;i++)
for(int j=0;j<=l;j++)
values[i][j]=new ArrayList<Integer>();
System.out.println();
for(int i=1;i<=l;i++)
table[i][i]=0;
//Arrays.s
Arrays.sort(al);
for(int i=0;i<p;i++)
{
System.out.print(al[i]+ " ");
}
for(int len=2;len<=l;len++)
{
//System.out.println("The length is "+len);
for(int i=1,j=i+len-1;j<=l;i++,j++)
{
high= min_index(al,j-1);
low= max_index(al,i);
System.out.println("Indices are "+low+" "+high);
if(low<=high && low!=-1 && high!=-1)
{
int cost=Integer.MAX_VALUE;;
for(int k=low;k<=high;k++)
{
//if(al[k]!=j)
temp=cost;
cost=Math.min(cost, table[i][al[k]]+table[al[k]+1][j]);
if(temp!=cost)
{
min=k;
//values[i][j].add(al[k]);
//values[i][j].addAll(values[i][al[k]]);
//values[i][j].addAll(values[al[k]+1][j]);
//values[i][j].addAll(values[i][al[k]]);
}
//else
//cost=0;
}
table[i][j]= len+cost;
values[i][j].add(al[min]);
//values[i][j].addAll(values[i][al[min]]);
values[i][j].addAll(values[al[min]+1][j]);
values[i][j].addAll(values[i][al[min]]);
}
else
table[i][j]=0;
System.out.println(" values are "+i+" "+j+" "+table[i][j]);
}
}
System.out.println(" The minimum cost is "+table[1][l]);
//temp=values[1][l];
for(int e: values[1][l])
{
System.out.print(e+"-->");
}
}
The above solution has the complexity of O(n^3).
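As a compact illustration of the cost(i,j) recurrence, here is a Python sketch that indexes the table by cut positions rather than by every character position, so the work depends on the number of cuts m (roughly O(m^3)) instead of the string length. This is a variation on the Java code above, not a translation of it, and the name `min_break_cost` is made up.

```python
def min_break_cost(n: int, cuts: list) -> int:
    """Minimum total cost of making every cut in `cuts` on a string of length n."""
    # Treat both ends of the string as fixed boundaries around the sorted cuts.
    points = [0] + sorted(cuts) + [n]
    m = len(points)
    # cost[i][j]: cheapest way to make all cuts strictly between points[i] and points[j]
    cost = [[0] * m for _ in range(m)]
    for span in range(2, m):                 # adjacent boundaries (span 1) need no cut
        for i in range(m - span):
            j = i + span
            # try each remaining cut k as the first one made on this piece
            best = min(cost[i][k] + cost[k][j] for k in range(i + 1, j))
            cost[i][j] = (points[j] - points[i]) + best
    return cost[0][m - 1]

print(min_break_cost(20, [3, 10]))  # 30
```

The printed value matches the example in the question: cutting at position 10 first costs 20 + 10 = 30.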

Remove single character occurrence from String

I want an algorithm to remove all occurrences of a given character from a string in O(n) complexity or lower? (It should be INPLACE editing original string only)
eg.
String="aadecabaaab";
removeCharacter='a'
Output:"decbb"
Enjoy algo:
j = 0
for i in length(a):
if a[i] != symbol:
a[j] = a[i]
j = j + 1
finalize:
length(a) = j
You can't do it in place with a String because it's immutable, but here's an O(n) algorithm to do it in place with a char[]:
char[] chars = "aadecabaaab".toCharArray();
char removeCharacter = 'a';
int next = 0;
for (int cur = 0; cur < chars.length; ++cur) {
if (chars[cur] != removeCharacter) {
chars[next++] = chars[cur];
}
}
// chars[0] through chars[4] will have {d, e, c, b, b} and next will be 5
System.out.println(new String(chars, 0, next));
Strictly speaking, you can't remove anything from a String because the String class is immutable. But you can construct another String that has all characters from the original String except for the "character to remove".
Create a StringBuilder. Loop through all characters in the original String. If the current character is not the character to remove, then append it to the StringBuilder. After the loop ends, convert the StringBuilder to a String.
Yep. In linear time, iterate over the String and check each character with .charAt(): if it is the removeCharacter, don't copy it to the new String; otherwise, copy it. That's it.
This probably shouldn't have the "java" tag since in Java, a String is immutable and you can't edit it in place. For a more general case, if you have an array of characters (in any programming language) and you want to modify the array "in place" without creating another array, it's easy enough to do with two indexes. One goes through every character in the array, and the other starts at the beginning and is incremented only when you see a character that isn't removeCharacter. Since I assume this is a homework assignment, I'll leave it at that and let you figure out the details.
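For readers who want to see the two-index idea spelled out, here is a minimal Python sketch. It operates on a list of characters, since Python strings are immutable too, and the name `remove_char_in_place` is hypothetical.

```python
def remove_char_in_place(chars: list, unwanted: str) -> int:
    """Remove every occurrence of `unwanted` from `chars` in place; return the new length."""
    write = 0                              # next slot to fill with a kept character
    for read in range(len(chars)):         # `read` visits every character once
        if chars[read] != unwanted:
            chars[write] = chars[read]     # compact kept characters to the front
            write += 1
    del chars[write:]                      # drop the leftover tail
    return write

letters = list("aadecabaaab")
remove_char_in_place(letters, "a")
print("".join(letters))  # decbb
```

Each character is read once and written at most once, so the pass is O(n) with no extra buffer.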
import java.util.*;
import java.io.*;
public class removeA{
public static void main(String[] args){
String text = "This is a test string! Wow abcdefg.";
System.out.println(text.replaceAll("a",""));
}
}
Use a lookup table to hold the characters you want to remove. With std::map, each lookup is O(log k), where k is the number of characters to remove.
std::string toRemove = "ad";
std::map<char, int> table;
size_t maxR = toRemove.size();
for (size_t n = 0; n < maxR; ++n)
{
table[toRemove[n]] = 0;
}
Then parse the whole string and remove when you get a hit (thestring is an array):
size_t counter = 0;
while(thestring[counter] != 0)
{
std::map<char,int>::iterator iter = table.find(thestring[counter]);
if (iter == table.end()) // we found a valid character!
{
++counter;
}
else
{
// move the data (terminator included); the regions overlap, so use memmove, not memcpy (max is assumed to be the length of thestring)
memmove(&thestring[counter], &thestring[counter+1], max-counter);
// dont increment counter
}
}
EDIT: I hope this is not a technical test or something like that. =S
