dart efficient string processing techniques? - string

I strings in the format of name:key:dataLength:data and these strings can often be chained together. for example "aNum:n:4:9879aBool:b:1:taString:s:2:Hi" this would map to an object something like:
{
aNum: 9879,
aBool: true,
aString: "Hi"
}
I have a method for parsing a string in this format but I'm not sure whether it's use of substring is the most efficient way of pprocessing the string, is there a more efficient way of processing strings in this fashion (repeatedly chopping off the front section):
Map<string, dynamic> fromString(String s){
Map<String, dynamic> _internal = new Map();
int start = 0;
while(start < s.length){
int end;
List<String> parts = new List<String>(); //0 is name, 1 is key, 2 is data length, 3 is data
for(var i = 0; i < 4; i++){
end = i < 3 ? s.indexOf(':') : num.parse(parts[2]);
parts[i] = s.substring(start, end);
start = i < 3 ? end + 1 : end;
}
var tranType = _tranTypesByKey[parts[1]]; //this is just a map to an object which has a function that can convert the data section of the string into an object
_internal[parts[0]] = tranType._fromStr(parts[3]);
}
return _internal;
}

I would try s.split(':') and process the resulting list.
If you do a lot of such operations you should consider creating benchmarks tests, try different techniques and compare them.
If you would still need this line
s = i < 3 ? s.substring(idx + 1) : s.substring(idx);
I would avoid creating a new substring in each iteration but instead just keep track of the next position.

You have to decide how important performance is relative to readability and maintainability of the code.
That said, you should not be cutting off the head of the string repeatedly. That is guaranteed to be inefficient - it'll take time that is quadratic in the number of records in your string, just creating those tail strings.
For parsing each field, you can avoid doing substrings on the length and type fields. For the length field, you can build the number yourself:
int index = ...;
// index points to first digit of length.
int length = 0;
int charCode = source.codeUnitAt(index++);
while (charCode != CHAR_COLON) {
length = 10 * length + charCode - 0x30;
charCode = source.codeUnitAt(index++);
}
// index points to the first character of content.
Since lengths are usually small integers (less than 2<<31), this is likely to be more efficient than creating a substring and calling int.parse.
The type field is a single ASCII character, so you could use codeUnitAt to get its ASCII value instead of creating a single-character string (and then your content interpretation lookup will need to switch on character code instead of character string).
For parsing content, you could pass the source string, start index and length instead of creating a substring. Then the boolean parser can also just read the code unit instead of the singleton character string, the string parser can just make the substring, and the number parser will likely have to make a substring too and call double.parse.
It would be convenient if Dart had a double.parseSubstring(source, [int from = 0, int to]) that could parse a substring as a double without creating the substring.

Related

Is there a faster method to find dictionary keys with wildcards in the middle?

Let's say I have a dictionary with strings of 0s, 1s, and '*' as wildcards for my key value.
For example, my dictionary is structured as such:
{'010*10000':'foo', '100*1*000':'bar'......}
Each dictionary value has a fixed string length, however, there are wildcards within the string represented as '*' characters. Thus, values of '010110000' or '010010000' both return 'foo'.
The problem lies in the length of my dictionary. The dictionary I am working with has over 500,000+ entries. Therefore, when I try to iterate over each key in the dict to find if a key exists, then it takes far too long with O(n) complexity.
Ideally, I would like to find a way to just check if a value such as '010110000' is in the dictionary, similar to the .get() function for regular python dictionaries without wildcards.
I've already tried iterating over my dictionary using fnmatch like the following Wildcard in dictionary key:
for k in my_dict.keys():
if fnmatch.fnmatch(string_of_1s_and_0s, k):
print(my_dict[k])
break
##Do some operation here if we have found the matching key pair...and then break.
However, it's just too slow with O(n) complexity. Is there any way to implement get() but with wildcards?
dicts are hash code based; the hash code, if implemented correctly, will differ wildly for a difference of just one character. There is no way to make a dict do what you want, but what you're doing is probably best done with something other than a dict in the first place. Have you considered a relational database, where the LIKE operator could do something like this? It might still have to scan a large part of the DB, but ideally it could use anchors at one end or the other to at least narrow the search to matching prefixes/suffixes.
Rotate the original pattern left (by taking characters from the start and putting them at the end) while keeping track of the rotate count; like this:
'010*10000' -> '*10000010', rotate_count = 3
'100*1*000' -> '*1*000100', rotate_count = 3
Then split it into a "complex part" and a "simple part", and determine the length of the simple part, like this:
'010*10000' -> '*10000010', rotate_count = 3
complex = '*`, simple = `10000010', simple_length = 8
'100*1*000' -> '*1*000100', rotate_count = 3
complex = '*1*`, simple = `000100', simple_length = 6
If the fixed length of the strings is 16, then there will be 16 possible values of rotate_count, and for each one there will be 16 - rotate_count possible values of simple_length. This can be described as a nested loop:
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
for(simple_length = 0; simple_length = 16 - rotate_count; simple_length++) {
}
}
You can associate an "array of entries" with this, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
for(simple_length = 0; simple_length = 16 - rotate_count; simple_length++) {
entry_number++;
}
}
Then you can use the entry number to find a hash table, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
for(simple_length = 0; simple_length = 16 - rotate_count; simple_length++) {
hash_table = array_of_hash_tables[entry_number];
entry_number++;
}
}
You can also rotate the string you're looking for by the rotate_count and extract simple_length characters from that, convert those characters into a hash, and use it to find a list of entries from the hash table, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
rotated_string = rotate_string(original_string, rotate_count);
for(simple_length = 0; simple_length = 16 - rotate_count; simple_length++) {
hash_table = array_of_hash_tables[entry_number];
if(hash_table != NULL) {
hash = get_simple_hash(rotated_string, simple_length);
list = hash_table[hash];
// Use "list" and "original string" to do the hard stuff here...
}
entry_number++;
}
}
This will quickly eliminate lots of entries (where the start and end don't match) and give you a list of "potential matches" where you'd have to check the part containing wild cards against the original string to determine if there is/isn't an actual match.
Note that if the characters are "ones and zeros" this can be improved by converting "strings containing binary digits" into integers.

Time complexity of a string compression algorithm

I tried to do an exercice from cracking the code interview book about compressing a string :
Implement a method to perform basic string compression using the counts of
repeated characters. For example, the string aabcccccaaa would become
a2blc5a3. If the "compressed" string would not become smaller than the original
string, your method should return the original string.
The first proposition given by the author is the following :
public static String compressBad(String str) {
int size = countCompression(str);
if (size >= str.length()) {
return str;
}
String mystr = "";
char last = str.charAt(0);
int count = 1;
for (int i = 1; i < str.length(); i++) {
if (str.charAt(i) == last) {
count++;
} else {
mystr += last + "" + count;
last = str.charAt(i);
count = 1;
}
}
return mystr + last + count;
}
About the time complexity of thie algorithm, she said that :
The runtime is 0(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in 0(n^2) time
I have two main questions :
1) Why the time complexity is O(p+k^2), it should be only O(p) (where p is the size of the string) no? Because we do only p iterations not p + k^2.
2) Why the time complexity of a string concatenation is 0(n^2)???
I thought that if we have a string3 = string1 + string2 then we have a complexity of size(string1) + size(string2), because we have to create a copy of the two strings before adding them to a new string (create a string is O(1)). So, we will have an addition not a multiplication, no? It isn't the same thing for an array (if we used a char array for example) ?
Can you clarify these points please? I didn't understand how we calculate the complexity...
String concatenation is O(n) but in this case it's being concatenated K times.
Every time you find a sequence you have to copy the entire string + what you found. for example if you had four total sequences in the original string
the cost to get the final string will be:
(k-3)+ //first sequence
(k-3)+(k-2) + //copying previous string and adding new sequence
((k-3)+(+k-2)+k(-1) + //copying previous string and adding new sequence
((k-3)+(k-2)+(k-1)+k) //copying previous string and adding new sequence
Thus complexity is O(p + K^2)

Add comma sequentially to string in C#

I have a string.
string str = "TTFTTFFTTTTF";
How can I break this string and add character ","?
result should be- TTF,TTF,FTT,TTF
You could use String.Join after you've grouped by 3-chars:
var groups = str.Select((c, ix) => new { Char = c, Index = ix })
.GroupBy(x => x.Index / 3)
.Select(g => String.Concat(g.Select(x => x.Char)));
string result = string.Join(",", groups);
Since you're new to programming. That's a LINQ query so you need to add using System.Linq to the top of your code file.
The Select extension method creates an anonymous type containing the char and the index of each char.
GroupBy groups them by the result of index / 3 which is an integer division that truncates decimal places. That's why you create groups of three.
String.Concat creates a string from the 3 characters.
String.Join concatenates them and inserts a comma delimiter between each.
Here is a really simple solution using StringBuilder
var stringBuilder = new StringBuilder();
for (int i = 0; i < str.Length; i += 3)
{
stringBuilder.AppendFormat("{0},", str.Substring(i, 3));
}
stringBuilder.Length -= 1;
str = stringBuilder.ToString();
I'm not sure if the following is better.
stringBuilder.Append(str.Substring(i, 3)).Append(',');
I would suggest to avoid LINQ in this case as it will perform a lot more operations and this is a fairly simple task.
You can use insert
Insert places one string into another. This forms a new string in your C# program. We use the string Insert method to place one string in the middle of another one—or at any other position.
Tip 1:
We can insert one string at any index into another. IndexOf can return a suitable index.
Tip 2:
Insert can be used to concatenate strings. But this is less efficient—concat, as with + is faster.
for(int i=3;i<=str.Length - 1;i+=4)
{
str=str.Insert(i,",");
}

Char Array Returning Integers

I've been working through this exercise, and my output is not what I expect.
(Check substrings) You can check whether a string is a substring of another string
by using the indexOf method in the String class. Write your own method for
this function. Write a program that prompts the user to enter two strings, and
checks whether the first string is a substring of the second.
** My code compromises with the problem's specifications in two ways: it can only display matching substrings to 3 letters, and it cannot work on string literals with less than 4 letters. I mistakenly began writing the program without using the suggested method, indexOf. My program's objective (although it shouldn't entirely deviate from the assignment's objective) is to design a program that determines whether two strings share at least three consecutive letters.
The program's primary error is that it generates numbers instead of char characters. I've run through several, unsuccessful ideas to discover what the logical error is. I first tried to idenfity whether the char characters (which, from my understanding, are underwritten in unicode) were converted to integers, considering that the outputted numbers are also three letters long. Without consulting a reference, I know this isn't true. A comparison between java and javac outputted permutation of 312, and a comparison between abab and ababbab ouputted combinations of 219. j should be > b. My next thought was that the ouputs were indexes of the arrays I used. Once again, this isn't true. A comparison between java and javac would ouput 0, if my reasoning were true.
public class Substring {
public static char [] array;
public static char [] array2;
public static void main (String[]args){
java.util.Scanner input = new java.util.Scanner (System.in);
System.out.println("Enter your two strings here, the longer one preceding the shorter one");
String container1 = input.next();
String container2 = input.next();
char [] placeholder = container1.toCharArray();
char [] placeholder2 = container2.toCharArray();
array = placeholder;
array2 = placeholder2;
for (int i = 0; i < placeholder2.length; i++){
for (int j = 0; j < placeholder.length; j ++){
if (array[j] == array2[i]) matcher(j,i);
}
}
}
public static void matcher(int higher, int lower){
if ((higher < array.length - 2) && (lower < array2.length - 2))
if (( array[higher+1] == array2[lower+1]) && (array[higher+2] == array2[lower+2]))
System.out.println(array[higher] + array[higher+1] + array[higher+2] );
}
}
The + operator promotes shorts, chars, and bytes operands to ints, so
array[higher] + array[higher+1] + array[higher+2]
has type int, not type char which means that
System.out.println(...)
binds to
System.out.println(int)
which displays its argument as a decimal number, instead of binding to
System.out.println(char)
which outputs the given character using the PrintStream's encoding.

Remove single character occurrence from String

I want an algorithm to remove all occurrences of a given character from a string in O(n) complexity or lower? (It should be INPLACE editing original string only)
eg.
String="aadecabaaab";
removeCharacter='a'
Output:"decbb"
Enjoy algo:
j = 0
for i in length(a):
if a[i] != symbol:
a[j] = a[i]
j = j + 1
finalize:
length(a) = j
You can't do it in place with a String because it's immutable, but here's an O(n) algorithm to do it in place with a char[]:
char[] chars = "aadecabaaab".toCharArray();
char removeCharacter = 'a';
int next = 0;
for (int cur = 0; cur < chars.length; ++cur) {
if (chars[cur] != removeCharacter) {
chars[next++] = chars[cur];
}
}
// chars[0] through chars[4] will have {d, e, c, b, b} and next will be 5
System.out.println(new String(chars, 0, next));
Strictly speaking, you can't remove anything from a String because the String class is immutable. But you can construct another String that has all characters from the original String except for the "character to remove".
Create a StringBuilder. Loop through all characters in the original String. If the current character is not the character to remove, then append it to the StringBuilder. After the loop ends, convert the StringBuilder to a String.
Yep. In a linear time, iterate over String, check using .charAt() if this is a removeCharacter, don't copy it to new String. If no, copy. That's it.
This probably shouldn't have the "java" tag since in Java, a String is immutable and you can't edit it in place. For a more general case, if you have an array of characters (in any programming language) and you want to modify the array "in place" without creating another array, it's easy enough to do with two indexes. One goes through every character in the array, and the other starts at the beginning and is incremented only when you see a character that isn't removeCharacter. Since I assume this is a homework assignment, I'll leave it at that and let you figure out the details.
import java.util.*;
import java.io.*;
public class removeA{
public static void main(String[] args){
String text = "This is a test string! Wow abcdefg.";
System.out.println(text.replaceAll("a",""));
}
}
Use a hash table to hold the data you want to remove. log N complexity.
std::string toRemove = "ad";
std::map<char, int> table;
size_t maxR = toRemove.size();
for (size_t n = 0; n < maxR; ++n)
{
table[toRemove[n]] = 0;
}
Then parse the whole string and remove when you get a hit (thestring is an array):
size_t counter = 0;
while(thestring[counter] != 0)
{
std::map<char,int>::iterator iter = table.find(thestring[counter]);
if (iter == table.end()) // we found a valid character!
{
++counter;
}
else
{
// move the data - dont increment counter
memcpy(&thestring[counter], &thestring[counter+1], max-counter);
// dont increment counter
}
}
EDIT: I hope this is not a technical test or something like that. =S

Resources