Comparing not equal length strings in Go

When I compare the following strings of unequal length in Go, the result of the comparison is not what I expect. Can someone help?
i := "1206410694"
j := "128000000"
fmt.Println("result is", i >= j, i, j )
The output is:
result is false 1206410694 128000000
The reason is probably that Go compares strings character by character, starting with the most significant (leftmost) character. In my case the strings represent numbers, so i is larger than j. Can someone explain how strings of unequal length are compared in Go?

The reason is probably because Go does char by char comparison starting with the most significant char.
This is correct.
If they represent numbers, then you should compare them as numbers. Parse / convert them to int before comparing:
ii, _ := strconv.Atoi(i)
ij, _ := strconv.Atoi(j)
Edit: And yes, @JimB is totally right. If you are not 100% sure that the conversion will succeed, please do not ignore the errors.
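As a minimal sketch of the advice above, the following compares the two strings from the question both ways. The helper name compareNumeric is hypothetical, and strconv.ParseInt with a 64-bit size is used here instead of strconv.Atoi so the result does not depend on the platform's int width; errors are returned rather than ignored.

```go
package main

import (
	"fmt"
	"strconv"
)

// compareNumeric (hypothetical helper) parses both strings as base-10
// integers and returns -1, 0 or 1 for a < b, a == b or a > b.
func compareNumeric(a, b string) (int, error) {
	x, err := strconv.ParseInt(a, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", a, err)
	}
	y, err := strconv.ParseInt(b, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", b, err)
	}
	switch {
	case x < y:
		return -1, nil
	case x > y:
		return 1, nil
	}
	return 0, nil
}

func main() {
	i, j := "1206410694", "128000000"

	// String comparison is lexicographic: the first differing byte decides,
	// and here '0' < '8' at index 2.
	fmt.Println("as strings:", i >= j)

	c, err := compareNumeric(i, j)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("as numbers:", c >= 0)
}
```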

Related

Count number of wonderful substrings

I found below problem in one website.
A wonderful string is a string where at most one letter appears an odd number of times.
For example, "ccjjc" and "abab" are wonderful, but "ab" is not.
Given a string word that consists of the first ten lowercase English letters ('a' through 'j'), return the number of wonderful non-empty substrings in word. If the same substring appears multiple times in word, then count each occurrence separately.
A substring is a contiguous sequence of characters in a string.
Example 1 :
Input: word = "aba"
Output: 4
Explanation: The four wonderful substrings are a , b , a(last character) , aba.
I tried to solve it. I implemented an O(n^2) solution (n is the input string length), but the expected time complexity is O(n) and I could not get there. I found the solution below but could not understand it. Can you please help me understand this O(n) solution, or come up with another O(n) solution?
My O(n^2) approach: for every substring, check whether it has at most one character with an odd count. This check can be done in O(1) time using a 10-element count array.
class Solution {
public:
    long long wonderfulSubstrings(string str) {
        long long ans=0;
        int idx=0; long long xorsum=0;
        unordered_map<long long,long long>mp;
        mp[xorsum]++;
        while(idx<str.length()){
            xorsum=xorsum^(1<<(str[idx]-'a'));
            // if the xor repeats, every letter occurs an even number of times
            // after the previous occurrence of that xor.
            if(mp.find(xorsum)!=mp.end())
                ans+=mp[xorsum];
            mp[xorsum]++;
            // at most one letter may occur an odd number of times, so also flip each bit (a to j)
            // and look the result up in the map
            for(int i=0;i<10;i++){
                long long temp=xorsum;
                temp=temp^(1<<i);
                if(mp.find(temp)!=mp.end())
                    ans+=mp[temp];
            }
            idx++;
        }
        return ans;
    }
};

There are two main algorithmic tricks in the code, bitmasks and prefix-sums, which can be confusing if you've never seen them before. Let's look at how the problem is solved conceptually first.
For any substring of our string S, we want to count the number of appearances for each of the 10 possible letters, and ask if each number is even or odd.
For example, with a substring s = accjjc, we can summarize it as: odd# a, even# b, odd# c, even# d, even# e, even# f, even# g, even# h, even# i, even# j. This is kind of long, so we can summarize it using a bitmask: for each letter a-j, put a 1 if the count is odd, or 0 if the count is even. This gives us a 10-digit binary string, which is 1010000000 for our example.
You can treat this as a normal integer (or long long, depending on how ints are represented). When we see another character, the count flips whether it was even or odd. On bitmasks, this is the same as flipping a single bit, or an XOR operation. If we add another 'a', we can update the bitmask to start with 'even# a' by XORing it with the number 1000000000.
We want to count the number of substrings where at most one character count is odd. This is the same as counting the number of substrings whose bitmask has at most one 1. There are 11 of these bitmasks: the ten-zero string, and each string with exactly one 1 for each of the ten possible spots. If you interpret these as integers, the last ten strings are the first ten powers of 2: 1<<0, 1<<1, 1<<2, ... 1<<9.
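To make the bitmask idea concrete, here is a small Go sketch. The function names parityMask and isWonderful are made up for illustration, and the input is assumed to contain only 'a' through 'j'. Note that, like the C++ code in the question, it stores 'a' in the lowest bit, so the printed binary string reads right to left compared with the a-first notation used in the text.

```go
package main

import (
	"fmt"
	"math/bits"
)

// parityMask returns a 10-bit mask for s: bit i is set when the letter
// 'a'+i occurs an odd number of times. Assumes s only contains 'a'..'j'.
func parityMask(s string) uint16 {
	var mask uint16
	for i := 0; i < len(s); i++ {
		mask ^= 1 << (s[i] - 'a') // seeing a letter flips its parity bit
	}
	return mask
}

// isWonderful reports whether at most one bit of the mask is set,
// i.e. at most one letter has an odd count.
func isWonderful(mask uint16) bool {
	return bits.OnesCount16(mask) <= 1
}

func main() {
	for _, s := range []string{"accjjc", "ccjjc", "abab", "ab"} {
		m := parityMask(s)
		fmt.Printf("%-7s mask=%010b wonderful=%v\n", s, m, isWonderful(m))
	}
}
```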
Now, we want to count the bitmasks for all substrings in O(n) time. First, solve a simpler problem: count the bitmasks for just all prefixes, and store these counts in a hashmap. We can do this by keeping a running bitmask from the start, and performing updates by an XOR of the bit corresponding to that letter: xorsum=xorsum^(1<<(str[idx]-'a')). This can clearly be done in a single, O(n) time pass through the string.
How do we get counts of arbitrary substrings? The answer is prefix-sums: the count of letters in any substring can be expressed as a difference of two prefix-counts. For example, with s = accjjc, suppose we want the bitmask corresponding to the substring 'jj'. This substring can be written as the difference of two prefixes: 'jj' = 'accjj' - 'acc'.
In the same way, we want to subtract the counts for the two prefix strings. However, we only have the bitmasks telling us whether each letter has an even or odd frequency. In the arithmetic of bitmasks, we treat each position mod 2, so coordinate-wise subtraction becomes XOR.
This means counts(jj) = counts(accjj) - counts(acc) becomes
bitmask(jj) = bitmask(accjj) ^ bitmask(acc).
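The prefix identity above can be checked directly. The sketch below is illustrative only: prefixMasks and maskBetween are hypothetical names, the input is assumed to contain only 'a' through 'j', and 'a' is stored in the lowest bit as in the question's code.

```go
package main

import "fmt"

// prefixMasks returns p where p[i] is the parity mask of s[:i].
// Assumes s only contains 'a'..'j'.
func prefixMasks(s string) []uint16 {
	p := make([]uint16, len(s)+1)
	for i := 0; i < len(s); i++ {
		p[i+1] = p[i] ^ (1 << (s[i] - 'a'))
	}
	return p
}

// maskBetween returns the parity mask of s[i:j] from the prefix masks:
// subtraction of counts mod 2 is XOR.
func maskBetween(p []uint16, i, j int) uint16 {
	return p[j] ^ p[i]
}

func main() {
	s := "accjjc"
	p := prefixMasks(s)
	// "jj" is s[3:5], i.e. prefix "accjj" minus prefix "acc".
	fmt.Printf("mask(%q) = %010b\n", s[3:5], maskBetween(p, 3, 5))
	fmt.Printf("mask(%q) = %010b\n", s[2:4], maskBetween(p, 2, 4))
}
```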
There's still a problem: the algorithm I've described is still quadratic. If, at every prefix, we iterate over all previous prefix-bitmasks and check if our mask XOR the old mask is one of the 11 goal-bitmasks, we still have a quadratic runtime. Instead, you can use the fact that XOR is its own inverse: if a ^ b = c, then b = a ^ c. So, instead of doing XORs with old prefix masks, you XOR the current mask with each of the 11 goal masks and add the number of times we've seen the resulting mask as an earlier prefix. ans+=mp[xorsum] counts the substrings ending at our current index whose bitmask is all zeros, because the matching earlier prefix mask is xorsum ^ 0000000000 = xorsum. The loop after that counts substrings whose bitmask is one of the ten single-bit goal bitmasks.
Lastly, you just have to add your current prefix-mask to update the counts: mp[xorsum]++.
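Putting the pieces together, here is a Go sketch of the same approach as the C++ code in the question. It is not the original author's code: it swaps the hash map for a fixed array of 1024 counters, which works because only 10 bits are ever used, and it assumes the input contains only 'a' through 'j' (any other letter would index outside the array).

```go
package main

import "fmt"

// wonderfulSubstrings counts substrings in which at most one letter occurs
// an odd number of times. Assumes word only contains 'a'..'j'.
func wonderfulSubstrings(word string) int64 {
	var seen [1 << 10]int64 // seen[m] = number of earlier prefixes with parity mask m
	seen[0] = 1             // the empty prefix
	mask := 0
	var ans int64
	for i := 0; i < len(word); i++ {
		mask ^= 1 << (word[i] - 'a')
		// Earlier prefixes with the same mask: the substring between has all even counts.
		ans += seen[mask]
		// Earlier prefixes differing in exactly one bit: exactly one odd count.
		for b := 0; b < 10; b++ {
			ans += seen[mask^(1<<b)]
		}
		seen[mask]++
	}
	return ans
}

func main() {
	fmt.Println(wonderfulSubstrings("aba"))
}
```

Each character does a constant amount of work (11 lookups), so the whole pass is linear in the length of the string.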

Convert string S to another string T by performing exactly K operations (append to / delete from the end of the string S)

I am trying to solve a problem. But I am missing some corner case. Please help me. The problem statement is:
You have a string, S , of lowercase English alphabetic letters. You can perform two types of operations on S:
Append a lowercase English alphabetic letter to the end of the string.
Delete the last character in the string. Performing this operation on an empty string results in an empty string.
Given an integer, k, and two strings, s and t , determine whether or not you can convert s to t by performing exactly k of the above operations on s.
If it's possible, print Yes; otherwise, print No.
Examples
Example 1
Input:
hackerhappy
hackerrank
9
Output: Yes
Explanation: 5 delete operations (h,a,p,p,y) and 4 append operations (r,a,n,k)
Example 2
Input:
aba
aba
7
Output: Yes
Explanation: 4 delete operations (delete on empty = empty) and 3 append operations
I tried in this way (C language):
int sl = strlen(s); int tl = strlen(t); int diffi=0;
int i;
for(i=0;s[i]&&t[i]&&s[i]==t[i];i++); //going till matching
diffi=i;
((sl-diffi+tl-diffi<=k)||(sl+tl<=k))?printf("Yes"):printf("No");
Please help me to solve this.
Thank You

The operations left over after the minimum required ones must also be divisible by 2, because the only way to waste operations is to append a letter and then delete it again.
So maybe:
// C language - strcmp(s,t) returns 0 if s==t.
if(strcmp(s,t))
    ((sl-diffi+tl-diffi<=k && (k-(sl-diffi+tl-diffi))%2==0)||(sl+tl<=k))?printf("Yes"):printf("No");
else
    if(sl+tl<=k||k%2==0) printf("Yes"); else printf("No");

You can also do it another way, using binary search.
Take the shorter string and use a substring (pattern) of length/2.
1. Do a binary search (by character) on both strings. If you get a match, append length/4 more characters to the pattern; if that matches, keep adding length/2^n more. Otherwise append one character to the original pattern (of length/2) and try again.
2. If you get a mismatch for the pattern of length/2, reduce the pattern to length/4, and if you get a match, append the next character.
Now repeat steps 1 and 2.
If n1+n2 <= k then the answer is Yes,
else the answer is No.
Example:
s1=Hackerhappy
s2=Hackerrank
pattern=Hacker // s2 is the shorter string: its length is 10, so length/2 = 5
// Do a binary search of the pattern; you will get a match by steps 1 and 2
n1, the number of mismatched characters in s1, is 5
n2, the number of mismatched characters in s2, is 4
Now n1+n2 <= k // because we need that many operations to make the two strings equal.
So Yes

This should work for all cases:
int sl = strlen(s); int tl = strlen(t); int diffi=0;
int i,m;
for(i=0;s[i]&&t[i]&&s[i]==t[i];i++); // advance while the characters match
diffi=i;            // length of the common prefix
m = sl+tl-2*diffi;  // minimum number of operations needed
((k>=m&&(k-m)%2==0)||(sl+tl<=k))?printf("Yes"):printf("No");
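For comparison, the same check can be sketched in Go. The function name canConvert is hypothetical; the logic mirrors the C snippet above: find the common prefix, compute the minimum number of operations m, then accept either when k is large enough to erase everything (deleting on an empty string absorbs any surplus) or when the surplus k-m is even.

```go
package main

import "fmt"

// canConvert reports whether s can be turned into t with exactly k
// append/delete-at-end operations.
func canConvert(s, t string, k int) bool {
	// p is the length of the common prefix.
	p := 0
	for p < len(s) && p < len(t) && s[p] == t[p] {
		p++
	}
	// Enough operations to delete all of s (extra deletes on the empty
	// string are free) and then append all of t.
	if k >= len(s)+len(t) {
		return true
	}
	// Otherwise the leftover operations must pair up as append+delete.
	m := len(s) + len(t) - 2*p
	return k >= m && (k-m)%2 == 0
}

func main() {
	fmt.Println(canConvert("hackerhappy", "hackerrank", 9))
	fmt.Println(canConvert("aba", "aba", 7))
}
```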

Convert an int to hex and then pad it with 0's to get a fixed length String

I'm having some issues converting an int to hex and then padding it with 0s in order to get a 6-character string that represents the hex number.
So far, I tried the following:
intNumber := 12
hexNumber := strconv.FormatInt(intNumber, 16) //not working
And then I found out how to pad it with 0s, using %06d, number/string. It makes all the strings 6 characters long.
Here you can Find a Playground which I set up to make some tests.
How can I achieve this in an efficient way?
For any Clarifications on the question, just leave a comment below.
Thanks In advance.

import "fmt"
hex := fmt.Sprintf("%06x", num)
The x means hexadecimal, the 6 means a minimum width of 6 characters, the 0 means left-pad with zeros and the % starts the whole sequence.

Check if a string starts with a decimal digit?

It looks like the following works; is it a good approach?
var thestr = "192.168.0.1"
if (thestr[0]>= '0' && thestr[0] <= '9'){
//...
}

Your solution is completely fine.
But note that strings in Go are stored as a read-only byte slice holding the UTF-8 encoded byte sequence, and indexing a string indexes its bytes, not its runes (characters). Since a decimal digit ('0'..'9') is encoded as exactly one byte, it is ok in this case to test the first byte, but you should first check that the string is not empty: len(s) > 0 or s != "".
Here are some other alternatives, try all on the Go Playground:
1) Testing the byte range:
This is your solution, probably the fastest one:
s := "12asdf"
fmt.Println(s[0] >= '0' && s[0] <= '9')
2) Using fmt.Sscanf():
Note: this also accepts if the string starts with a negative number, decide if it is a problem for you or not (e.g. accepts "-12asf").
i := 0
n, err := fmt.Sscanf(s, "%d", &i)
fmt.Println(n > 0, err == nil) // Both n and err can be used to test
3) Using unicode.IsDigit():
fmt.Println(unicode.IsDigit(rune(s[0])))
4) Using regexp:
I would probably never use this as this is by far the slowest, but here it is:
r := regexp.MustCompile(`^\d`)
fmt.Println(r.FindString(s) != "")
Or:
r := regexp.MustCompile(`^\d.*`)
fmt.Println(r.MatchString(s))
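Combining the byte-range test with the empty-string check mentioned above gives something like the following sketch. The function name startsWithDigit is hypothetical, and it only recognizes the ASCII digits '0' through '9'.

```go
package main

import "fmt"

// startsWithDigit reports whether s begins with an ASCII decimal digit.
// The length check prevents an index-out-of-range panic on empty input.
func startsWithDigit(s string) bool {
	return len(s) > 0 && s[0] >= '0' && s[0] <= '9'
}

func main() {
	for _, s := range []string{"192.168.0.1", "12asdf", "asdf12", ""} {
		fmt.Printf("%q -> %v\n", s, startsWithDigit(s))
	}
}
```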

Please do not use regexps for that simple task :)
What I would change in this case:
Add a check for an empty string before checking the first rune.
I would rephrase it as "starts with a digit", as the semantics of "number" are too broad: .5e-45 is a number, but it is probably not what you want. The semantics of 0 are also not simple: https://math.stackexchange.com/questions/238737/why-do-some-people-state-that-zero-is-not-a-number

Since you are comparing by character and there are no non-digit characters between '0' and '9', I think your solution is ok, but it does not account for the characters that follow.
For example, if thestr was "192.something.invalid" it's no longer an IP.
I'd recommend using a regular expression to check the IP.
something like
\b(?:\d{1,3}\.){3}\d{1,3}\b
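If validating the whole address is the goal, the pattern above can be tried in Go as sketched below. The helper names are hypothetical. One caveat worth seeing in action: the pattern only checks the shape, so it also accepts octets above 255. The standard library's net.ParseIP is shown next to it as a stricter alternative, which is a swapped-in technique and not part of the answer above.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// ipShape is the pattern from the answer: four groups of 1-3 digits
// separated by dots. It does not limit each group to 0..255.
var ipShape = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)

// looksLikeIPv4 reports whether s contains something shaped like an IPv4 address.
func looksLikeIPv4(s string) bool {
	return ipShape.MatchString(s)
}

// isIPv4 reports whether s parses as a valid IPv4 address.
func isIPv4(s string) bool {
	return net.ParseIP(s).To4() != nil
}

func main() {
	for _, s := range []string{"192.168.0.1", "192.something.invalid", "999.1.1.1"} {
		fmt.Printf("%-22q shape=%v valid=%v\n", s, looksLikeIPv4(s), isIPv4(s))
	}
}
```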

How do I find the largest sequence in a string that is repeated at least once?

Trying to solve the following problem:
Given a string of arbitrary length, find the longest substring that occurs more than one time within the string, with no overlaps.
For example, if the input string was ABCABCAB, the correct output would be ABC. You couldn't say ABCAB, because that only occurs twice where the two substrings overlap, which is not allowed.
Is there any way to solve this reasonably quickly for strings containing a few thousand characters?
(And before anyone asks, this is not homework. I'm looking at ways to optimize the rendering of Lindenmayer fractals, because they tend to take excessive amounts of time to draw at high iteration levels with a naive turtle graphics system.)

Here's an example for a string of length 11, which you can generalize.
Set the chunk length to floor(11/2) = 5.
Scan the string in chunks of 5 characters, left to right, looking for repeats. There will be 3 comparisons:
Left offset    Right offset
0              5
0              6
1              6
If you found a duplicate you're done. Otherwise reduce the chunk length to 4 and repeat until the chunk length goes to zero.
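Here's some (obviously untested) pseudocode, where substr(start, length) returns length characters beginning at start:
String s
int n = s.length
for int i = floor(n/2); i > 0; i--          // chunk length, longest first
    for int j = 0; j <= n-(2*i); j++        // start of the left chunk
        for int k = j+i; k <= n-i; k++      // start of the right chunk (no overlap)
            if s.substr(j,i) == s.substr(k,i)
                return s.substr(j,i)
return null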
There may be an off-by-one error in there, but the approach should be sound (and minimal).
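A direct Go translation of this brute-force idea might look like the sketch below. The function name longestRepeat is hypothetical, and it compares bytes, so it assumes single-byte symbols (which fits typical L-system alphabets). It tries the longest chunk first, so the first match is a longest one; the nested loops make it slow in the worst case, so treat it as a baseline, not an optimized solution.

```go
package main

import "fmt"

// longestRepeat returns a longest substring of s that occurs at least twice
// without overlapping, or "" if there is none.
func longestRepeat(s string) string {
	n := len(s)
	for size := n / 2; size > 0; size-- { // chunk length, longest first
		for j := 0; j+2*size <= n; j++ { // start of the left chunk
			for k := j + size; k+size <= n; k++ { // right chunk starts after the left one ends
				if s[j:j+size] == s[k:k+size] {
					return s[j : j+size]
				}
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(longestRepeat("ABCABCAB"))
}
```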

It looks like a suffix tree problem. Create the suffix tree, then find the biggest compressed branch with more than one child (meaning it occurs more than once in the original string). The number of letters in that compressed branch should be the size of the longest repeated substring.
I found something similar here: http://www.coderanch.com/t/370396/java/java/Algorithm-wanted-longest-repeating-substring
Looks like it can be done in O(n).

First we need to fix the start position of our substring and determine its length. Iterate over all possible start positions, then find the length by binary search (if you can find a repeated substring of length a, you may also find one with a longer length; the function looks monotonic, so binary search should be fine). Finding an equal substring is then O(N), using KMP, Rabin-Karp or any other linear algorithm. Total: N*N*log(N). Is that too much complexity?
The code is something like:
for(int i=0;i<input.length();++i)
{
    int l = i;
    int r = input.length();
    while(l <= r)
    {
        int middle = l + ((r - l) >> 1);
        // Check if the substring [i;middle] can be found elsewhere in the initial string.
        // This should be done in O(n); you need to check the parts [0;i-1] and [middle+1;length()-1].
        if (found)
            l = middle + 1;
        else
            r = middle - 1;
    }
}
Make sense?

This type of analysis is often done on genome sequences. Have a look at this paper; it has an efficient implementation (C++) for finding repeats: http://www.complex-systems.com/pdf/17-4-4.pdf
It might be what you are looking for.
