Rule for d{motion} for different motions - vim

Given:
for (int i = 0; i < 10; i++){
   _ <---Cursor position
3w leads to
for (int i = 0; i < 10; i++){
         _ <---Cursor position
and d3w leads to
fori = 0; i < 10; i++){
   _ <---Cursor position
i.e., even though the motion 3w takes the cursor up to i, i itself is not deleted.
On the other hand, given:
for (int i = 0; i < 10; i++){
   _ <---Cursor position
% leads to
for (int i = 0; i < 10; i++){
                           _ <---Cursor position
and d% leads to
for{
   _ <---Cursor position
i.e., the motion % takes the cursor up to ) and ) itself is deleted.
So, why are there two different effects of d{motion}? Is there a single general rule of which both of these are consistent special cases?

Yes, there's a logic to that. In Vim, some motions such as w are "exclusive", while other motions such as % are "inclusive". This will determine whether the action will affect the last character of the motion or not.
You can override the "exclusive" or "inclusive" status of a motion by typing v between the operator and the motion (note that v acts here as a modifier of the pending operator; it does not start Visual mode as it does when used as a Normal mode command). So dv3w (or d3vw) will delete up to the beginning of the third word, "inclusive" of the character it lands on, while dv% will delete up to the next matching bracket, "exclusive".
In a way, Visual mode is somewhat similar, since a Visual selection is "inclusive" by default, so v3wd would behave similarly to dv3w. (Though this can be overridden by the 'selection' option.)
See:
:help w
:help %
:help exclusive (same as :help inclusive)
:help o_v
:help 'selection'
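One rough way to picture the rule, as a sketch in Python rather than anything Vim itself runs: treat the line as a string and the motion as a pair of column indices. The helper name delete_motion and the hard-coded columns (3 for the starting cursor, 9 for where 3w lands, 27 for where % lands) are assumptions taken from the example in the question.

```python
def delete_motion(line, start, end, inclusive):
    """Delete the text between two columns of a line.

    `inclusive` decides whether the character under the far end
    of the motion is deleted as well.
    """
    lo, hi = min(start, end), max(start, end)
    stop = hi + 1 if inclusive else hi
    return line[:lo] + line[stop:]


line = "for (int i = 0; i < 10; i++){"
cursor = 3      # the space after "for"
after_3w = 9    # 3w lands on the loop variable i
after_pct = 27  # % lands on the closing )

print(delete_motion(line, cursor, after_3w, inclusive=False))  # d3w: fori = 0; i < 10; i++){
print(delete_motion(line, cursor, after_pct, inclusive=True))  # d%:  for{
```

Flipping the `inclusive` flag in either call mimics what dv3w and dv% do: the first then also removes the i, and the second leaves the ) in place.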

Related

Optimum solution for splitting a string into three palindromes with earliest cuts

I was asked this question in an interview:
Given a string (1<=|s|<=10^5), check if it is possible to partition it into three palindromes. If there are multiple answers possible, output the one where the cuts are made the earliest. If no answer is possible, print "Impossible".
**Input:**
radarnoonlevel
aabab
abcdefg
**Output:**
radar noon level
a a bab (Notice how a, aba, b is also an answer, but we will output the one with the earliest cuts)
Impossible
I was able to give a brute-force solution: two nested loops, checking the palindrome property of each of the three substrings (0-i, i-j, j-end). This was obviously not optimal, but I have not been able to find a better solution since then.
I need a way to reuse what I already know: if I know whether a string is a palindrome, how can removing a character from the start or adding one at the end tell me whether the new string is a palindrome, without checking the whole string again? I am thinking of using three maps where each character key is mapped to its number of occurrences, but that doesn't lead me anywhere either.
This is still an O(n^2) solution, but you can store which substrings are palindromes in a table and use that to find the answer.
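#include <string>
#include <vector>
using namespace std;

vector<string> threePalindromicSubstrings(string word) {
    int n = word.size();
    // dp[i][j] is true when word[i..j] (inclusive) is a palindrome
    vector<vector<bool>> dp(n, vector<bool>(n, false));
    for (int i = 0; i < n; ++i)
        dp[i][i] = true;
    // fill the table by increasing substring length l
    for (int l = 2; l <= n; ++l) {
        for (int i = 0; i < n - l + 1; ++i) {
            int j = i + l - 1;
            if (l == 2)
                dp[i][j] = (word[i] == word[j]);
            else
                dp[i][j] = (word[i] == word[j]) && dp[i + 1][j - 1];
        }
    }
    vector<string> ans;
    // i is the last index of the first part, j the last index of the second;
    // scanning both in increasing order finds the earliest cuts first
    for (int i = 0; i < n - 2; ++i) {
        if (dp[0][i]) {
            for (int j = i + 1; j < n - 1; ++j) {
                if (dp[i + 1][j] && dp[j + 1][n - 1]) {
                    ans.push_back(word.substr(0, i + 1));
                    ans.push_back(word.substr(i + 1, j - i));
                    ans.push_back(word.substr(j + 1, n - j - 1));
                    return ans;
                }
            }
        }
    }
    if (ans.empty())
        ans.push_back("Impossible");
    return ans;
}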
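For quick experimentation, the same table-based idea can be sketched in Python and run against the three sample inputs from the question. This is only an illustrative port, not a faster algorithm: it still builds an n-by-n table, and the function name three_palindromes is made up for this sketch.

```python
def three_palindromes(word):
    n = len(word)
    # is_pal[i][j] is True when word[i..j] (inclusive) is a palindrome
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if word[i] == word[j] and (j - i < 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True
    # i = last index of the first part, j = last index of the second part;
    # increasing order means the earliest cuts are found first
    for i in range(n - 2):
        if not is_pal[0][i]:
            continue
        for j in range(i + 1, n - 1):
            if is_pal[i + 1][j] and is_pal[j + 1][n - 1]:
                return [word[:i + 1], word[i + 1:j + 1], word[j + 1:]]
    return ["Impossible"]


for sample in ("radarnoonlevel", "aabab", "abcdefg"):
    print(" ".join(three_palindromes(sample)))
```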

Leetcode--3 find the longest substring without repeating character

The goal is simple: find the longest substring without repeating characters. Here is the code:
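#include <algorithm>
#include <cstring>
#include <string>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int ans = 0;
        int dic[256];                  // dic[c] = index where c was last seen
        memset(dic, -1, sizeof(dic));  // -1 means "not seen yet"
        int len = s.size();
        int idx = -1;                  // index just before the current window
        for (int i = 0; i < len; i++) {
            unsigned char c = s[i];    // unsigned so bytes >= 128 index dic safely
            if (dic[c] > idx)
                idx = dic[c];          // duplicate inside the window: move its start
            ans = max(ans, i - idx);
            dic[c] = i;
        }
        return ans;
    }
};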
From its concise form, I think this is a high-performance method, and its time complexity is just O(n). But I'm confused about how it works, even though I worked through some examples to understand it. Can anyone give me some tips or ideas?
What it is doing is recording the position where each character was last seen.
As you step through, the previous position of each newly encountered character limits how far back the current non-repeating run can extend: the run can reach back to just after that last occurrence, and no later run can reach back further, because we have now seen a duplicate.
So idx holds the highest last-seen position of any duplicate found so far. The current candidate for the longest non-duplicating sequence starts right after it, which is why its length is i - idx.
I'm certain that the ans = max() code could be optimised slightly, as after encountering a new duplicate, you have to go forward at least ans chars from the start of that duplicate before ans can be improved again. You still need to do the rest of the work maintaining dic and idx, but you could avoid that particular test for ans for a few iterations. You would have to do a lot of unrolling to benefit, though.
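To make the bookkeeping concrete, here is a small Python sketch of the same last-seen idea. The names last_seen, start and best are chosen for this illustration and correspond to dic, idx and ans in the question's code.

```python
def length_of_longest_substring(s):
    last_seen = {}  # character -> index of its most recent occurrence
    start = -1      # index just before the current duplicate-free window
    best = 0
    for i, ch in enumerate(s):
        # A previous occurrence inside the window forces the window
        # to begin right after that occurrence.
        if last_seen.get(ch, -1) > start:
            start = last_seen[ch]
        best = max(best, i - start)
        last_seen[ch] = i
    return best


print(length_of_longest_substring("abcabcbb"))  # 3, e.g. "abc"
print(length_of_longest_substring("abba"))      # 2, e.g. "ab"
```

The "abba" case shows why the comparison against start matters: when the final a is reached, its earlier occurrence at index 0 already lies outside the window, so the window start must not move backwards.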

Sublime Text Automatically Adding Forgotten Braces

Sometimes, I want to change the body in a for-loop from one line to multiple lines. In Sublime Text the process goes like this for me:
for(int i = 0; i < 10; i++)
    System.out.println("Hello World");
And then I try to insert the curly braces by typing the open brace (which automatically types the closing brace):
for(int i = 0; i < 10; i++){}
    System.out.println("Hello World");
I see that for some other people, the curly braces are inserted correctly. Namely, the closing brace will be at the end of the body:
for(int i = 0; i < 10; i++){
    System.out.println("Hello World");
}
No changes were made in their settings and it has always been like that for them. I transitioned from Sublime Text 2 to 3, and I never had that behavior in either version. Is there a fix for this?

Asymmetric Levenshtein distance

Given two bit strings, x and y, with x longer than y, I'd like to compute a kind of asymmetric variant of the Levenshtein distance between them. Starting with x, I'd like to know the minimum number of deletions and substitutions it takes to turn x into y.
Can I just use the usual Levenshtein distance for this, or do I need to modify the algorithm somehow? In other words, with the usual set of edits of deletion, substitution, and addition, is it ever beneficial to delete more than the difference in lengths between the two strings and then add some bits back? I suspect the answer is no, but I'm not sure. If I'm wrong, and I do need to modify the definition of Levenshtein distance to disallow deletions, how do I do so?
Finally, I would expect intuitively that I'd get the same distance if I started with y (the shorter string) and only allowed additions and substitutions. Is this right? I've got a sense for what these answers are, I just can't prove them.
If I understand you correctly, the answer is yes: the Levenshtein edit distance can differ from the result of an algorithm that only allows deletions and substitutions on the longer string. Because of this, you would need to modify the algorithm, or create a different one, to get your limited version.
Consider the two strings "ABCD" and "ACDEF". The Levenshtein distance is 3 (ABCD->ACD->ACDE->ACDEF). If we start with the longer string and limit ourselves to deletions and substitutions, we must use 4 edits (1 deletion and 3 substitutions). The reason is that an efficient path that applies a deletion to the shorter string on its way to the longer one cannot be mirrored when starting from the longer string, because the complementary insertion operation is not available (since you're disallowing it).
Your last paragraph is true. If the path from shorter to longer uses only insertions and substitutions, then any allowed path can simply be reversed from the longer to the shorter. Substitutions are the same regardless of direction, but the inserts when going from small to large become deletions when reversed.
I haven't tested this thoroughly, but this modification shows the direction I would take, and it appears to work with the values I've tried. It's written in C# and follows the pseudocode in the Wikipedia entry for Levenshtein distance. There are obvious optimizations that can be made, but I refrained from making them so that the changes from the standard algorithm stay obvious. An important observation is that (using your constraints) if the strings are the same length, then substitution is the only operation allowed.
static int LevenshteinDistance(string s, string t) {
    int i, j;
    int m = s.Length;
    int n = t.Length;
    // for all i and j, d[i,j] will hold the Levenshtein distance between
    // the first i characters of s and the first j characters of t;
    // note that d has (m+1)*(n+1) values
    var d = new int[m + 1, n + 1];
    // set each element to zero
    // C# creates the array already initialized to zero
    // source prefixes can be transformed into empty string by
    // dropping all characters
    for (i = 0; i <= m; i++) d[i, 0] = i;
    // target prefixes can be reached from empty source prefix
    // by inserting every character
    for (j = 0; j <= n; j++) d[0, j] = j;
    for (j = 1; j <= n; j++) {
        for (i = 1; i <= m; i++) {
            if (s[i - 1] == t[j - 1])
                d[i, j] = d[i - 1, j - 1]; // no operation required
            else {
                int del = d[i - 1, j] + 1; // a deletion
                int ins = d[i, j - 1] + 1; // an insertion
                int sub = d[i - 1, j - 1] + 1; // a substitution
                // the next two lines are the modification I've made
                //int insDel = (i < j) ? ins : del;
                //d[i, j] = (i == j) ? sub : Math.Min(insDel, sub);
                // the following 8 lines are a clearer version of the above 2 lines
                if (i == j) {
                    d[i, j] = sub;
                } else {
                    int insDel;
                    if (i < j) insDel = ins; else insDel = del;
                    // assign the smaller of insDel or sub
                    d[i, j] = Math.Min(insDel, sub);
                }
            }
        }
    }
    return d[m, n];
}
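The "ABCD" / "ACDEF" example above can be checked with a short Python sketch that compares the standard distance against a restricted one. This is not a port of the C# code: delete_substitute_distance is a hypothetical helper that simply drops the insertion case from the recurrence and treats unreachable states as infinite.

```python
def levenshtein(s, t):
    """Standard edit distance: deletions, insertions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def delete_substitute_distance(x, y):
    """Edits needed to turn x into y using only deletions and substitutions."""
    inf = float("inf")
    # An empty prefix of x can only produce an empty prefix of y.
    prev = [0] + [inf] * len(y)
    for i, a in enumerate(x, 1):
        cur = [i]  # delete all i characters seen so far
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


print(levenshtein("ABCD", "ACDEF"))                 # 3
print(delete_substitute_distance("ACDEF", "ABCD"))  # 4
```

When x is shorter than y the restricted function returns infinity, which matches the intuition that the target cannot be reached without insertions.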

Word-level edit distance of a sentence

Is there an algorithm that lets you find the word-level edit distance between 2 sentences?
For example, "A Big Fat Dog" and "The Big House with the Fat Dog" differ by 1 substitution and 3 insertions.
In general, this is called the sequence alignment problem. Actually it does not matter what entities you align - bits, characters, words, or DNA bases - as long as the algorithm works for one type of items it will work for everything else. What matters is whether you want global or local alignment.
Global alignment, which attempts to align every residue in every sequence, is most useful when the sequences are similar and of roughly equal size. A general global alignment technique is the Needleman-Wunsch algorithm, which is based on dynamic programming. When people talk about Levenshtein distance they usually mean global alignment. The algorithm is so straightforward that several people discovered it independently, and sometimes you may come across the Wagner-Fischer algorithm, which is essentially the same thing but is mentioned more often in the context of edit distance between two strings of characters.
Local alignment is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm is a general local alignment method also based on dynamic programming. It is quite rarely used in natural language processing, and more often in bioinformatics.
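The difference between the two kinds of alignment can be seen in a small Python sketch that computes only the alignment scores over word tokens. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions picked for illustration; real tools use tuned scoring schemes.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: every token of a and b must be aligned."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best-scoring pair of sub-sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 restarts the alignment
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


a = "A Big Fat Dog".split()
b = "The Big House with the Fat Dog".split()
print(global_score(a, b))  # -1: three matches, three gaps, one mismatch
print(local_score(a, b))   # 2: the shared run "Fat Dog"
```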
You can use the same algorithms that are used for finding edit distance in strings to find edit distances in sentences. You can think of a sentence as a string drawn from an alphabet where each character is a word in the English language (assuming that spaces are used to mark where one "character" starts and the next ends). Any standard algorithm for computing edit distance, such as the standard dynamic programming approach for computing Levenshtein distance, can be adapted to solve this problem.
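As a minimal sketch of that idea, assuming words are separated by whitespace and compared case-sensitively, the usual Levenshtein recurrence can be run over lists of words instead of characters. The function name word_edit_distance is made up for this example.

```python
def word_edit_distance(sentence1, sentence2):
    a, b = sentence1.split(), sentence2.split()
    # prev[j] = distance between the words processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete word x
                           cur[j - 1] + 1,           # insert word y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


print(word_edit_distance("A Big Fat Dog", "The Big House with the Fat Dog"))  # 4
```

The result 4 agrees with the question's count of 1 substitution plus 3 insertions.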
Check out the AlignedSent class from the nltk package in Python. It aligns sentences at the word level.
https://www.nltk.org/api/nltk.align.html
Here is a sample implementation of @templatetypedef's idea in ActionScript (it worked great for me), which calculates the normalized Levenshtein distance (in other words, it gives a value in the range [0..1]):
private function nlevenshtein(s1:String, s2:String):Number {
    var tokens1:Array = s1.split(" ");
    var tokens2:Array = s2.split(" ");
    const len1:uint = tokens1.length, len2:uint = tokens2.length;
    // declare the loop counters before their first use
    var i:int;
    var j:int;
    var d:Vector.<Vector.<uint> > = new Vector.<Vector.<uint> >(len1 + 1);
    for (i = 0; i <= len1; ++i)
        d[i] = new Vector.<uint>(len2 + 1);
    d[0][0] = 0;
    for (i = 1; i <= len1; ++i) d[i][0] = i;
    for (i = 1; i <= len2; ++i) d[0][i] = i;
    for (i = 1; i <= len1; ++i)
        for (j = 1; j <= len2; ++j)
            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                d[i - 1][j - 1] + (tokens1[i - 1] == tokens2[j - 1] ? 0 : 1));
    // normalize by the longer sentence's word count
    var nlevenshteinDist:Number = (d[len1][len2]) / (Math.max(len1, len2));
    return nlevenshteinDist;
}
I hope this will help!
The implementation in D is generalized over any range, and thus array. So by splitting your sentences into arrays of strings they can be run through the algorithm and an edit number will be provided.
https://dlang.org/library/std/algorithm/comparison/levenshtein_distance.html
Here is a Java implementation of the edit distance algorithm for sentences, using the dynamic programming approach.
public class EditDistance {
    public int editDistanceDP(String sentence1, String sentence2) {
        String[] s1 = sentence1.split(" ");
        String[] s2 = sentence2.split(" ");
        int[][] solution = new int[s1.length + 1][s2.length + 1];
        for (int i = 0; i <= s2.length; i++) {
            solution[0][i] = i;
        }
        for (int i = 0; i <= s1.length; i++) {
            solution[i][0] = i;
        }
        int m = s1.length;
        int n = s2.length;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1[i - 1].equals(s2[j - 1]))
                    solution[i][j] = solution[i - 1][j - 1];
                else
                    solution[i][j] = 1
                            + Math.min(solution[i][j - 1], Math.min(solution[i - 1][j], solution[i - 1][j - 1]));
            }
        }
        return solution[s1.length][s2.length];
    }

    public static void main(String[] args) {
        String sentence1 = "first second third";
        String sentence2 = "second";
        EditDistance ed = new EditDistance();
        System.out.println("Edit Distance: " + ed.editDistanceDP(sentence1, sentence2));
    }
}
