How to extract filename from string in vimscript? - vim

Say we are given this string in a vimscript:
"/home/Linus Torvalds/.vim/bundle/vim-autoformat/formatters/tidy -q --show-errors 0 --show-warnings 0 --indent auto --indent-spaces 2 --vertical-space yes --tidy-mark no --wrap 68".
How do we extract the filename part? In this case that would be:
"/home/Linus Torvalds/.vim/bundle/vim-autoformat/formatters/tidy".

If you can guarantee that the path itself never contains a space followed by a dash ( -), I would do it like this:
matchstr(input_string,'^.\{-}\ze -')
Explanation: From the beginning of the string (^) match any character non-greedily (.\{-}) until the first occurrence of a space followed by a dash (\ze -).
Or you could match up to the first dash and then trim the trailing whitespace with substitute(). That is less concise and arguably more readable, but it only works if the path contains no dashes at all, which is not true of the example path (vim-autoformat).

Related

Partial replace with sed command

We have a file with some UTF-16 decimal character codes and we would like to replace them in the following manner:
Test Line in a file \u343- ? some random words \u1233? 300 \u241? \u208?\cell
The required output is:
Test Line in a file \u343- ? some random words UTF16-1233| 300 UTF16-241| UTF16-208|\cell
The requirement is to change \u[0-9]+? to UTF16-[0-9]+|
Replace the initial \u with UTF16- and the ending ? with a pipe |.
Please note that if there is any non-digit character between \u and ?, the sequence should not be treated as a match.
Using sed to modify the file in place, you can:
Match \\u([0-9]+)\?:
Match a literal \u, match and capture one or more digits, match a literal ?.
Replace with UTF16-\1|:
Replace with the string UTF16-, followed by the captured group, followed by a pipe |.
$ sed -i -E 's/\\u([0-9]+)\?/UTF16-\1|/g' file
$ cat file
Test Line in a file \u343- ? some random words UTF16-1233| 300 UTF16-241| UTF16-208|\cell
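Before running the command with -i, it may be worth previewing the substitution on standard input. The following is only a sketch: it assumes a sed that supports -E (GNU and BSD sed both do), and the function name to_utf16_markers is hypothetical.

```shell
# Hypothetical helper: same substitution as above, but it reads stdin and
# writes stdout, so no file is modified.
to_utf16_markers() {
  sed -E 's/\\u([0-9]+)\?/UTF16-\1|/g'
}

# \u343 is followed by "-" rather than "?", so it is left untouched.
printf '%s\n' 'some words \u1233? 300 \u343- ? \u208?\cell' | to_utf16_markers
```

Once the preview looks right, the same expression can be used with -i on the real file.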

replace sub-string with last special character, being (3rd part) of comma separated string

I have a string with comma separated values, like:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
As you can see, the 3rd comma-separated value sometimes ends with a special character, such as the dash (-). I want to use sed, or preferably perl (with the -i option, so that the existing file is edited in place), to replace this value with the same string in the same position (i.e. the 3rd comma-separated value) but without the trailing special character. So the result for the example string above should be:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
Since the file contains many lines like the one above, I am using a while loop in a bash script to process them one by one. I have assigned the string values to variables so that I can replace them with perl. My while loop is:
while read mystr
do
myNEWstr=$(echo $mystr | sed s/[_.-]$// | sed s/[__]$// | sed s/[_.-]$//)
perl -pi -e "s/\b$mystr\b/$myNEWstr/g" myFinalFile.txt
done < myInputFile.txt
where:
$mystr is the "SOME-STRING_A_-BLAHBLAH_1-4MP0-"
$myNEWstr result is the "SOME-STRING_A_-BLAHBLAH_1-4MP0"
Note that myInputFile.txt contains the 3rd comma-separated values of myFinalFile.txt. Those exact strings ($mystr) are checked for a trailing special character (underscore, dash, dot, or double underscore). If one is present, it is removed to form the new string ($myNEWstr), which then replaces the original in myFinalFile.txt. The result should look like the final example string shown above, i.e. with the 3rd comma-separated value lacking the trailing special character (the dash (-) in the example).
Thank you.
You could use the following regex:
s/^([^,]*,[^,]*,[^,]*)-,/$1,/
This treats CSV fields as runs of characters other than a comma (empty fields are allowed). It looks for a dash at the very end of the third CSV field, captures everything before that dash, and writes the captured text back without it.
$ cat t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
$ perl -p -e 's/^([^,]*,[^,]*,[^,]*)-,/$1,/' t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
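The question also mentions trailing underscores and dots, which the regex above does not cover. The same idea can be extended with a character class. This is a sketch using sed -E rather than perl, and it assumes that the third field is followed by a comma and contains at least one character other than _, . or -. The function name is hypothetical.

```shell
# Hypothetical helper: strip any run of trailing "_", "." or "-" from the
# third comma-separated field. Reads stdin, writes stdout.
strip_third_field_suffix() {
  # Group 1: fields 1 and 2, plus field 3 up to its last "ordinary" character.
  # [_.-]+ : the trailing special characters, which are dropped.
  sed -E 's/^([^,]*,[^,]*,[^,]*[^,_.-])[_.-]+,/\1,/'
}

printf '%s\n' '742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,' \
  | strip_third_field_suffix
```

Lines whose third field has no trailing special character do not match and pass through unchanged.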

Add blank spaces after substring while keeping the columns

I have a data file like this:
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
(...)
I want to delete the substring $DATAROOT from each path and add blank spaces after the path to keep the columns where randomthingsafter started. Notice that there could be 2 or more paths with the $DATAROOT substring in the same line. This way, my desired output would look like this:
randomthingsbefore /randompathwithoutanypattern          randomthingsafter
randomthingsbefore /randompathwithoutanypattern          randomthingsafter /randompathwithoutanypattern          randomthingsafter
randomthingsbefore /randompathwithoutanypattern          randomthingsafter
(...)
I've tried:
VAR1=*pathtofile*
VAR2=$(\grep -oP '\$DATAROOT\K[^ ]*' $VAR1)
arr=$(echo $VAR2 | tr " " "\n")
for x in $arr
do
y="${x} "
sed -i "s:$x:$y:" $VAR1
done
sed -i 's/$DATAROOT\///g' $VAR1
but it does not seem to work. Thank you for your help!
I believe the easiest approach is to replace your whole script with a single sed command:
sed 's/$DATAROOT\([^[:blank:]]*\)/\1         /g' /path/to/file
Note that there are 9 spaces after \1, which is the length of the string $DATAROOT. Here we make use of what is known as a back-reference:
Editing Commands in sed
[2addr]s/BRE/replacement/flags:
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
The replacement string shall be scanned from beginning to end. An <ampersand> ( & ) appearing in the replacement shall be replaced by the string matching the BRE. The special meaning of & in this context can be suppressed by preceding it by a <backslash>. The characters \n, where n is a digit, shall be replaced by the text matched by the corresponding back-reference expression. If the corresponding back-reference expression does not match, then the characters \n shall be replaced by the empty string. The special meaning of \n where n is a digit in this context, can be suppressed by preceding it by a <backslash>. For each other <backslash> encountered, the following character shall lose its special meaning (if any).
source: POSIX SED
9.3.6 BREs Matching Multiple Characters
The back-reference expression \n shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between \( and \) preceding the \n. The character n shall be a digit from 1 through 9, specifying the nth subexpression (the one that begins with the nth \( from the beginning of the pattern and ends with the corresponding paired \) ). The expression is invalid if less than n subexpressions precede the \n. The string matched by a contained subexpression shall be within the string matched by the containing subexpression. If the containing subexpression does not match, or if there is no match for the contained subexpression within the string matched by the containing subexpression, then back-reference expressions corresponding to the contained subexpression shall not match. When a subexpression matches more than one string, a back-reference expression corresponding to the subexpression shall refer to the last matched string. For example, the expression ^\(.*\)\1$ matches strings consisting of two adjacent appearances of the same substring, and the expression \(a\)*\1 fails to match a, the expression \(a\(b\)*\)*\2 fails to match abab, and the expression ^\(ab*\)*\1$ matches ababbabb, but fails to match ababbab.
source: POSIX Basic Regular Expressions
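Because runs of spaces are easy to lose when copying a command, the padding can also be generated instead of typed. The sketch below takes that approach. The function name strip_dataroot is hypothetical, and the $ is escaped as \$ so that it is taken literally regardless of how the sed implementation treats an unescaped $ at the start of a BRE.

```shell
# Hypothetical helper: remove the literal text "$DATAROOT" from every path
# and append 9 spaces after the path so later columns keep their position.
strip_dataroot() {
  pad=$(printf '%9s' '')   # 9 spaces = length of the string "$DATAROOT"
  sed 's/\$DATAROOT\([^[:blank:]]*\)/\1'"$pad"'/g'
}

printf '%s\n' 'before $DATAROOT/some/path after' | strip_dataroot
```

Since each removed $DATAROOT is replaced by the same number of spaces, every output line has the same length as its input line.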

perl: print remaining string only if there is no character before the matched value.

The following prints the entire content of the line after "B. "
perl -ne'print if /B[.] (.*)/s' $string > file
How can I match/print the line only if there is no other character before the "B. "? In other words, if there is a character before the "B. " ie. "TAB." skip the line / do not print.
The correct "B." is always on a new line, the only correct line to match appears as follows:
B. some text here
A regex with a leading caret (^) matches only at the start of the line. The pattern /^B[.] (.*)/s should give you the result you're looking for.
Put ^ in front of the B. It anchors the match to the start of the line, so only lines that begin with "B. " match. Your regex should be /^B\. (.*)/. The s flag is then unnecessary, since it only changes whether . matches a newline.

Working with sed linux command

In my shell script I saw a line that handles telephone numbers using the sed command.
sed "s~<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>~~g" input.xml > output.xml
I do not understand what the regular expression actually does.
<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>
I am reverse engineering the script to get it working.
My XML structure is like the one below.
<ContactMethod>
<InternetEmailAddress>donald.francis#lexisnexis.com</InternetEmailAddress>
<Telephone type = "work">
<Number>215-639-9000 x3281</Number>
</Telephone>
<Telephone type = "home">
<Number>484-231-1141</Number>
</Telephone>
<Telephone type = "fax">
<Number>N/A</Number>
</Telephone>
<Telephone type = "work">
<Number>215-639-9000 x3281</Number>
</Telephone>
<Telephone type = "home">
<Number>484-231-1141</Number>
</Telephone>
<Telephone type = "fax">
<Number>none</Number>
</Telephone>
<Telephone type1 = "fax12234">
<Number>484-231-1141sadsadasdasdaasd</Number>
</Telephone>
</ContactMethod>
That regex recognises <Telephone type = "fax"> entries where the number is given as none, and deletes them.
Breakdown:
s sed command for "substitution".
~ pattern separator. You can choose any character for this. sed recognizes it because it comes right after the s.
<Telephone type This matches the literal text "<Telephone type".
[ ]* matches zero or more spaces.
= matches a literal "="
[ ]* matches zero or more spaces.
\"fax\" matches the literal text "fax", including the quotes. The quotes are escaped because the whole pattern appears inside double quotes; the shell removes the escape characters (\) before sed sees them.
[ ]* matches zero or more spaces.
><Number>none matches literal text.
[ ]* matches zero or more spaces.
</Number></Telephone> matches the literal text.
~~ the pattern separators end the search pattern, and surround an empty replace pattern.
g is a flag that means the substitution is applied to every match on each line, not just the first one.
The only thing that confuses me is that this pattern won't match anything that has line breaks in it, so I presume your input.xml isn't actually formatted like you have in your example data?
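The line-break point can be checked directly. In this sketch the same pattern is wrapped in a hypothetical function, remove_none_fax, and written inside single quotes, so the double quotes around fax need no backslashes. The sample XML strings are made up for illustration.

```shell
# Hypothetical helper: delete fax entries whose number is "none".
# sed works line by line, so the whole element must sit on one line.
remove_none_fax() {
  sed 's~<Telephone type[ ]*=[ ]*"fax"[ ]*><Number>none[ ]*</Number></Telephone>~~g'
}

# Element on a single line: the fax entry is removed.
printf '%s\n' '<Telephone type = "fax"><Number>none</Number></Telephone><Telephone type = "work"><Number>1</Number></Telephone>' \
  | remove_none_fax

# Element spread over several lines: nothing matches, input passes through.
printf '%s\n' '<Telephone type = "fax">' '<Number>none</Number>' '</Telephone>' \
  | remove_none_fax
```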

Resources