How to 'delete between and including quotes, but not leading/trailing whitespace'? - vim

'bla bla bla', 'bla bla bla'
---------------^------------ (cursor position)
To delete the second 'bla bla bla' I use
da'
but this also deletes the leading space. Is there a way to not include the leading space in the deletion?
(I'm trying to create a macro to replace quoted strings with a function call, ie replace eg
'bla bla bla', 'woot'
with
yada('bla_bla_bla'), yada('woot') )

In a macro you can also use an Ex command, like this:
:s/'.\{-}'/yada(&)/g
This only touches the '...' parts; the rest (spaces, commas, etc.) is left alone.

You can use vi'i'<operator> to operate on the quotes and their content. This would make your macro look something like this:
vi'i'cyada(<C-r>")
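For comparison, the transformation the question asks for can also be sketched outside Vim. This is a minimal Python illustration, not part of either answer; the function name wrap_quoted and its func parameter are made up for the example. It assumes quoted strings never contain an escaped single quote.

```python
import re

def wrap_quoted(line: str, func: str = "yada") -> str:
    """Wrap each single-quoted string in a call to `func`.

    Spaces inside the quotes become underscores; everything outside
    the quotes (commas, whitespace) is left untouched.
    """
    def repl(match):
        inner = match.group(1).replace(" ", "_")
        return f"{func}('{inner}')"

    # '[^']*' matches the shortest quoted run, like '.\{-}' in the Vim pattern
    return re.sub(r"'([^']*)'", repl, line)

print(wrap_quoted("'bla bla bla', 'woot'"))
```

Unlike the :s command above, this sketch also performs the space-to-underscore conversion shown in the question's expected output.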

Related

replace sub-string with last special character, being (3rd part) of comma separated string

I have a string with comma separated values, like:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
As you can see, the 3rd comma-separated value sometimes has a special character, such as a dash (-), at the end. I want to use sed, or preferably a perl command (with the -i option, so that the existing file is edited in place), to replace this value with the same string in the same position (i.e. the 3rd comma-separated value) but without the trailing special character. So the result for the example above should be:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
Since the file contains many lines like the one above, I am using a while loop in a shell/bash script to process all lines of the file. I have assigned the string values above to variables so that I can replace them using perl. My while loop is:
while read mystr
do
myNEWstr=$(echo $mystr | sed s/[_.-]$// | sed s/[__]$// | sed s/[_.-]$//)
perl -pi -e "s/\b$mystr\b/$myNEWstr/g" myFinalFile.txt
done < myInputFile.txt
where:
$mystr is the "SOME-STRING_A_-BLAHBLAH_1-4MP0-"
$myNEWstr result is the "SOME-STRING_A_-BLAHBLAH_1-4MP0"
Note that myInputFile.txt contains the 3rd comma-separated values taken from myFinalFile.txt. Each of those EXACT string values ($mystr) is checked for a special character at the end (underscore, dash, dot, double underscore). If one is found, it is removed to form the new string ($myNEWstr), and that new string then replaces the old one in myFinalFile.txt. The result should look like the final example above, i.e. the 3rd comma-separated value WITHOUT the trailing special character (a dash (-) in that example).
Thank you.
You could use the following regex:
s/^([^,]*,[^,]*,[^,]*)-,/$1,/
This defines CSV fields as a series of characters other than a comma (empty fields are allowed). We look for a dash at the very end of the third CSV field. The regex captures everything up to that point and then writes it back, omitting the dash.
$ cat t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
$ perl -p -e 's/^([^,]*,[^,]*,[^,]*)-,/$1,/' t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
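The same field-based idea can be sketched in Python by splitting on commas instead of using one regex. This is only an illustration under stated assumptions: the function name clean_third_field is made up, fields are assumed never to contain quoted commas, and the character set [_.-] is taken from the sed calls in the question. Unlike the perl one-liner, it strips a whole run of trailing special characters, not just a single dash.

```python
import re

def clean_third_field(line: str) -> str:
    """Strip trailing '_', '.' or '-' characters from the 3rd CSV field."""
    fields = line.rstrip("\n").split(",")
    if len(fields) >= 3:
        # [_.-]+$ removes one or more trailing special characters
        fields[2] = re.sub(r"[_.-]+$", "", fields[2])
    return ",".join(fields)

line = "742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,"
print(clean_third_field(line))
```

Working on the split fields avoids the need for the intermediate myInputFile.txt, since the third value is located by position rather than by matching its text.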

How to find special characters like \t, \n, \u and single quotation mark (') in Presto

I'm using presto. I have a table that contain address information. It has varchar format.
How do I find addresses that contain special characters like:
\t (tab)
\n (newline)
\u
single quotation mark (')
You can use LIKE with a literal that contains a newline. A convenient way is to use Unicode escapes for this (newline \n is U+000A in Unicode):
col LIKE U&'%\000A%'
U&'...' creates string literal, just like '...'.
The only difference is that U&'...' supports \hhhh escapes for Unicode.
Example:
presto:default> SELECT 'abc
-> def' LIKE U&'%\000A%';
_col0
-------
true
(1 row)
Tested on Presto 324.
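The other characters from the question have code points too: tab is U+0009 and the single quote is U+0027, so presumably the same \hhhh escape form applies to them. As a quick check of which code point corresponds to which character, here is a small Python sketch; the names SPECIAL and special_chars_in are made up for the example, and it runs outside Presto.

```python
# Code points for the characters asked about (hypothetical helper, not Presto code)
SPECIAL = {
    "tab": "\u0009",
    "newline": "\u000A",
    "single quote": "\u0027",
}

def special_chars_in(address: str) -> list:
    """Return the names of the special characters found in `address`."""
    return [name for name, ch in SPECIAL.items() if ch in address]

print(special_chars_in("12 Main St\nApt 4"))
print(special_chars_in("O'Hara Rd\t"))
```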

Grep whole paragraphs of a text containing a specific keyword

My goal is to extract the paragraphs of a text that contain a specific keyword: not just the lines that contain the keyword, but the whole paragraph. The rule imposed on my text files is that every paragraph starts with a certain pattern (e.g. Pa0), which appears only at the start of a paragraph. Each paragraph ends with a newline character.
For example, imagine I have the following text:
Pa0
This is the first paragraph bla bla bla
This is another line in the same paragraph bla bla
This is a third line bla bla
Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla
bla
Pa0
Hey, third paragraph bla bla bla!
bla bla
Pa0
keyword keyword
keyword
Another line! bla
My goal is to extract these paragraphs that contain the word "keyword". For example:
Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla
bla
Pa0
keyword keyword
keyword
Another line! bla
I can use e.g. grep for the keyword and -A, -B or -C option to get a constant number of lines before and/or after the line where the keyword is located but this does not seem enough since the beginning and end of the text block depends on the delimiters "Pa0" and "\n".
Any suggestion for grep or another tool (e.g. awk, sed, perl) would be helpful.
It is simple with awk:
awk '/keyword/' RS="\n\n" ORS="\n\n" input.txt
Explanation:
Usually awk operates on a per line basis, because the default value of the record separator RS is \n (a single new line). By changing the RS to two new lines in sequence (an empty line) we can easily operate on a paragraph basis.
/keyword/ is a condition, a regex. Since there is no action after the condition awk will simply print the unchanged record (the paragraph) if it contains keyword.
Setting the output record separator ORS to \n\n will separate the paragraphs of output with an empty line, just like in the input.
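The paragraph-mode idea described above can be sketched in Python as well. This is a minimal illustration, assuming paragraphs are separated by at least one empty line; the function name paragraphs_with is made up, and the keyword is matched as a plain substring rather than as a regex.

```python
import re

def paragraphs_with(text: str, keyword: str) -> list:
    """Return the blank-line-separated paragraphs that contain `keyword`."""
    # Two or more consecutive newlines play the role of RS="\n\n"
    paragraphs = re.split(r"\n{2,}", text.strip())
    return [p for p in paragraphs if keyword in p]

sample = "Pa0\nfirst paragraph\n\nPa0\nMy keyword is here!\nbla\n\nPa0\nthird"
# Joining with an empty line mirrors ORS="\n\n"
print("\n\n".join(paragraphs_with(sample, "keyword")))
```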
If text.txt contains the text you want, then:
$ sed -e '/./{H;$!d;}' -e 'x;/keyword/!d;' text.txt
Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla
bla
Pa0
keyword keyword
keyword
Another line! bla
Hope this helps:
sed -n '/Pa0/,/^$/p' filename
cat filename | sed -n '/Pa0/,/^$/p'
-n, suppress automatic printing of pattern space
p, print the current pattern space (a command, not an option)
/Pa0/, paragraph starting with Pa0 pattern
/^$/, paragraph ending with a blank line
^, start of line
$, end of line
Reference: http://www.cyberciti.biz/faq/sed-display-text/
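If the paragraphs are not separated by blank lines, the Pa0 marker itself can serve as the delimiter. The following Python sketch groups lines by that marker and keeps only the groups containing the keyword. It is an illustration under assumptions: the function name and the marker parameter are made up, and the marker is assumed to appear only at the start of a paragraph's first line, as the question states.

```python
def pa0_paragraphs_with(lines, keyword, marker="Pa0"):
    """Group lines into paragraphs that start at `marker`; keep those containing `keyword`."""
    paragraphs = []
    for line in lines:
        # Start a new paragraph at each marker line (or at the very first line)
        if line.startswith(marker) or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(line)
    return ["\n".join(p) for p in paragraphs if any(keyword in l for l in p)]

lines = ["Pa0", "first", "Pa0", "My keyword is here!", "bla", "Pa0", "third"]
print("\n".join(pa0_paragraphs_with(lines, "keyword")))
```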

Trying to replace a pattern with another one

This is my first question on this website (glad I found out about this community).
I am trying to replace a specific pattern in a file (multiple lines) that looks something like this:
Bla bla bla bla |SMTH AWESOME INSIDE >>> LOL| bla bla bla | let's do it again >>> AWESOME |
Into a format that looks like this
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
I tried doing this with code that parses the line word by word: when it finds the "|" character it starts building a string for the first part, then after it finds >>> it starts building the second string until it reaches the closing "|". It didn't work.
Afterwards I also tried using AWK, but since I am new to Linux I failed there as well:
awk -F 'BEGIN { FS=OFS="|" } { sub(/.*<<</,"", $2); }1' $1 }'
and then parsing the output with sed to remove the ) and ( characters from both strings. But that didn't work either.
Thank you for reading.
It looks like this is just a simple substitution within each line so all you need is sed:
$ sed 's/| *\([^|]*\) >>> \([^|]*\) *|/( \2 | \1 )/g' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
You can do the same in GNU awk with gensub() or other awks with match() and substr().
With extended regexp in sed:
sed -r 's/\|([^|]+)[[:space:]]*>>>[[:space:]]*([^|]+)\|/( \2 | \1 )/g' File
Logic:
We look for a pattern which starts with | followed by a sequence of non-| characters followed by >>> followed by a sequence of non-| characters again. See the groupings done with ( and ). Then we substitute these patterns according to our need. ( \2 | \1 ) is the replacement pattern where \1 and \2 are the first and second groupings respectively.
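The logic above carries over directly to other regex engines. Here is a minimal Python sketch of the same substitution; the names PATTERN and swap_segments are made up for the example. It additionally trims the whitespace around each captured part so that the output spacing matches the format requested in the question.

```python
import re

# | <first part> >>> <second part> |  with optional whitespace around each part
PATTERN = re.compile(r"\|\s*([^|]*?)\s*>>>\s*([^|]*?)\s*\|")

def swap_segments(line: str) -> str:
    """Rewrite each |A >>> B| segment as ( B | A )."""
    return PATTERN.sub(r"( \2 | \1 )", line)

line = "Bla bla bla bla |SMTH AWESOME INSIDE >>> LOL| bla bla bla | let's do it again >>> AWESOME |"
print(swap_segments(line))
```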
With basic regexp in sed:
sed 's/|\([^|]*\)[[:space:]]*>>>[[:space:]]*\([^|]*\)|/( \2 | \1 )/g' File
Perl's regular expressions have a "non-greedy" matching feature that awk's do not:
perl -pe '
    s/ \|      # the first delimiter
       (.*?)   # capture up to ...
       >>>     # the middle delimiter
       (.*?)   # capture up to ...
       \|      # the last delimiter
     /($2 | $1)/gx
' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
Let's try with awk:
awk 'NR%2{ printf("%s", $0) } NR%2==0{ printf("( %s %s",$NF,RS); gsub(/>>>.*$/,")"); printf("%s",$0) }' RS='|' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
RS defines | as the record separator. When the record number NR is odd (NR%2 returns 1), the record is printed as is. When NR is even (NR%2==0), awk prints an opening parenthesis followed by the last field and the record separator (printf("( %s %s",$NF,RS)), then replaces >>>.*$ with a closing parenthesis and prints what remains of the record (gsub(/>>>.*$/,")"); printf("%s",$0)). Note that $NF is only the last whitespace-separated word, so this works only when the text after >>> is a single word.
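The record-splitting approach can be sketched in Python by splitting each line on | and treating every second piece as a segment to rewrite. This is an illustration only: the function name swap_by_splitting is made up, and it assumes the pipes always come in pairs. Since Python indexes from 0 while awk's NR starts at 1, the even-numbered awk records correspond to odd indexes here.

```python
def swap_by_splitting(line: str) -> str:
    """Split on '|' and rewrite every second piece 'A >>> B' as '( B | A )'."""
    parts = line.split("|")
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1 and ">>>" in part:
            left, right = part.split(">>>", 1)
            out.append(f"( {right.strip()} | {left.strip()} )")
        else:
            out.append(part)  # text outside the pipes is kept as is
    return "".join(out)

line = "Bla bla bla bla |SMTH AWESOME INSIDE >>> LOL| bla bla bla | let's do it again >>> AWESOME |"
print(swap_by_splitting(line))
```

Because the right-hand side is taken as everything after >>>, this version does not share the single-word limitation of $NF.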

Printing string in Perl

Is there an easy way, using a subroutine maybe, to print a string in Perl without escaping every special character?
This is what I want to do:
print DELIMITER <I don't care what is here> DELIMITER
So obviously it would be great if I could use a string as the delimiter instead of special characters.
perldoc perlop, under "Quote and Quote-like Operators", contains everything you need.
While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them. In the following table, a "{}" represents any pair of delimiters you choose.

    Customary  Generic   Meaning          Interpolates
    ''         q{}       Literal          no
    ""         qq{}      Literal          yes
    ``         qx{}      Command          yes*
               qw{}      Word list        no
    //         m{}       Pattern match    yes*
               qr{}      Pattern          yes*
               s{}{}     Substitution     yes*
               tr{}{}    Transliteration  no (but see below)
    <<EOF                here-doc         yes*

    * unless the delimiter is ''.
$str = q(this is a "string");
print $str;
if by 'special characters' you mean quotes and apostrophes.
You can use the __DATA__ directive which will treat all of the following lines as a file that can be accessed from the DATA handle:
while (<DATA>) {
    print;  # or do something else with the lines
}
__DATA__
#!/usr/bin/perl -w
use Some::Module;
....
or you can use a heredoc:
my $string = <<'END'; #single quotes prevent any interpolation
#!/usr/bin/perl -b
use Some::Module;
....
END
Printing does nothing special with the escapes; it is the double-quoted strings that interpret them. You may want to try single-quoted strings:
print 'this is \n', "\n";
In a single quoted string the only characters that must be escaped are single quotes and a backslash that occurs immediately before the end of the string (i.e. 'foo\\').
It is important to note that interpolation does not work with single quoted strings, so
print 'foo is $foo', "\n";
Will not print the contents of $foo.
You can pretty much use any character you want with q or qq. For example:
#!/usr/bin/perl
use utf8;
use strict; use warnings;
print q∞This is a test∞;
print qq☼\nThis is another test\n☼;
print q»But, what is the point?»;
print qq\nYou are just making life hard on yourself!\n;
print qq¿That last one is tricky\n¿;
You cannot use qq DELIMITER foo DELIMITER. However, you could use heredocs for a similar effect:
print <<DELIMITER
...
DELIMITER
;
or
print <<'DELIMITER'
...
DELIMITER
;
but your source code would be really ugly.
If you want to print a string literally and you have Perl 5.10 or later (with say enabled, e.g. via use feature 'say';), then
say 'This is a string with "quotes"' ;
will print the string followed by a newline. The important thing is to use single quotes ' ' rather than double quotes " ".

Resources