I have a paragraph like this:
"Nothing is worth more than the truth.
I like to say hello to him.
Give me peach or give me liberty.
Please say hello to strangers.
I'ts ok to say Hello to strangers."
I want this result:
"Nothing is worth more than the truth.
Give me peach or give me liberty."
If a line contains the word "hello", remove that line and keep only the lines without that word.
I found some information in a reference, so I tried the following: regexp "^[^hello]" $line
but it doesn't work.
There are several problems with your attempt of:
regexp "^[^hello]" $line
A good practice with Tcl regular expressions is to put your regex inside curly braces instead of double quotes. Square brackets inside double quotes will be evaluated by Tcl as a command.
^ means the beginning of the line in regular expression.
Characters inside square brackets in a regular expression are considered a "character-class". [^hello] does not mean the opposite of matching "hello". Instead, it matches a single character that is not h, e, l, or o.
Do you care about case? If not, then add -nocase.
A Tcl expression, which you can use in an if statement to check that a line does not include "hello" (or "Hello") is simply this:
![regexp -nocase {hello} $line]
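The difference between a negated character class and a "does not contain" test can be sketched outside Tcl as well. The snippet below uses Python's re module purely as an illustration; the function name drop_lines_containing is hypothetical and not part of the question or the answer.

```python
import re

text = """Nothing is worth more than the truth.
I like to say hello to him.
Give me peach or give me liberty.
Please say hello to strangers.
I'ts ok to say Hello to strangers."""


def drop_lines_containing(text, word):
    """Keep only the lines that do not contain `word`, ignoring case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return "\n".join(
        line for line in text.splitlines() if not pattern.search(line)
    )


result = drop_lines_containing(text, "hello")
print(result)

# "^[^hello]" only tests the first character of the line:
# this line contains "hello" but starts with "I", so the class still matches.
class_matches_hello_line = re.match(r"^[^hello]", "I like to say hello to him.")
# This line has no "hello" but starts with "l", so the class rejects it.
class_rejects_clean_line = re.match(r"^[^hello]", "left alone")
```

The last two assignments show why the original attempt cannot work: the class inspects one character, not the word.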
I’m quite new to regex but have almost finished my text mining script. Only one thing fails: I’m trying to remove the apostrophes around a word if they exist. I’m using re.sub for this.
For instance:
‘Apple’ needs to be Apple
‘apple’ needs to be apple
‘[apple]’ needs to be [apple]
‘(apple)’ needs to be (apple)
However: Apple’s needs to stay Apple’s because there is only one apostrophe.
How do I select both apostrophes when there is a word in between so I can delete them with re.sub? In every try I remove the entire string! Hopefully someone can help.
My code is as follows:
str_o='\'Apple\''
str_o_a = re.sub(r"\'(.*?)\'","", str_o)
I have a simpler idea: split by whitespace, trim leading and trailing apostrophes, join with whitespace. Avoids having to write a regular expression and handles sentences such as "She's 'her' mother's daughter".
text = "She's 'her' mother's daughter"
text = ' '.join([word.strip("'") for word in text.split()])
print(text)
# She's her mother's daughter
The purpose of the parentheses in your regular expression was probably to capture the string you want to keep. The idiom looks like
str_o_a = re.sub(r"'([^']*)'", r"\1", str_o)
You want a raw string around the replacement, too, in order to preserve the backslash in the argument (otherwise you would be replacing with the literal string "\x01").
Notice also the preference for using a negated character class over a non-greedy "match anything" wildcard.
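A minimal sketch of the capture-group idiom applied to the examples from the question, assuming straight apostrophes as in the posted code. The function name strip_paired_apostrophes is made up for the illustration.

```python
import re


def strip_paired_apostrophes(text):
    """Remove a pair of apostrophes but keep whatever sits between them."""
    # [^']* cannot run past the next apostrophe, so each match is one pair.
    return re.sub(r"'([^']*)'", r"\1", text)


for sample in ["'Apple'", "'apple'", "'[apple]'", "'(apple)'", "Apple's"]:
    print(sample, "->", strip_paired_apostrophes(sample))

# Without the raw-string prefix, "\1" is the single control character chr(1),
# so the word is replaced by that character instead of by group 1.
broken = re.sub(r"'([^']*)'", "\1", "'Apple'")
print(repr(broken))
```

Note that the regex pairs up apostrophes wherever they occur, so a sentence such as "She's 'her' mother's daughter" loses the apostrophes in "She's" and "mother's" as well; the split-and-strip approach above is the safer choice for running text.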
I did use the manual but I am unable to get all the options together to understand what the above code is actually doing.
awk -v v='"' 'BEGIN{FS=OFS=v}{gsub(",","",$2);print }' \
${SOURCE_LOCATION}/TEMP1_$file_name>${SOURCE_FILE_LOCATION}/TEMP2_$file_name
When do we have to use the curly brackets in a code after the '$' and when not to. Please explain. Any help is really appreciated.
This command removes all the commas in the second field, where the field separator is the quote character " (as specified by FS).
For example, the following string:
something "string, with, commas" something "else, here, and more"
would be transformed to:
something "string with commas" something "else, here, and more"
The significance of {} in variable names has been well explained by @Joni.
The input is read from the file ${SOURCE_LOCATION}/TEMP1_$file_name and the output is redirected to ${SOURCE_FILE_LOCATION}/TEMP2_$file_name.
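For readers who find the awk one-liner hard to parse, here is a rough Python sketch of the per-line transformation it performs: split on the quote character, strip commas from the second field, and join again. The function name is hypothetical, and the guard for lines without a second field is an assumption of this sketch, not a statement about awk's behaviour.

```python
def strip_commas_in_second_field(line, sep='"'):
    """Mimic FS=OFS='"' with gsub(",", "", $2) for a single line."""
    fields = line.split(sep)
    if len(fields) > 1:
        # awk's $2 is the second field, i.e. index 1 here.
        fields[1] = fields[1].replace(",", "")
    return sep.join(fields)


line = 'something "string, with, commas" something "else, here, and more"'
converted = strip_commas_in_second_field(line)
print(converted)
```

Only the first quoted string is touched, because it is the second field when the line is split on quotes; the later quoted string is the fourth field.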
You must use the curly brackets syntax when a variable name is followed by something that's not part of the variable name but could be confused with it. For example, compare
hello="Hello"
echo $hello_world
with
hello="Hello"
echo ${hello}_world
The first one outputs an empty line (or the value of the shell variable hello_world, if it exists), and the second one outputs Hello_world.
In your case they are not necessary because a slash can never be a part of the variable name. Some people prefer to use the brackets to make it clear where the variable begins and where it ends even when they are not required.
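The same ambiguity exists in other placeholder syntaxes. As a loose analogy only (this is Python's string.Template, not the shell), the braces again mark where the name ends:

```python
from string import Template

values = {"hello": "Hello"}

# "$hello_world" is read as one name, hello_world, which is not defined.
# safe_substitute leaves unknown placeholders untouched instead of failing.
unbraced = Template("$hello_world").safe_substitute(values)

# "${hello}_world" ends the name at the closing brace.
braced = Template("${hello}_world").substitute(values)

print(unbraced)
print(braced)
```

Unlike the shell, which expands an unset variable to an empty string, safe_substitute keeps the unknown placeholder as written; the point of the sketch is only how the name is delimited.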
I have a pattern where there are double-quotes between numbers in a CSV file.
I can search for the pattern with [0-9]\"[0-9], but how do I keep the digits while removing the double quote? The CSV format is like this:
"1234"5678","Text1","Text2"
"987654321","Text3","text4"
"7812891"3","Text5","Text6"
As you may notice there are double quotes between some numbers which I want to remove.
I have tried the following way, which is incorrect:
:%s/[0-9]\"[0-9]/[0-9][0-9]/g
Is it possible to execute a command at every search match, for example go one character forward and delete it? How can "lx" be embedded in search and replace?
You need to capture groups. Try:
:%s/\(\d\)"\(\d\)/\1\2/g
[A digit can also be denoted by \d.]
I know that this question has been answered already, but here's another approach:
:%s/\d\zs"\ze\d
Explanation:
%s Substitute for the whole buffer
\d look up for a digit
\zs set the start of match here
" look up for a double-quote
\ze set the end of match here
\d look up for a digit
That makes the substitute command match only the double-quote surrounded by digits.
Omitting the replacement string just deletes the match.
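Both ideas from the answers so far (capture groups, and matching only the quote itself) can be tried outside Vim. The sketch below uses Python's re module, where lookbehind and lookahead play roughly the role of \zs and \ze; the function names are hypothetical.

```python
import re

rows = [
    '"1234"5678","Text1","Text2"',
    '"987654321","Text3","text4"',
    '"7812891"3","Text5","Text6"',
]


def with_groups(line):
    """Capture the digits on both sides and put them back without the quote."""
    return re.sub(r'(\d)"(\d)', r"\1\2", line)


def with_lookarounds(line):
    """Match only the quote; the neighbouring digits are checked, not consumed."""
    return re.sub(r'(?<=\d)"(?=\d)', "", line)


for row in rows:
    print(with_lookarounds(row))
```

In Python the two variants differ on input such as 1"2"3: the capture-group version consumes the digit 2 in its first match and leaves the second quote in place, while the lookaround version removes both.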
You need capture groups in the regular expression.
Try this:
:%s/\([0-9]\)"\([0-9]\)/\1\2/g
A somewhat naive solution:
%s/^"/BEGINNING OF LINE QUOTE MARK/g
%s/\",\"/quote comma quote/g
%s/\"$/quota end of line/g
%s/\"//g
%s/quota end of line/"/g
%s/quote comma quote/","/g
%s/BEGINNING OF LINE QUOTE MARK/"/g
A macro can be created quite easily out of it and invoked as many times as needed.
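The placeholder idea can be sketched in Python to make the order of the steps easier to follow. The control characters used as placeholders are an assumption of this sketch (chosen because they are unlikely to occur in the data), and the function name is hypothetical.

```python
import re

# Placeholders that are assumed not to appear in the CSV data.
START, MIDDLE, END = "\x00", "\x01", "\x02"


def strip_inner_quotes(line):
    """Protect the legitimate quotes, delete the rest, then restore them."""
    line = re.sub(r'^"', START, line)    # quote at the beginning of the line
    line = line.replace('","', MIDDLE)   # quote comma quote between fields
    line = re.sub(r'"$', END, line)      # quote at the end of the line
    line = line.replace('"', "")         # anything left is a stray quote
    return (
        line.replace(END, '"')
        .replace(MIDDLE, '","')
        .replace(START, '"')
    )


print(strip_inner_quotes('"1234"5678","Text1","Text2"'))
```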
I had some spam issues on my server and, after finding and removing some Perl and PHP scripts, I'm down to checking what they really do. Although I'm a senior PHP programmer, I have little experience with Perl. Can anyone give me a hand with the script here:
http://pastebin.com/MKiN8ifp
(It was one long line of code, script was called list.pl)
The start of the script is:
$??s:;s:s;;$?::s;(.*); ]="&\%[=.*.,-))'-,-#-*.).<.'.+-<-~-#,~-.-,.+,~-{-,.<'`.{'`'<-<--):)++,+#,-.{).+,,~+{+,,<)..})<.{.)-,.+.,.)-#):)++,+#,-.{).+,,~+{+,,<)..})<*{.}'`'<-<--):)++,+#,-.{).+:,+,+,',~+*+~+~+{+<+,)..})<'`'<.{'`'<'<-}.<)'+'.:*}.*.'-|-<.+):)~*{)~)|)++,+#,-.{).+:,+,+,',~+*+~+~+{+<+,)..})
It continues with precious few non-punctuation characters until the very end:
0-9\;\\_rs}&a-h;;s;(.*);$_;see;
Replace the s;(.*);$_;see; with print to get this. Replace s;(.*);$_;see; again with print in the first half of the payload to get this, which is the decryption code. The second half of the payload is the code to decrypt, but I can't go any further with it, because as you see, the decryption code is looking for a key in an envvar or a cookie (so that only the script's creator can control it or decode it, presumably), and I don't have that key. This is actually reasonably cleverly done.
For those interested in the nitty gritty... The first part, when de-tangled looks like this:
$? ? s/;s/s;;$?/ :
s/(.*)/...lots of punctuation.../;
The $? at the beginning of the line is the predefined variable containing the child error, which no doubt serves only as obfuscation. It will be false here, as there can be no child error at this point.
The question mark following it is the start of a ternary operator
CONDITION ? IF_TRUE : IF_FALSE
Which is also added simply to obfuscate. The expression returned for true is a substitution regex, where the / slash delimiter has been replaced with colon s:pattern:replacement:. Above, I have put back slashes. The other expression, which is the one that will be executed is also a substitution regex, albeit an incredibly long one. The delimiter is semi-colon.
This substitution replaces .* in $_ - the default input and pattern-searching space - with a rather large amount of punctuation characters, which represents the bulk of the code. Since .* matches any string, even the empty string, it will simply get inserted into $_, and is for all intents and purposes identical to simply assigning the string to $_, which is what I did:
$_ = q;]="&\%[=.*.,-))'-,-# .......;;
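The claim that substituting (.*) is equivalent to plain assignment can be checked with any regex engine. Here is a small Python sketch; the payload is a short made-up stand-in, not the real data from the script, and the function name is hypothetical.

```python
import re

# Hypothetical stand-in for the long run of punctuation in the script.
payload = r"]=&\%[=.*.,-))"


def substitute_everything(current, replacement):
    """Replace the first match of (.*) the way a Perl s/// without /g does."""
    # A function is used as the replacement so that backslashes in the
    # payload are inserted literally instead of being read as escapes.
    return re.sub(r"(.*)", lambda match: replacement, current, count=1)


from_empty = substitute_everything("", payload)
from_text = substitute_everything("whatever was here", payload)
print(from_empty == payload, from_text == payload)
```

For an empty or single-line starting value the result is exactly the payload, which is why replacing the substitution with an assignment does not change the behaviour.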
The following lines are a transliteration and another substitution. (I inserted comments to point out the delimiters)
y; -"[%-.:<-#]-`{-}#~\$\\;{\$()*.0-9\;\\_rs}&a-h;;
#^ ^ ^ ^
#1 2 3
(1,2,3 are delimiters, the semi-colon between 2 and 3 is escaped)
The basic gist of it is that various characters and ranges, such as  -" (space to double quote), and something that looks like character classes with ranges but isn't, [%-.:<-#], get transliterated into more legible characters, e.g. curly braces, dollar sign, parentheses, 0-9, etc.
s;(.*);$_;see;
The next substitution is where the magic happens. It is also a substitution with obfuscated delimiters, but with three modifiers: see. s does nothing in this case, as it only allows the wildcard character . to match newline. ee means to evaluate the expression twice, however.
In order to see what I was evaluating, I performed the transliteration and printed the result. I suspect that I somewhere along the line got some characters corrupted, because there were subtle errors, but here's the short (cleaned up) version:
s;(.*);73756220656e6372797074696f6e5f6 .....;; # very long line of alphanumerics
s;(..);chr(hex($1));eg;
s;(.*);$_;see;
s;(.*);704b652318371910023c761a3618265 .....;; # another long line
s;(..);chr(hex($1));eg;
&e_echr(\$_);
s;(.*);$_;see;
The long regexes are once again the data containers, and insert data into $_ to be evaluated as code.
The s/(..)/chr(hex($1))/eg; is starting to look rather legible. It basically reads two characters at a time from $_ and converts each pair from hex to the corresponding character.
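The hex-decoding step is easy to reproduce. The sketch below is a Python equivalent of the pairwise substitution, applied to the complete hex pairs visible at the start of the first data line quoted above (the final, cut-off digit is dropped). The function name is hypothetical.

```python
import re


def decode_hex_pairs(hex_text):
    """Turn every two hex digits into the character with that code."""
    return re.sub(r"(..)", lambda m: chr(int(m.group(1), 16)), hex_text)


# Leading complete pairs of the first long data line shown above.
visible_prefix = "73756220656e6372797074696f6e5f"
decoded = decode_hex_pairs(visible_prefix)
print(decoded)
```

The visible part decodes to the beginning of a Perl subroutine definition, which fits the observation that the data lines hold code to be evaluated.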
The next to last line &e_echr(\$_); stumped me for a while, but it is a subroutine that is defined somewhere in this evaluated code, as hobbs so aptly was able to decode. The dollar sign is prefixed by backslash, meaning it is a reference to $_: I.e. that the subroutine can change the global variable.
After quite a few evaluations, $_ is run through this subroutine, after which whatever is contained in $_ is evaluated a last time. Presumably this time executing the code. As hobbs said, a key is required, which is taken from the environment %ENV of the machine where the script runs. Which we do not have.
Ask the B::Deparse module to make it (a little more) readable.
I have a file containing a string like this one:
print $hash_xml->{'div'}{'div'}{'div'}[1]...
I want to replace {'div'}{'div'}{'div'}[1] by something else.
So I tried
%s/{'div'}{'div'}{'div'}[1]/by something else/gc
The strings were not found. I thought I had to escape the {, }, [ and ].
Still string not found.
So I tried to search a single { and it found them.
Then I tried to search {'div'}{'div'}{'div'} and it found it again.
Then {'div'}{'div'}{'div'}[1 was still found.
To find {'div'}{'div'}{'div'}[1]
I had to use %s/{'div'}{'div'}{'div'}[1\]
Why?
vim 7.3 on Linux
The [] are used in regular expressions to wrap a range of acceptable characters.
When both are supplied unescaped, vim is treating the search string as a regex.
So when you leave it out, or escape the final character, vim cannot interpret a single bracket in a regex context, so does a literal search (basically the best it can do given the search string).
Personally, I would escape the opening and closing square brace to ensure that the meaning is clear.
That's because the [ and ] characters are used to build the search pattern.
See :h pattern and use the help file pattern.txt to try the following experiment:
Searching for the text "[0-9]" (without quotes) using /[0-9] will instead match every digit from 0 to 9 individually (see :h \[)
Now, if you try /\[0-9] or /[0-9\] you will match the whole text: a zero, a hyphen and a nine inside square brackets. That's because when you escape one of [ or ] the [] operator ceases to exist.
Using your search pattern, /{'div'}{'div'}{'div'}[1\] and /{'div'}{'div'}{'div'}\[1] should match the same pattern which is the one you want, while /{'div'}{'div'}{'div'}[1] matches the string {'div'}{'div'}{'div'}1.
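The character-class behaviour is not specific to Vim. As an illustration in another engine, the Python sketch below shows that an unescaped [1] matches the single character 1, while escaped brackets match the literal text. The prefix is escaped with re.escape so that only the brackets differ between the patterns.

```python
import re

target = "{'div'}{'div'}{'div'}[1]"
prefix = re.escape("{'div'}{'div'}{'div'}")

as_class = re.compile(prefix + r"[1]")       # [1] is a class containing "1"
as_literal = re.compile(prefix + r"\[1\]")   # escaped brackets are literal
fully_escaped = re.compile(re.escape(target))

print(as_class.search(target))                       # no match
print(as_class.search("{'div'}{'div'}{'div'}1"))     # matches the text ending in 1
print(as_literal.search(target))                     # matches
print(fully_escaped.search(target))                  # matches

# Python is stricter than Vim here: a class that is never closed is an
# error rather than a fallback to a literal search.
try:
    re.compile(r"[1\]")
    unterminated_accepted = True
except re.error:
    unterminated_accepted = False
```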
In order to avoid being caught by these special characters in regular expressions, you can try using the very nomagic flag.
E.g.:
:%s/\V{'div'}[1]/replacement/
Notice the \V flag at the beginning of the pattern.
Because the square brackets mean that vim thinks you're looking for any of the characters inside. This is known as a 'character class'. Escaping either of the square brackets lets vim know that you're looking for the literal string ending with '[1]'.
Ideally you should write your expression as:
%s/{'div'}{'div'}{'div'}\[1\]/replacement string/
to ensure that the meaning is completely clear.