I'm trying to split a given string wherever a pair of quotes encloses at least 3 characters (I also have to split wherever . or , appears). So something like hello'example'hello,cat should return [hello;example;hello;cat].
I came up with:
re.split("\'(...+)\'|\.|,","hello'example'hello,cat")
This works fine with the quotes, but whenever it splits on . or , this happens:
['hello', 'example', 'hello', None, 'cat']
I found out that the capture group is what causes it (the None in the middle of the list), but it is the only way I know to keep the quoted content.
Please keep in mind that I need to do as little computation as possible, because the program has to work with huge files. I'm also not very experienced with Python, so sorry if I did something obviously wrong.
Try just:
re.split("\'|\.|,", "hello'example'hello,cat")
It's tricky because the opening quote and the closing quote are the exact same character. I think you'd have to use a negative lookbehind to exclude any single quote that is preceded by 0, 1 or 2 characters and another single quote. In addition, you'd have to use a positive lookahead. This works in JavaScript.
re.split("(?<!'(.?|..))'(?=[^']{3,})|\.|\,", "hello'example'hello,cat")
But it doesn't look like Python's re module supports variable-length lookbehinds. Also, this won't work if there is a lone single quote (an apostrophe).
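If the capture group is worth keeping, one option is to leave the pattern almost as it is and drop the None entries afterwards. This is only a minimal sketch, assuming it is acceptable to discard empty strings as well. The name split_text is made up for illustration, and [^']{3,} is swapped in for ...+ so that a quoted run cannot stretch across another quote:

```python
import re

# A quoted run of at least 3 non-quote characters (captured), or a literal . or ,
SPLIT_RE = re.compile(r"'([^']{3,})'|\.|,")

def split_text(text):
    # re.split puts None in the group's slot whenever the "." or "," branch
    # matched; the filter drops those None entries (and any empty strings).
    return [part for part in SPLIT_RE.split(text) if part]

print(split_text("hello'example'hello,cat"))
# ['hello', 'example', 'hello', 'cat']
```

The filter is a single pass over the result list, so it should add little on top of the split itself, even when called for every line of a large file.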
I have been trying to match multiline string and character literals in VS Code, but there is no support for highlighting across more than one line using a regex. This is a known issue. At the bottom of the issue, the suggestion is to use a semantic highlight provider.
VS Code's Semantic Highlight Guide defines a fixed set of token types that can be semantically highlighted. My main problem is that multiline strings are not detected as tokens in the first place, so they cannot be changed to the right color.
I am trying to match a BQN string. A BQN string is a double quote, followed by any number of non-quote characters including newlines, followed by another double quote. Double quotes inside a string are escaped by typing two quotes: "qu""ote" translates to qu"ote.
I'd like to know if there is a way to syntax highlight multiline strings via this method or any other method available to a VS Code extension. Help and examples are highly appreciated.
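For reference, the string rule described above can be written as a single regular expression once the whole document text is available (which is what a semantic provider would see, rather than one line at a time). The sketch below is in Python only to illustrate the grammar. It is not VS Code API code, find_strings is a hypothetical helper name, and it assumes there are no comments or character literals containing double quotes to confuse the scan:

```python
import re

# Opening quote, then any mix of non-quote characters (newlines included)
# and doubled quotes, then the closing quote.
BQN_STRING = re.compile(r'"(?:[^"]|"")*"')

def find_strings(source):
    """Return (start, end, value) for each string literal found in source."""
    found = []
    for m in BQN_STRING.finditer(source):
        raw = m.group(0)
        # Strip the outer quotes and collapse each "" into a single "
        value = raw[1:-1].replace('""', '"')
        found.append((m.start(), m.end(), value))
    return found

print(find_strings('a "qu""ote" b "two\nlines"'))
```

The start and end offsets are the part a highlighter would need; turning them into line and column ranges is left out here.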
I have a comma-separated data file that has a lot of NAs (it was generated by R). I opened the file in Vim and tried to replace all the NA values with empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.
, doesn't have a special meaning so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
But with word boundaries in place, the commas you used to restrict the match to NA, which are what cause the error, become useless and can be dropped:
:%s/\<NA\>//g
See :help :range and :help \<.
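For anyone who wants to check the word-boundary idea outside Vim, here is a rough Python equivalent. It assumes Python's \b behaves like Vim's \< and \> for this data, which should hold here because every field consists only of letters and digits. The name blank_na is made up for the example:

```python
import re

line = "1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1"

def blank_na(text):
    # \bNA\b matches NA only when it stands alone as a whole word,
    # so NATIONAL, NANA and AMERICANA are left untouched.
    return re.sub(r"\bNA\b", "", text)

print(blank_na(line))
# 1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
```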
Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches, so in ,NA,NA,NA, it only finds the first ,NA, and the third ,NA, because the middle one doesn't have its own separate surrounding commas. We can exclude parts of the pattern from the match with \zs (match start) and \ze (match end). The pattern still requires the surrounding characters to be there, but the match itself doesn't include them, so we can match every NA in ,NA,NA,NA,.
TL;DR: %s/,\zsNA\ze,//g
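The non-overlapping behaviour is not specific to Vim, so it can be reproduced in a few lines of Python. This is an analogy rather than a description of Vim's internals: lookbehind and lookahead stand in for \zs and \ze here, checking for the commas without consuming them.

```python
import re

line = ",NA,NA,NA,"

# Each match consumes its trailing comma, so the middle NA has no
# leading comma left to match against.
consuming = re.sub(r",NA,", ",,", line)

# Lookarounds require the commas but leave them out of the match.
non_consuming = re.sub(r"(?<=,)NA(?=,)", "", line)

print(consuming)      # ,,NA,,
print(non_consuming)  # ,,,,
```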
I can search the following without problems:
_GET
$variable
However, Sublime Text fails to find $_ (e.g. $_GET). I have tried to escape it somehow:
$\_GET
\$_GET
$__GET
I'm on Ubuntu 14.04 LTS.
Turn off regular-expression search. It is the button on the far left of the search field.
[Screenshot: search with regular expressions turned off]
Although I'm not sure if this would fit your exact problem since you tried escaping using \$_, this answer may still help for posterity.
Did you also make sure "whole word" search is turned off? That's the 3rd button from the left (next to the Aa)
[Screenshot: whole word turned on]
[Screenshot: the escaped \$_ search failing]
[Screenshot: the _GET search succeeding]
Note that whole word search of $_ would succeed if there was a whole $_ phrase, surrounded by whitespace. For example with whole word search on:
I am a sentence with the keyword $_ which will be matched.
would work, whereas:
I am a sentence with the keyword $_GET, which will never match. $_POST, $_REQUEST, and $_SERVER won't work either.
would break the whole word search.
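As background on why regex mode matters at all: in most regex engines an unescaped $ is an end-of-line anchor, not a literal dollar sign. The Python sketch below shows that behaviour, assuming Sublime's regex mode treats $ the same way. It says nothing about the whole word option, which is a separate setting.

```python
import re

text = "$id = $_GET['id'];"

# Unescaped: $ anchors at the end of the text, so "_GET" can never follow it.
unescaped = re.search(r"$_GET", text)

# Escaped: \$ is a literal dollar sign.
escaped = re.search(r"\$_GET", text)

# re.escape builds the escaped pattern from a plain search string.
literal = re.search(re.escape("$_GET"), text)

print(unescaped)         # None
print(escaped.group(0))  # $_GET
print(literal.group(0))  # $_GET
```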
str = "fa, (captured)[asd] asf, 31"
for word in str:gmatch("\(%a+\)") do
    print(word)
end
Hi! I want to capture a word between parentheses.
My code should print the string "captured".
lua: /home/casey/Desktop/test.lua:3: invalid escape sequence near '\('
But I got this syntax error instead.
Of course, I could just find the positions of the parentheses and use the string.sub function,
but I'd prefer simpler code.
Also, brackets gave me a similar error.
The escape character in Lua patterns is %, not \. So use this:
word=str:match("%((%a+)%)")
If you only need one match, there is no need for a gmatch loop.
To capture the string in square brackets, use a similar pattern:
word=str:match("%[(%a+)%]")
If the captured string is not entirely composed of letters, use .- instead of %a+.
lhf's answer likely gives you what you need, but I'd like to mention one more option that I feel is underused and may work for you as well. One issue with using %((%a+)%) is that it doesn't work for nested parentheses: if you apply it to something like "(text(more)text)", you'll get "more" even though you may expect "text(more)text". Note that you can't fix it by asking to match to the first closing parenthesis (%(([^%)]+)%)) as it will give you "text(more".
However, you can use the %bxy pattern item, which balances occurrences of x and y and will match (text(more)text) in this case (you'd need to use something like (%b()) to capture it). Again, this may be overkill for your case, but it is useful to keep in mind and may help someone else who comes across this problem.
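To make the idea behind %b() concrete, here is a rough sketch of balanced matching written in Python, which has no direct equivalent in its re module. It only illustrates the depth-counting idea and is not how Lua implements it; match_balanced is a made-up name.

```python
def match_balanced(text, open_ch="(", close_ch=")"):
    """Return the first balanced open_ch...close_ch span in text, or None."""
    start = text.find(open_ch)
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == open_ch:
                depth += 1
            elif text[i] == close_ch:
                depth -= 1
                if depth == 0:
                    # Found the close that balances the opening at start
                    return text[start:i + 1]
        # Never balanced from this opening character; try the next one
        start = text.find(open_ch, start + 1)
    return None

print(match_balanced("fa, (text(more)text) asf"))
# (text(more)text)
```

Slicing off the first and last character of the result gives the inner text, text(more)text in this example.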