VIM: how to change case of accented characters with 'gU'? - vim

The following sentence contains all the accented characters (characters with diacritics) that are used in the Czech language.
příliš žluťoučký kůň úpěl ďábelské ódy
Now I convert this line to uppercase using gUU and I get:
PříLIš žLUťOUčKý Kůň úPěL ďáBELSKé óDY
instead of:
PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY
As you can see, the characters with accents don't get converted. What do I have to set in my .vimrc to get it working right?

Other than text, how to remove numbers, punctuation, white spaces and special characters from text? [duplicate]

I just scraped text data from a website, and that data contains numbers, special characters and punctuation. After splitting the data I tried to keep only the plain text, but I'm still getting spaces, numbers and special characters. How can I remove all of those and keep only the words?
url = 'www.example.com'
html = urllib.request.urlopen(url).read().decode('utf-8')
text = get_text(html)
extracted_data = text.split()
refined_data = []
SYMBOLS = '{}()[].,:;+-*/&|<>=~0123456789'
for i in extracted_data:
    if i not in SYMBOLS:
        refined_data.append(i)
print("\n", "$" * 50, "HEYAAA we got arround: ", len(refined_data), " of keywords! Here are they: ","$" * 50, "\n")
print(type(refined_data))
output:
1.My
2.system
3.showing
4.error
5.404
6.I
7.don't
8.understand
9.why
10. it
11. showing ,
12.like
13.this?
14.53251
15.$45
extracted_data is the result of str.split().
Called with no arguments, str.split() splits your text on any run of whitespace, so each element is a whole token such as 'system' or '53251'.
When both operands are strings, the not in operator is a substring test: i not in SYMBOLS asks whether the entire token i occurs somewhere inside SYMBOLS. It does not look at the individual characters of i.
So is 'system' a substring of SYMBOLS? No, it is not. Therefore the body of your if statement runs and 'system' is appended to refined_data.
Is '53251' a substring of SYMBOLS? No, it is not. Therefore it is appended too.
And so on. Only tokens that appear verbatim inside SYMBOLS, such as a lone ',', are skipped.
Comparing whole tokens against SYMBOLS is not what you want. Use str.strip(SYMBOLS) on each token instead, which removes those characters from both ends of the token.
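A minimal sketch of the str.strip() approach, under a few assumptions: the helper name refine is made up for illustration, and the hand-written SYMBOLS string is swapped for string.punctuation + string.digits so that characters such as ? and $ are covered as well. Tokens that become empty after stripping are dropped.

```python
import string

# Assumption: strip every ASCII punctuation character and digit,
# rather than the shorter SYMBOLS string from the question.
STRIP_CHARS = string.punctuation + string.digits

def refine(tokens, strip_chars=STRIP_CHARS):
    """Strip unwanted characters from both ends of each token and drop empty results."""
    refined = []
    for token in tokens:
        cleaned = token.strip(strip_chars)
        if cleaned:  # skip tokens that were only digits or punctuation
            refined.append(cleaned)
    return refined

sample = "My system showing error 404 I don't understand why it showing , like this? 53251 $45"
print(refine(sample.split()))
```

Because strip() only touches the ends of a token, the apostrophe inside don't survives. If characters in the middle of a token also have to go, a regular expression or str.translate() would be the better fit.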

insert n characters before pattern

I have a text file where I want to insert 20 spaces before the string 'LABEL'. I'd like to do this in vim.
I was hoping something like s/LABEL/ {20}LABEL/ would work. It doesn't.
This SO question ("Vim regex replace with n characters") is close to what I want to do, but I can't put 'LABEL' after the '=repeat()'.
%s/LABEL/\=repeat(' ',20)/g works.
%s/LABEL/\=repeat(' ',20)LABEL/g gives me E15: Invalid expression: repeat(' ',20)LABEL
How do I get vim to evaluate =repeat() but not =repeat()LABEL?
After \=, a string expression is expected, and the bare word LABEL isn't a valid string. Quote it and concatenate it with the . operator:
%s/LABEL/\=repeat(' ',20).'LABEL'/g
BTW, thanks to \ze you don't need to repeat the searched text in the replacement:
%s/\zeLABEL/\=repeat(' ',20)/g
Note that if you need to align various things, you could use printf() instead:
%s#label1\|other label#\=printf('%20s', submatch(0))#

How to replace unwanted characters

I have some hotel names that contain characters which are not valid in file names. I want to use these hotel names as file names, but file naming doesn't allow /, * or ?. I also want to know what the error below means.
text?text
text?text
text**text**
text*text (text)
text *text*
text?
I am trying to use an if/else statement so that if a hotel name contains any of these characters, they are replaced with -. However, I am receiving an error about a dangling ?. I just want to check whether I am using replaceAll correctly for these characters.
def hotelNameTrim = hotelName.toString()
if (hotelNameTrim.contains("/"))
{
    hotelNameTrim.replaceAll("/", "-")
}
else if (hotelNameTrim.contains("*"))
{
    hotelNameTrim.replaceAll("*", "-")
}
else if (hotelNameTrim.contains("?"))
{
    hotelNameTrim.replaceAll("?", "-")
}
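replaceAll accepts a regex as its search pattern. * and ? are special characters in a regex and need to be escaped with a backslash, which itself needs to be escaped in a Java string :)
Try this:
hotelNameTrim = hotelNameTrim.replaceAll("[\\*\\?/]","-")
That will replace all of those characters with a dash. Strings are immutable, so replaceAll returns a new string and the result has to be assigned back. The if/else chain is not needed, because the character class handles all three characters in one pass.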
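For comparison, here is the same character-class idea sketched in Python rather than Groovy. The function name sanitize_hotel_name is hypothetical, and the sample names are the ones from the question. Inside a character class, * and ? are literal, so they need no escaping here.

```python
import re

# Characters the question lists as not allowed in file names.
FORBIDDEN = re.compile(r"[*?/]")

def sanitize_hotel_name(name):
    """Return a copy of name with every /, * or ? replaced by a dash."""
    return FORBIDDEN.sub("-", name)

for raw in ["text?text", "text**text**", "text*text (text)", "text *text*", "text?"]:
    print(raw, "->", sanitize_hotel_name(raw))
```

As in the Groovy version, the original string is left untouched and the cleaned name is the return value.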

Eggplant/Sensetalk parsing and separating a string with capitalized words

I'm in need of the ability to parse and separate a text string using Sensetalk (the scripting language the Eggplant GUI tester uses). What I'd like to be able to do is provide the code a text string:
Put "MyTextIsHere" into exampleString
And then have spaces inserted before every capital letter save for the first, so the following is then stored in exampleString:
"My Text Is Here"
I basically want to separate the string into the words it contains. After searching the documentation and the web, I'm no closer to finding a solution to this (I agree, it would be far easier in a different language - alas, not my choice).
Thank you in advance to anyone who can provide some insight!
See question at http://www.testplant.com/phpBB2/viewtopic.php?t=2192.
With credit to Pamela at TestPlant forums:
set startingString to "HereAreMyWords"
set myRange to 2 to the number of characters in startingString // The range to iterate over: every character except the first
Put the first character in startingString into endString // The first character isn't included in the repeat loop, so you have to put it in separately
repeat with each character myletter of characters myRange of startingString
    if charToNum(myLetter) is between 65 and 90 // if the character's unicode number is between 65-90...
        Put space after endString
    end if
    Put myLetter after endString
end repeat
put endString
or you could do it this way:
Put "MyTextIsHere" into exampleString
repeat with each char of chars 2 to last of exampleString by reference
    if it is an uppercase then put space before it
end repeat
put exampleString

What encoding is this and how can I decode it?

I've got an old project file with translations to Portuguese where special characters are broken:
error.text.required=\u00C9 necess\u00E1rio o texto.
error.categoryid.required=\u00C9 necess\u00E1ria a categoria.
error.email.required=\u00C9 necess\u00E1rio o e-mail.
error.email.invalid=O e-mail \u00E9 inv\u00E1lido.
error.fuel.invalid=\u00C9 necess\u00E1rio o tipo de combust\u00EDvel.
error.regdate.invalid=\u00C9 necess\u00E1rio ano de fabrica\u00E7\u00E3o.
error.mileage.invalid=\u00C9 necess\u00E1ria escolher a quilometragem.
error.color.invalid=\u00C9 necess\u00E1ria a cor.
Can you tell me how to decode the file to use the common Portuguese letters?
Thanks
The "\u" prefix introduces a Unicode escape: the four hex digits that follow are the character's code point. You can use the strings "as is", and you'll have diacritics showing in the output. In Python, the code would be something like:
print(u"\u00C9 necess\u00E1rio o texto.")
which outputs:
É necessário o texto.
Otherwise, you need to convert them into their ASCII equivalents. You can do a simple find/replace. I ended up writing a function like that for converting Romanian diacritics a while ago, but I had dynamic strings coming in...
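If the goal is to convert lines read from the file rather than a literal typed into source code, the escapes arrive as plain text (a backslash, a u and four hex digits) and have to be decoded explicitly. A minimal sketch, assuming every escape has exactly that form; the function name decode_u_escapes is made up for illustration:

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text):
    """Replace each backslash-u escape with the character whose code point it names."""
    return U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)

# A raw string keeps the backslashes, just as they would arrive from the file.
line = r"error.text.required=\u00C9 necess\u00E1rio o texto."
print(decode_u_escapes(line))
```

A regular expression is used here instead of Python's unicode_escape codec because that codec also interprets other backslash sequences and mishandles non-ASCII characters already present in the text. This sketch does not combine surrogate pairs, so characters outside the Basic Multilingual Plane would need extra handling.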
This smells like Unicode to me.
\u = prefix for a Unicode character
00E1 = hex code for the 2-byte number of the Unicode character
I'm not sure what the format is (I would ask the sender), but I would try this approach to decode it.
Found it ;)
http://www.fileformat.info/info/unicode/char/20/index.htm
Look at the tables with source code. This could be a C++ source file; this is the way you write Unicode characters in source code.
