I am trying to write a program that converts Arabic letters and diacritics into the Latin script. The letters work well, but the diacritics cannot be converted: I get an error every time I run the program.
At first, I put the diacritics in the dictionary as keys on their own, but that did not work for me. Please see the last key: it contains َ , which is a diacritic, but it does not work the way the letters do:
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و": "W", "َ":"a"}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("الوَ")
I then tried to fix the problem by using letters combined with their diacritics as keys, but the program raised the same error.
The dictionary:
ArEn = {'ا':'A', 'ل':'L', "وَ":"Wa"}
the error:
Traceback (most recent call last):
  File "C:\Users\Abdulaziz\Desktop\converter AR to EN SC.py", line 10, in <module>
    convert("الوَ")
  File "C:\Users\Abdulaziz\Desktop\converter AR to EN SC.py", line 5, in convert
    end_word.append(ArEn[lit[i]])
KeyError: 'و'
Chances are that the bug is in the code editor you are using to write Python rather than in Python itself.
Since you are using Python 3.x, a diacritic is, from the running program's point of view, just a single character like any other, and there should be no issues at all.
From the code editor's point of view, there are issues such as whether or not to advance one character position when displaying certain special Unicode characters, and the " character itself may be shown out of place. When one tries to manually correct the position of the ", one can put it in the wrong order, leaving the special character actually outside the quoted string.
The fact you could solve the issue by re-editing the file suggests that is indeed what happened.
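A quick way to check what Python actually sees is to print each code point of the string. The sketch below writes the word from the question with \u escapes, so that no editor can rearrange it, and uses only the standard unicodedata module; the variable names are purely illustrative.

```python
import unicodedata

# The word from the question, written with escapes: alef, lam, waw, fatha.
word = "\u0627\u0644\u0648\u064e"

# Each letter and the diacritic are separate code points.
code_points = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]
for code, name in code_points:
    print(code, name)

print(len(word))  # 4: three letters plus one diacritic
```

Because iterating over the string yields the waw and the fatha separately, a dictionary key made of both characters together is never looked up by a character-by-character loop.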
One way to avoid this is to escape certain special characters, especially ones that have different display rules, with the "\uxxxx" Unicode codepoint escape sequence. This spares you and others from having issues when editing your file again in the future: even if you get it working now, the editor may show the characters incorrectly when the file is reopened, and by trying to fix it one might break the syntax again.
You can use a table on the web or Python 3's interactive prompt to get the Unicode codepoint of each character, ensuring that the code part of the program is displayed in a deterministic way in any editor. (If you add the diacritical character as a comment on the same line, it will actually enhance the readability of your code, enormously so if it is ever supposed to be edited by non-Arabic speakers.)
For your declaration above, I used this snippet to extract the codepoints:
>>> ArEn = {'ا':'A', 'ل':'L', "و": "W", "َ":"a"}
>>> [print (hex(ord(yy)), yy ) for yy in ArEn.keys()]
0x648 و
0x644 ل
0x64e َ
0x627 ا
Which allows me to declare the dictionary like this:
ArEn = {
    "\u0648": "W", # و
    "\u0644": "L", # ل
    "\u064e": "a", # َ (fatha)
    "\u0627": "A", # ا
}
(And yes, I had trouble displaying the characters on my terminal while getting these, just as I said you probably had in your editor. The fatha ("\u064e", "a") character is tricky! :-) )
An alternative to using the codepoints in your code is to use Python's unicodedata module to discover and then use the actual character names. This can enhance readability further, and maybe by exploring unicodedata you will find that you don't even have to create this dictionary manually, but can use that module instead:
In [16]: [print("\\u{:04x} - '{}' - {}".format(ord(yy), unicodedata.name(yy), yy) ) for yy in ArEn.keys()]
\u0648 - 'ARABIC LETTER WAW' - و
\u0644 - 'ARABIC LETTER LAM' - ل
\u064e - 'ARABIC FATHA' - َ
\u0627 - 'ARABIC LETTER ALEF' - ا
And from these full text names, you can get back to the character with the unicodedata.lookup function:
>>> unicodedata.lookup("ARABIC LETTER LAM")
'ل'
notes:
1) This requires Python 3. For Python 2 one might try to prefix each string with u"", but anyone dealing with these characters is far better off using Python 3, since Unicode support is one of its big selling points.
2) This also requires a terminal with good support for Unicode characters using the "utf-8" encoding. I am on a Linux system with the "konsole" terminal. On Windows, the IDLE Python prompt might work, but not the cmd Python prompt.
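The name-based approach can be taken one step further by building the whole table from character names with unicodedata.lookup. This is only a sketch: NAME_TO_LATIN and AR_EN are hypothetical names, and the four entries are the ones from the question, not a complete transliteration table.

```python
import unicodedata

# Hypothetical mapping from Unicode character names to Latin output.
NAME_TO_LATIN = {
    "ARABIC LETTER ALEF": "A",
    "ARABIC LETTER LAM": "L",
    "ARABIC LETTER WAW": "W",
    "ARABIC FATHA": "a",
}

# Resolve each name to the actual character once, up front.
AR_EN = {unicodedata.lookup(name): latin for name, latin in NAME_TO_LATIN.items()}


def convert(text):
    """Transliterate text one code point at a time."""
    return "".join(AR_EN[ch] for ch in text)


result = convert("\u0627\u0644\u0648\u064e")  # alef, lam, waw, fatha
print(result)  # ALWa
```

The source file then contains only ASCII, so it reads the same in any editor or terminal.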
You might need proper indentation in Python:
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و":"W", "َ":"a", "ُ":"w", "":""}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("اُلوَ")
Update: I just noticed, after years, that the letters and diacritics are put together in the first try. When I separated them, the program worked.
I just solved the problem!
I am not really sure whether it is a bug in Python or something else, but as far as I know Python does not support Arabic very well. Or maybe I made a mistake in the program above.
I kept writing the same program and suddenly it worked very well.
I even added different diacritics and they worked properly.
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و":"W", "َ":"a", "ُ":"w", "":""}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("اُلوَ")
The result is:
AwLWa
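If keys that combine a letter with its diacritic (such as waw followed by fatha) are wanted, a character-by-character loop cannot find them, because each key would span two code points. One possible way around this, sketched below under that assumption, is to try the longest matching key at each position. The function signature and the table are illustrative and not part of the original program.

```python
def convert(text, table):
    """Transliterate text, preferring the longest key that matches."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # Unknown character: keep it unchanged instead of raising KeyError.
            out.append(text[i])
            i += 1
    return "".join(out)


AR_EN = {
    "\u0627": "A",         # alef
    "\u0644": "L",         # lam
    "\u0648\u064e": "Wa",  # waw followed by fatha (two code points)
}

result = convert("\u0627\u0644\u0648\u064e", AR_EN)
print(result)  # ALWa
```

Passing unknown characters through unchanged is a design choice made for this sketch; raising an error may be preferable when the table is meant to be complete.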
I couldn't find a solution to a problem that has been hindering my use of Notepad++.
When you double-click text to highlight that text and others like it, camelCase or under_score words work great, but when hyphen-words-are-clicked, Notepad++ does not treat them as a single word and only highlights the segment between the "-" characters.
Question: how can you customize Notepad++ so that hyphenated words are treated as single words? Or does anyone know a text editor that does this?
I saw this, but I'm not sure how to implement it: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Word_Customisation
This was really helpful: Where are the recorded macros stored in Notepad++?
Notepad++ relies on Scintilla for word selection. As caoanan noticed in his answer, Scintilla can be configured with the SCI_SETWORDCHARS message. You can send this message in Notepad++ with a simple NppExec script:
Install NppExec
Menu Plugins -> plugin Manager -> Show Plugin Manager
locate NppExec, check the box and hit Install
Create the script
Menu Plugins -> NppExec -> Execute ...
write this code (you can add other characters, like .$## at the end of the list):
NPP_CONSOLE 0
sci_sendmsg SCI_SETWORDCHARS 0 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-"
hit Save...
You can now execute it by pressing OK
(optional) Execute when Notepad++ starts
Menu Plugins -> NppExec -> Advanced Options...
Choose your script in the Execute this script when Notepad++ starts drop down
2018
I tried the published solutions, but whenever I move to another file I have to run the script again. So I did it this way, in the menu:
Settings > Preferences > Delimiter
select:
Add your character as part of word
insert hyphen:
-
and it worked.
I ran into the same problem when editing Lisp/Scheme source code with Notepad++.
The cure lies in the underlying Scintilla library (SciLexer.dll).
I've tried in a "blunt" way -- hack the code and rebuild SciLexer.dll.
Note the '-' added to the following code
CharClassify.cxx
void CharClassify::SetDefaultCharClasses(bool includeWordClass) {
    // Initialize all char classes to default values
    for (int ch = 0; ch < 256; ch++) {
        if (ch == '\r' || ch == '\n')
            charClass[ch] = ccNewLine;
        else if (ch < 0x20 || ch == ' ')
            charClass[ch] = ccSpace;
        else if (includeWordClass && (ch >= 0x80 || isalnum(ch) || ch == '_' || ch == '-'))
            charClass[ch] = ccWord;
        else
            charClass[ch] = ccPunctuation;
    }
}
Or, the "smart" way, as mentioned at ScintillaDoc.html
SCI_SETWORDCHARS(<unused>, const char *characters)
Scintilla has several functions that operate on words, which are
defined to be contiguous sequences of characters from a particular set
of characters. This message defines which characters are members of
that set. The character sets are set to default values before
processing this function. For example, if you don't allow '_' in your
set of characters use: SCI_SETWORDCHARS(0, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
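To see why adding '-' to the word-character set changes what a double-click selects, here is a small Python illustration of the idea. It is not Scintilla's implementation; word_at and DEFAULT_WORD_CHARS are made-up names, and the default set (letters, digits, underscore) is an assumption chosen to mirror the code above.

```python
import string

# Assumed default: ASCII letters, digits and underscore count as word characters.
DEFAULT_WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def word_at(text, pos, word_chars=DEFAULT_WORD_CHARS):
    """Return the run of word characters around index pos ('' if none)."""
    if not (0 <= pos < len(text)) or text[pos] not in word_chars:
        return ""
    start = pos
    while start > 0 and text[start - 1] in word_chars:
        start -= 1
    end = pos + 1
    while end < len(text) and text[end] in word_chars:
        end += 1
    return text[start:end]


line = "margin: hyphen-words-are-clicked;"
without_hyphen = word_at(line, 10)
with_hyphen = word_at(line, 10, DEFAULT_WORD_CHARS | {"-"})
print(without_hyphen)  # hyphen
print(with_hyphen)     # hyphen-words-are-clicked
```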
http://sourceforge.net/p/notepad-plus/discussion/1290590/thread/39ba5cd8/
Install the NppExec plugin and copy the string from this thread, adding the characters you want Notepad++ to consider part of a word.
I don't know of such an option in Notepad++. Since you've asked about any other text editor that does this, I would recommend Sublime Text. It's a really cool text editor with lots of smart features. I bet you'll love it. By default, it does not treat hyphenated words as a single word, but it's easy to customize the setting for that. All you need to do is go to 'Preferences -> Settings - Default', where you'll find the following setting:
"word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",
From there, just remove the hyphen and we're done!
A workaround is to:
In the Find window, set the text to find to a space
In the Shortcut Mapper (Main Menu section) assign "Find Next" to Ctrl+Right and "Find Previous" to Ctrl+Left
Now, so long as the search text is only a space, it will effectively be the only delimiter. If you need other delimiters, for instance comma and period, set the Find text to [ ,.].
If you don't have admin rights and no Plugin Mgr, you can install most plugins by downloading a dll/zip file and saving the dll to the 'plugins' sub-folder under your npp install. Then restart npp.
I have this dataset in a csv file
1.33570301776, 3.61194e-06, 7.24503e-06, -9.91572e-06, 1.25098e-05, 0.0102828, 0.010352, 0.0102677, 0.0103789, 0.00161604, 0.00167978, 0.00159998, 0.00182596, 0.0019804, 0.0133687, 0.010329, 0.00163437, 0.00191202, 0.0134425
1.34538754675, 3.3689e-06, 9.86066e-06, -9.12075e-06, 1.18058e-05, 0.00334344, 0.00342207, 0.00332897, 0.00345504, 0.00165532, 0.00170412, 0.00164234, 0.00441903, 0.00459294, 0.00449357, 0.00339737, 0.00166596, 0.00451926, 0.00455153
1.34808186291, -1.99011e-06, 6.53026e-06, -1.18909e-05, 9.52337e-06, 0.00158065, 0.00166529, 0.0015657, 0.0017022, 0.000740644, 0.00078635, 0.000730052, 0.00219736, 0.00238191, 0.00212762, 0.00163783, 0.000750669, 0.00230171, 0.00217917
As you can see, the numbers are formatted differently and misaligned. Is there a way in vim to quickly align the columns properly, so that the result is this
1.33570301776, 3.61194e-06, 7.24503e-06, -9.91572e-06, 1.25098e-05, 0.0102828, 0.010352, 0.0102677, 0.0103789, 0.00161604, 0.00167978, 0.00159998, 0.00182596, 0.0019804, 0.0133687, 0.010329, 0.00163437, 0.00191202, 0.0134425
1.34538754675, 3.3689e-06, 9.86066e-06, -9.12075e-06, 1.18058e-05, 0.00334344, 0.00342207, 0.00332897, 0.00345504,0.00165532, 0.00170412, 0.00164234, 0.00441903, 0.00459294, 0.00449357, 0.00339737, 0.00166596, 0.00451926, 0.00455153
1.34808186291, -1.99011e-06, 6.53026e-06, -1.18909e-05, 9.52337e-06, 0.00158065, 0.00166529, 0.0015657, 0.0017022, 0.000740644,0.00078635, 0.000730052,0.00219736, 0.00238191, 0.00212762, 0.00163783, 0.000750669,0.00230171, 0.00217917
That would make it easy to copy and paste sections with ctrl-v. Any hints?
If you're on some kind of UNIX (Linux, etc), you can cheat and filter it through the column(1) command.
:%!column -t
The above will also split on delimiters that appear inside string literals, which is wrong, so you will likely need a pre-processing step and an explicit delimiter for such a file, for example:
%!sed 's/","/\&/' | column -t -s '&'
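Where column(1) is not available, the same idea can be sketched in a few lines of Python: measure the widest cell in each column, then pad every cell to that width. This is an illustration only; align_csv is a hypothetical helper, the sample is a shortened excerpt of the question's data, and the naive split on commas has the same weakness with quoted fields as described above.

```python
def align_csv(lines, sep=","):
    """Pad each cell so that columns line up; the separator stays attached."""
    rows = [[cell.strip() for cell in line.split(sep)] for line in lines]
    n_cols = max(len(row) for row in rows)

    # Width of the widest cell in every column.
    widths = [0] * n_cols
    for row in rows:
        for j, cell in enumerate(row):
            widths[j] = max(widths[j], len(cell))

    aligned = []
    for row in rows:
        cells = []
        for j, cell in enumerate(row):
            if j == len(row) - 1:
                cells.append(cell)  # last cell: no separator, no padding
            else:
                cells.append((cell + sep).ljust(widths[j] + len(sep)))
        aligned.append(" ".join(cells))
    return aligned


sample = [
    "1.33570301776, 3.61194e-06, 0.0102828",
    "1.34538754675, 3.3689e-06, 0.00334344",
]
aligned = align_csv(sample)
for line in aligned:
    print(line)
```

Keeping the comma attached to the value means the result can still be read back as CSV by a parser that skips the space after each delimiter.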
Sometimes we want to align just two columns. In that case, we don't need any plugins and can use pure Vim functionality like this:
Choose a separator. In OP's post this is a comma, in my example this is =.
Add spaces before/after it. I use s/=/= ...spaces... / in visual selection for this.
Go to the longest word and place the cursor after it.
Remove all the extra whitespace using dw and vertical movement.
I don't find myself needing to align things often enough to install another plugin, so this was my preferred way of accomplishing it, especially since it doesn't require much thinking.
As sunny256 suggested, the column command is a great way of doing this on Unix/Linux machines, but if you want to do it in pure Vim (so that it can be used in Windows as well), the easiest way is to install the Align plugin and then do:
:%Align ,
:%s/\(\s\+\),\s/,\1/g
The first line aligns the entries on the commas and the second moves the comma so that it's flush with the preceding value. You may be able to use AlignCtrl to define a custom mapping that does the whole lot in one go, but I can never remember how to use it...
Edit
If you don't mind two spaces between entries and you want to do this in one command, you can also do:
:%Align ,\zs
This is a great answer using vim macros: https://stackoverflow.com/a/8363786/59384 - basically, you start recording a macro, format the first column, stop recording then repeat the macro for all remaining lines.
Copy/pasted from that answer:
qa0f:w100i <Esc>19|dwjq4@a
Note the single space after the 100i, and the <Esc> means "press escape"--don't type "<Esc>" literally.
Translation:
qa -- record macro in register a
0 -- go to beginning of line
f: -- go to first : symbol
w -- go to next non-space character after the symbol
100i <Esc> -- insert 100 spaces
19| -- go to 19th column (value 19 figured out manually)
dw -- delete the spaces from the cursor up to the next word
j -- go to next line
q -- stop recording macro
4@a -- run the macro 4 times (for the remaining 4 lines)
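For comparison, the effect of this macro (pad after the first ":" so that every value starts in the same column) can be expressed as a short Python filter. This is a sketch, not a translation of the keystrokes: align_after_colon is a hypothetical name, the sample lines are invented, and column 19 is simply the value used in the macro.

```python
def align_after_colon(lines, column=19):
    """Pad after the first ':' so the value starts at the given 1-based column."""
    out = []
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            out.append(line)  # no ':' on this line, leave it alone
            continue
        # Pad "key:" to column - 1 characters, so the value lands on `column`.
        out.append((key + sep).ljust(column - 1) + value.lstrip())
    return out


sample = ["name: Alice", "favorite color: blue"]
result = align_after_colon(sample)
for line in result:
    print(line)
```

Unlike the macro, this version needs no manual repeat count, but the target column still has to be chosen by hand.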
We now also have the fabulous EasyAlign plugin, written by junegunn.
Also, Tabularize is quite good http://vimcasts.org/episodes/aligning-text-with-tabular-vim/
You could use the csv.vim plugin.
:%ArrangeColumn
However, this will not do exactly what you have asked: it will right adjust the contents of cells, whereas you have your values aligned by the decimal point or by the first digit.
The plugin has many other useful commands for working with CSV files.
Also, if you have very long columns, it can be handy to disable the default wrapping:
:set nowrap
:%!column -t
(Note that on Debian, column also has a -n option, which is useful if you do not want multiple adjacent delimiters to be merged into one.)
Here’s a pure Vim script answer, no plugins, no macros:
It might be most clear to start out with my problem’s solution as an example. I selected the lines of code I wanted to affect, then used the following command (recall that entering command mode from visual mode automatically prepends the “'<,'>”, so it acts on the visual range):
:'<,'>g``normal / "value<0d>D70|P`
Except I did NOT actually type “<0d>”. You can enter unprintable characters on the command line by pressing ctrl-v, then the key you want to type. “<0d>” is what is rendered on the command line after I typed ‘ctrl-v enter’. Here, it’s parsed by the “normal” command as the exit from “/” search mode. The cursor then jumps to “ value” in the current line.
Then we simply [D]elete the rest of the line, jump to column 70 (or whatever you need in your case), and [P]ut what we just deleted. This does mean we have to determine the width of the widest line, up to our search. If you haven’t put that information in your statusline, you can see the column of the cursor by entering the normal mode command ‘g ctrl-g’. Also note that jumping to a column that doesn’t exist requires the setting 'virtualedit'!
I left the search term for the :g(lobal) command empty, since we used a visual block and wanted to affect every line, but you can leave off using a visual selection (and the “'<,'>”) and put a search term there instead. Or combine a visual selection and a search term to narrow things more finely/easily.
Here’s something I learned recently: if you mess up on a complex command mode command, undo with ‘u’ (if it affected the buffer), then press “q:” to enter a special command history buffer that acts much like a conventional buffer. Edit any line and press enter, and the changed command is entered as a new command. Indispensable if you don’t want to have to stress over formulating everything perfectly the first time.
I just wrote tablign for this purpose. Install with
pip3 install tablign --user
Then simply mark the table in vim and do
:'<,'>:!tablign
Pretty old question, but I've recently availed myself of an excellent vim plugin that enables table formatting either on the fly or after-the-fact (as your use case requires):
https://github.com/dhruvasagar/vim-table-mode
I have this in my .vimrc.
command! CSV set nowrap | %s/,/,|/g | %!column -n -t -s "|"
This aligns the columns while keeping the commas, which may be needed later for the file to be read correctly. For example, Python Pandas has read_csv(..., skipinitialspace=True) (thanks to the Pandas folks for this smart option); otherwise, in vim, use %s/,\s\+/,/g. It would probably be easier if your column has the --output-separator option; mine doesn't, and I'm not sure why (my man page for column says 2004, on Ubuntu 18.04, and I'm not sure Ubuntu will get a newer version). Anyway, this works for me; comment if you have any suggestions.
I made a cli tool written in Perl.
You can find it here: https://github.com/bas080/colcise
How do you convert all text in Vim to lowercase? Is it even possible?
I assume you want to lowercase the text. The solution is pretty simple:
ggVGu
Explanation:
gg - goes to first line of text
V - turns on Visual selection, in line mode
G - goes to end of file (at the moment you have whole text selected)
u - lowercase selected area
If you really mean small caps, then no, that is not possible – just as it isn’t possible to convert text to bold or italic in any text editor (as opposed to word processor). If you want to convert text to lowercase, create a visual block and press u (or U to convert to uppercase). Tilde (~) in command mode reverses case of the character under the cursor.
If you want to see all text in Vim in small caps, you might want to look at the guifont option, or type :set guifont=* if your Vim flavour supports GUI font chooser.
Use this command in normal mode:
ggguG
gg - go to the first line
gu - start lowercasing from the current line (the operator waits for a motion)
G - motion to the end of the file, so every line is lowercased
Similar to mangledorf's solution, but shorter and layman friendly
:%s/.*/\L&/g
Many ways to skin a cat... here's the way I just posted about:
:%s/[A-Z]/\L&/g
Likewise for upper case:
:%s/[a-z]/\U&/g
I prefer this way because I am using this construct (:%s/[pattern]/replace/g) all the time so it's more natural.
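One thing to keep in mind with a character class such as [A-Z] is that it only names the 26 ASCII capitals. The Python snippet below illustrates that difference with Python's own re module and str.lower(); it is an analogy for the two patterns, not a statement about Vim's internals, and the sample text is made up.

```python
import re

text = "Hello WORLD, ÉCOLE"

# Lowercase only what the [A-Z] class matches (ASCII capitals).
ascii_only = re.sub(r"[A-Z]", lambda m: m.group().lower(), text)

# Lowercase every character that has a lowercase form.
everything = text.lower()

print(ascii_only)
print(everything)
```

For plain English text both approaches give the same result; they differ only when accented or other non-ASCII capitals are present.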
Toggle case "HellO" to "hELLo" with g~ then a movement.
Uppercase "HellO" to "HELLO" with gU then a movement.
Lowercase "HellO" to "hello" with gu then a movement.
For examples and more info please read this:
http://vim.wikia.com/wiki/Switching_case_of_characters
use ggguG
gg: goes to the first line.
gu: change to lowercase.
G: goes to the last line.
Usually Vu (or VU for uppercase) is enough to turn the whole line into lowercase as V already selects the whole line to apply the action against.
Tilde (~) changes the case of an individual letter, which can be used to produce camel case or similar.
It is really great how Vim has many many different modes to deal with various occasions and how those modes are neatly organized.
For instance, there is v, the true visual mode, the related V, visual line mode, and Ctrl+Q, visual block mode (which allows you to select blocks, a great feature some other advanced editors also offer, usually by holding the Alt key while selecting the text).
If you are running under a flavor of Unix
:0,$!tr "[A-Z]" "[a-z]"
I had a similar issue, and I wanted to use ":%s/old/new/g", but ended up using two commands:
:0
gu:$