Book translation data format

Book translation data format - vim

I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to the original.
I could basically create a simple XML-based markup language, that'd look something like
<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>
Now, that would probably have its benefits but I don't think editing that would be very fun.
Another possibility that I can think of would be to keep the original and translation in separate files. If I add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.
original.txt:
This is an example sentence.
In this format editing is easy.
translation-fi.txt:
Tämä on esimerkkilause.
Tässä muodossa muokkaaminen on helppoa.
However, this doesn't seem very robust. It would be easy to mess up. Probably someone has better ideas. Thus the question:
What would be the best data format for making a book translation with a text editor?
EDIT: added tag vim, since I'd prefer to do this with vim and believe that some vim guru might have ideas.
EDIT2: started a bounty on this. I'm currently leaning to the second idea I describe, but I hope to get something about as easy to edit (and quite easy to implement) but more robust.
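For what it's worth, the two-file idea can be made a bit less fragile with a small check script. Below is a minimal sketch in Python, assuming the file names original.txt and translation-fi.txt from the example; the function names pair_chunks and load_pairs are made up for illustration. It pairs chunks by line number and complains as soon as the two files stop lining up.

```python
from itertools import zip_longest


def pair_chunks(original_lines, translation_lines):
    """Pair each original chunk with the translation on the same line number.

    Raises ValueError as soon as one input runs out of lines before the
    other, which is the typical symptom of a stray or missing newline.
    """
    pairs = []
    for number, (orig, trans) in enumerate(
        zip_longest(original_lines, translation_lines), start=1
    ):
        if orig is None or trans is None:
            raise ValueError(f"line {number}: files differ in length")
        pairs.append((orig.rstrip("\n"), trans.rstrip("\n")))
    return pairs


def load_pairs(original_path="original.txt",
               translation_path="translation-fi.txt"):
    # File names follow the example above (an assumption); adjust as needed.
    with open(original_path, encoding="utf-8") as f_orig, \
            open(translation_path, encoding="utf-8") as f_trans:
        return pair_chunks(f_orig.readlines(), f_trans.readlines())
```

Running load_pairs() after each editing session would at least catch the case where the line counts have drifted apart; it cannot tell whether a given line is paired with the right sentence.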

One thought: if you keep each translatable chunk (one or more sentences) on its own line, vim's scrollbind and cursorbind options and a simple vertical split would help you keep the chunks "synchronized". It looks very much like what vimdiff does by default. The files should then have the same number of lines, and you don't even need to switch windows!
But this isn't quite perfect, because wrapped lines tend to mess things up a little. If your translation wraps over two or three more virtual lines than the original text, the visual correlation fades as the lines aren't one-to-one anymore. I couldn't find a solution or a script for fixing that behavior.
Another suggestion would be to interlace the translation into the original. This approaches the diff method of Benoit's suggestion. After the original is split up into chunks (one chunk per line), I would prepend a >> or similar to every line. A translation of one chunk would begin by pressing o (open a new line below). The file would look like this:
>> This is an example sentence.
Tämä on esimerkkilause.
>> In this format editing is easy.
Tässä muodossa muokkaaminen on helppoa.
And I would enhance the readability by doing a :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. Probably it would be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd bind <C-j> to do 2j and <C-k> to 2k to allow easy jumping between the parts that matter.
Pros for this latter approach also include that you could wrap things in 80 columns if you feel like I do :) It would still be trivial to write <C-j/k> to jump between translations.
Cons: buffer completion suffers, as it now completes both original and translated words. Hopefully English words don't occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.
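If you later want the pairs back out of the interlaced file programmatically, something along these lines might do. This is only a sketch in Python: it assumes every original chunk sits on a single line starting with ">> " and that everything up to the next such line is its translation (possibly wrapped over several lines). The function name split_interlaced is hypothetical.

```python
def split_interlaced(lines, marker=">> "):
    """Return (original, translation) pairs from an interlaced file.

    Assumption: each original chunk occupies exactly one marker line; the
    non-empty lines that follow it, up to the next marker line, form its
    translation and are joined with single spaces.
    """
    pairs = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(marker):
            # Start a new chunk: original text plus an empty translation.
            pairs.append([text[len(marker):], []])
        elif text and pairs:
            pairs[-1][1].append(text)
    return [(orig, " ".join(trans)) for orig, trans in pairs]
```

A chunk that has not been translated yet simply comes back with an empty translation string, which makes it easy to list what is still left to do.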

Why not use a simplified diff format?
It is linewise, which is suitable for whole sentences.
The first character is significant (space, special, + or -).
It will be quite compact.
You may not need those @@ parts.
Vim will support it and color the English sentence and the Finnish sentence in distinct colors.

Assuming you want to keep the 1 - 1 relationship between the original text and the translated text, a database table makes the most sense.
You'd have one table with the following columns:
id - Integer - Autonum
original_text - Text - Not null
translated_text - Text - Nullable
You'd need a process to load the original text, and a process to show you one line of the original text and allow you to type the translated text. Perhaps the second process could show you 5 lines (2 before, the line you want to translate, and 2 after) to give you context.
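To make that a bit more concrete, here is a rough sketch using Python's built-in sqlite3 module. The table name chunk and the helper names are my own invention, and SQLite is just one convenient choice; any database would do.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open the database and create the table described above."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunk (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               original_text TEXT NOT NULL,
               translated_text TEXT
           )"""
    )
    return conn


def load_original(conn, lines):
    """First process: load the original text, one row per non-empty line."""
    conn.executemany(
        "INSERT INTO chunk (original_text) VALUES (?)",
        [(line.strip(),) for line in lines if line.strip()],
    )
    conn.commit()


def context(conn, chunk_id, radius=2):
    """Return (id, original, translation) rows around chunk_id for context."""
    return conn.execute(
        "SELECT id, original_text, translated_text FROM chunk "
        "WHERE id BETWEEN ? AND ? ORDER BY id",
        (chunk_id - radius, chunk_id + radius),
    ).fetchall()


def save_translation(conn, chunk_id, text):
    """Second process: store the translation typed for one chunk."""
    conn.execute(
        "UPDATE chunk SET translated_text = ? WHERE id = ?", (text, chunk_id)
    )
    conn.commit()
```

The context helper covers the "2 before, 2 after" idea; the user interface for actually typing the translation is left out here.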

Related

Toward Vim moves from conventional moves (<left> <right> <up> <down> <backspace>)

I am not trying to play golf with my editor. I am just trying to improve my editing skills with vim.
Let's consider this piece of assembly that I would like to convert to C. In order to do it methodically, I want to make small changes iteratively line after line.
dm(__abcd_bar_id + axis) = f4;
f1 = dm(_abcd_foo_id + axis);
f5 = f4 - f1;
The job with this example is:
Simplify the first line with abcd_bar_id[axis] = f4
Simplify the second line with f1 = abcd_foo_id[axis]
Replace f1 in the third line with the second line
Remove the second line
These steps are not negotiable. I know I can easily get rid of all my dm(__variable + index) with a regex like the one below but this is off topic.
:%s/dm\s*(\s*_\+\(\w\+\)\s\++\s\+\(\w\+\)\s*)/\1[\2]/g
So, to achieve these changes I traditionally do this:
▶▶▶▶DelDelDelDelDel▶▶▶▶▶▶▶▶▶▶▶▶[DelDelDel▶▶▶▶Right]
▼DeleteDelDel[▶▶▶▶]Del
Home▶▶▶▶RightDelDelDelDel
Shift+End Shift+◀ Ctrl+c
▼End◀◀BackspaceBackspace Ctrl+v
And the result should be this:
abcd_bar_id[axis] = f4;
f5 = f4 - abcd_foo_id[axis];
What saves me is I am quite fast hitting the same key multiple times. However I am sure I can be more productive if I use vi features
vfahd
wh3lxi[wr]
j:%s/dm(_//Enter
f+hv2lxi[Escwr]
$hvF2ay
jf1hhplxxx
Well, this seems much more complicated to me, because some pre-processing brain-time is needed before each keystroke.
For instance if I want to move to f1 I need to parse with my eyes if there is no other 1 on the way to f1.
I really feel I need years of training to be 'fluent' with vim.
So the questions are:
How would a vim guru treat this example?
Does a vim guru exist?

I definitely don't consider myself a vim guru, although I use it on a daily basis. Answering your second question first: there is probably somebody who can be treated as a guru, but there are simply so many options and possibilities in vim that everybody can have their own way of doing things. Moreover, because you can tailor vim to your needs, it's easy to simplify regular tasks, and those configurations may differ a lot. Also, people whom I consider gurus (like, for instance, Derek Wyatt) claim that they still have much to learn about vim, so it can definitely take years to become one.
But don't be discouraged, it takes only some practise to start thinking the vim way, and your editing tasks will become much easier :)
Back to your example. First of all, I'd edit the first line with slightly fewer keystrokes:
dta
f)r]
bdTd
i[
The difference isn't huge in terms of number of keystrokes, but it illustrates different approach. It allows, in my opinion, much less pre-processing, which is the problem you highlighted. I divided those keystrokes into sections to show you my thought process:
delete till a
find ) and replace it with ]
back one word and delete Till (backwards) d
insert [
I don't have to think much when I apply those changes. You might think it is counter-intuitive that I jumped to the ) character first, but it was much easier for me to spot the closing bracket than to count words or hit h or l multiple times. Of course you might know the keystrokes, but when you edit something you don't always remember all of them. This comes with practise and with forcing yourself to use some of them (like t/T) to put them firmly under your fingers. Also, print a cheat-sheet and try to make use of every key until you learn it by heart. It won't take long ;)
As William already suggested in the comment, I'd also think about a macro here. It's a powerful and easy-to-use tool, which can really automate your changes.
I already know how to edit the first line. In your example, I know that in the second step I'll be doing the same thing, but in a slightly different location, so instead of just editing the first line, I immediately record a macro, but I have to make it universal for easier application. So I think about putting my cursor in the proper location first, before making any changes. My macro would look like this:
qq
0fd dta f)r] bdTd i[
q
Notice that I added 3 keystrokes at the beginning (not counting qq, which starts recording a macro to the q register). That might look redundant in the first line, but it ensures the proper location of the cursor before making any changes.
That way I can easily apply this macro in the second line with @q
Now, you have to replace this f1 in the third line. You're still in the second line with your cursor, so you just yank with:
0fay$
and then paste it to the third line:
j$bPlD
Using macros might look redundant when you edit just 3 lines, but when you get used to making changes the vim way, you'll really feel you're taking advantage of its power.
When it comes to remembering recorded macros, it's not that hard; you just need the proper attitude. First of all, you record your macros to registers, so typing :registers will also show you your macros. Secondly, you can edit them by pasting a specific register, altering it and then yanking it back into the same register. And then you can play it with @[register_letter]. And finally, don't get attached to specific macros. Save one or two, use them to make multiple changes at once and forget about them. And then record another one under the same letter. For example, if you realize that you have to make some repetitive change across the file, use qq, because it's fast and intuitive. After making changes you rarely need to play the same macro over again, because the whole buffer is already in the right state. But if you know that you'll need it, record the next macro under another letter. Once you get comfortable making changes intuitively the vim way, so that they can easily be repeated, you'll find that it's much easier to record another macro than to try to remember under which letter you recorded the previous one.
I hope that this answer will convince you that you don't need years of training to get fluent, but of course it won't happen overnight ;)

Is there a less fantastically kludgy way to do one-off highlights in Vim?

i. The Problem
My goal is something like the following:
I have a line of text like
Who left the dead mouse in the fridge?
and I want to highlight the first the in green, just this one occurrence. That is, I don't want to syn match ThisMagicWord "\<the\>" or anything that will overzealously highlight other thes.
There is one other requirement, which is that if the user edits the other text on the line, say to
Who on earth left the delicious dead mouse in the fridge?
the highlighting will track with the word the, so long as the user doesn't edit that one particular word.
ii. The Kludge
Now, I have a solution to this. In fact, I am proud of my solution, because it was tricky to think up. But it is not, by any stretch of the imagination, a good solution.
It turns out that the Unicode character Combining Grapheme Joiner is effectively a no-op in Vim. It produces no glyph, and takes up no width. It is the only such character that I have discovered. So what I do is, I surreptitiously edit the line in question to be
Who left the<CGJ> dead mouse in the fridge?
and then define a rule
syn match ThisMagicWord "the<CGJ>"
I will additionally trigger on BufWritePre and BufWritePost to strip the CGJs out of the file on disk.
iii. The Questions
Is there a no-op character in Vim (or a way to produce one) other than CGJ? Ideally a non-combining character, since the<CGJ> will not match a search for /the, due to the way Vim regexes handle combining characters.
Is there a better way to get at the behavior that I want?

You're right that there's currently no good way to mark static matches and keep them up-to-date when edits are done nearby. My approach would have been worse than your kludge: Include the line / column information in the match (via the \%l and \%v special atoms), and attempting to update those with a combination of marks (works for line changes) and intra-line custom diffing.
Though your use of special Unicode characters is clever, it's (as you admit) a hack. I've asked you for uses in the comments, and am still not completely satisfied / convinced. If you can come up with good, real use cases and current pain points, please direct them to the vim_dev mailing list (best with a functional draft patch attached). The functionality to keep track of such text is basically there (in the Vim internals), it's just not yet tracked and exposed to users / Vimscript. Though Vim development has been (often frustratingly) slow, with a compelling argument on your side, new functionality can and does happen.

How about using marks?
Move the cursor to the word you want, set a lowercase letter mark (e.g. mz), then add highlighting for the word like \%'zthe

VIM Delete Standard Scripting Word Groups

How would you use VIM to delete a word group which includes white space characters, but is a standard grouping you would want to access when scripting? Specifically, when you have your cursor over some part of the following text, how would you delete help="initialize, lines, h2, derivs, tt, history", from the line below? Maybe one would need to create specific mappings. But on the other hand, it seems pretty natural to want to access text like this if you are using VIM to edit scripts.
parser = argparse.ArgumentParser()
parser.add_argument("task", help="initialize, lines, h2, derivs, tt, history", default='yes')

Vim has a variety of text objects built-in, e.g. da" deletes quoted text (including the quotes; di" keeps the quotes). See :help text-objects for more information.
There are some plugins, e.g. textobj-user - Support for user-defined text objects and my own CountJump plugin that make it easy to define your own, "special" text objects. Also, you'll find many such text objects on vim.org. Based on your example, argtextobj.vim - Text-object like motion for arguments may be exactly what you need here.

If you are inside the " you want to delete, I would use:
di"diW
If you were above help=, I would use something like:
d/defEnter
to remove everything until you encounter default, followed by a few x, and left-wise motion, to remove the remaining characters.
I don't really think a new mapping is needed, but your experience may vary.

What makes sense from Vim's perspective and according to its design goals is to provide small and generic elements and a few rules to combine them in order to achieve higher level tasks. It does quite a good job, I'd say, with its numerous text-objects and motions but we always have to repeat domain-specific tasks and that's exactly where Vim's extensibility comes into play. It is where users and plugin authors fill the gap with custom mappings/object/functions and… plugins.
It is fairly easy, for example, to record a macro and map it for later reuse. Or create a quick and dirty custom text-object…
The following snippet should work with your sample.
xnoremap aa /\v["'][,)]/e<CR>o?\v\s+\w+\=<CR>
onoremap aa :normal vaa<CR>
With it, you can do daa, caa, yaa and vaa from anywhere within that argument.
Obviously, this solution is extremely specific and making it more generic would most certainly involve a bit more thought but there are already relatively smart solutions floating around, as in Ingo's answer.

Reformatting in Vim, the sensible way

I don't often reformat text, apart from the plain gq so this is probably something simple, but just don't seem to have the luck of finding it in the help.
Anyways, I have the text that looks like this
Funnily enough, that was exciting.
"I've just about had enough of this," said a voice beside him.
He looked up. A girl had come down the other path. Her face was red with exertion under the pale make-up, her hair hung over her eyes in ridiculous ringlets, and she wore a dress which, while clearly made for her size, was designed for someone who was ten years younger and keen on lace edging.
She was quite attractive, although this fact was not immediately apparent.
"And you know what they say when you complain?" she demanded. This was not really addressed to Victor. He was just a convenient pair of ears.
And that's a pain to read in Vim. So I tried to reformat it with gq and that gives me this
Funnily enough, that was exciting. "I've just about had enough of this,"
said a voice beside him. He looked up. A girl had come down the other path.
Her face was red with exertion under the pale make-up, her hair hung over
her eyes in ridiculous ringlets, and she wore a dress which, while clearly
made for her size, was designed for someone who was ten years younger and
keen on lace edging. She was quite attractive, although this fact was not
immediately apparent. "And you know what they say when you complain?" she
demanded. This was not really addressed to Victor. He was just a convenient
pair of ears.
which is rather useless, since the original line endings have special meaning in this case. What I'm trying to accomplish is this
Funnily enough, that was exciting.
"I've just about had enough of this," said a voice beside him.
He looked up. A girl had come down the other path. Her face was red with
exertion under the pale make-up, her hair hung over her eyes in ridiculous
ringlets, and she wore a dress which, while clearly made for her size, was
designed for someone who was ten years younger and keen on lace edging.
She was quite attractive, although this fact was not immediately apparent.
"And you know what they say when you complain?" she demanded. This was not
really addressed to Victor. He was just a convenient pair of ears.
i.e. to keep the original line endings, but to "break" every line longer than textwidth into several lines. So it fits the predefined column width limits.
Anyone have any ideas on how to do that? It is a rather large-ish document, and I need some way of handling it in one piece.

Select visually all lines then execute in ex mode:
:norm gqq
gqq reformats a single line. :norm with a range applies a normal-mode command to each line in the range individually. That means gqq is applied to every line on its own, so lines shorter than your textwidth (for example 80) will not be joined, and only the longer lines get wrapped.
I've tested this on your example text and it gives just what you want.
Btw, you can set vim's 'formatprg' option to have gq filter the text through an external program. That gives you more control over how the text is modified. For more info read :h formatprg

Do you just want to do this for reading purposes? If so, you should consider just turning on line wrapping at word breaks. In command mode:
:set wrap
:set linebreak

Assuming this is on Linux, there are a number of utilities to do what you're wanting - fmt, roff/nroff/troff and variants, etc. fmt is one I use often, but it would require that you have a blank line between each paragraph - that's easy to accomplish in vim, though. So you could add blank lines, save the file, then run it by fmt -76 for example to limit each line to 76 characters.
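If adding blank lines feels like too much hassle, a few lines of script can wrap every line on its own instead. Here is a minimal sketch in Python using the standard textwrap module; the function name wrap_keep_line_breaks is made up, and the width of 76 just mirrors the fmt example.

```python
import textwrap


def wrap_keep_line_breaks(text, width=76):
    """Wrap each input line separately so original line endings survive.

    Lines shorter than `width` are left alone; longer ones are broken at
    word boundaries. Lines are never joined with their neighbours.
    """
    wrapped = []
    for line in text.splitlines():
        # textwrap.fill returns "" for an empty line, so blank lines survive.
        wrapped.append(textwrap.fill(line, width=width))
    return "\n".join(wrapped)
```

Note that textwrap normalises whitespace inside a line by default, so this is meant for plain prose rather than for text whose internal spacing matters.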

A primitive way, but in general managed to do it with
tw=80
qa (recording a macro)
Vgq
q (stop recording)
nmap <C-p> :execute "normal! @a"<cr>
and by holding <C-p> for quite a while. Not the most elegant of solutions but worked.

You can make gq think that a series of lines belongs to one paragraph if every line of the series except the last one ends with a space:
set formatoptions+=w
After this setting, gq won’t join the lines in your example (unless you have trailing spaces there), and you will still be able to join them back using :%s/ \n/ /. The alternative is to add an empty line between each pair of current lines.
I also suggest doing
set list listchars+=trail:-
in order not to only make vim see where the paragraph ends, but to be able to see this by yourself (this setting will show you trailing whitespaces).

Which editors out of Emacs, Vim and JEdit support multiple simultaneous text insertion points?

Background: JEdit (and some other text editors as well) support a feature called Multiple simultaneous text insertion points. (at least that's what I'm calling it here).
To understand what this means, take a look at the link.
Out of all the features in use in modern text editors, initial research seems to indicate that this is one feature that both Emacs and Vim do not actually support. If correct, this would be pretty exceptional since it's quite difficult to find a text editor feature that has not made its way into at least one of these two old-school editors.
Question: Has anyone ever seen or implemented this feature in either Emacs, Vim, or both? If so, please point me to a link, script, reference or summary that explains the details.
If you know an alternate way to do the same (or similar) thing, please let me know.

The vim way to do this is the . command which repeats the last change. So, for instance, if I change a pointer to a reference and I have a bunch of
obj->func
that I want to change to
obj.func
then I search for obj->, do 2cw to change the obj-> to obj., then do n.n.n. until all the instances are changed.
Perhaps not as flexible as what you're talking about, but it works frequently and is very intuitive and fast when it does.

moccur-edit.el almost does what you want. All the locations matching the regexp are displayed, and editing the matches makes changes in the corresponding source. However, the editing is done on a single instance of the occurrence.
I imagine it'd be straightforward to extend it to allow you to edit them all simultaneously (at least in the simple case).
There is a demo of it found here.
Turns out, the newest versions of moccur-edit don't apply changes in real-time - you must apply the changes. The changes are also now undoable (nice win).

In EMACS, you could/would do it with M-x find-grep and a macro. If you really insist that it be fully automatic, then you'd include the find-next in the macro.
But honestly, this strikes me as a sort of Microsoft-feature: yes, it adds to the feature list, but why bother? And would you remember it existed in six months, when you want to use it again?

For emacs, multiple-cursors does exactly that.
Have a look at emacsrocks episode 13, by the author of the module.

I don't think this feature has a direct analogue in either Emacs or Vim, which is not to say that everything achievable with this feature is not possible in some fashion with the two 'old-school' editors. And like most things Emacs and Vim, power-users would probably be able to achieve such a task exceedingly quickly, even if mere mortals like myself could spend five minutes figuring out the correct grep search and replace with appropriate back-references, for example.

YASnippet package for Emacs uses it. See 2:13 and 2:44 in the screencast.

Another slight similarity: In Emacs, the rectangle editing features provided by cua-selection-mode (or cua-mode) automatically gives you multiple insertion points down the left or right edge of the marked rectangle, so that you can type a common prefix or suffix to all of those lines.
e.g.:
M-x cua-selection-mode RET (enable the global minor mode, if you don't already use this or cua-mode)
C-RET down down down (marks a 1x3 character rectangle)
type prefix here
C-RET (unmark the rectangle to return to normal editing)

It should be something like this in vim:
:%s/paint(\([^,]*\),/\1.paint(/
Or something like that, I am really bad at "mock" regular expressions.
The idea is substitute the pattern:
/paint(object,/
with
/object.paint(/
So, yes, it is "supported"

It seemed simple to do a basic version of this in Emacs lisp. This is for when you just want two places to insert text in parallel:
(defun cjw-multi-insert (text)
  "insert text at both point and mark"
  (interactive "sText:")
  (insert-before-markers text)
  (save-excursion
    (exchange-point-and-mark)
    (insert-before-markers text)))
When you run it, it prompts for text and inserts it at both point (current position) and mark. You can set the mark with C-SPC. This could be easily extended for N different positions. A function like set-insert-point would record current position (stored as an Emacs marker) into a list and then when you run the multi-insert command, it just iterates through the list adding text at each.
I'm not sure what would be a simple way to handle a more general "multi-editing" feature.

Nope. This would be quite difficult to do with a primarily console-based UI.
That said, there are similar features in vim (and emacs, although I've not used it nearly as much): search and replace, as people have said, and more similarly, column insert mode: http://pivotallabs.com/users/brian/blog/articles/350-column-edit-mode-in-vi
