language for simple key-value file (mostly for syntax highlighting) - programming-languages

so i have some simple files that contain key-value pairs like this:
TITLE language for simple key-value file (mostly for syntax highlighting)
DESCRIPTION so i have some simple files that contain key-value pairs like this.
DESCRIPTION
DESCRIPTION values can span multiple lines (in which case the key is just
DESCRIPTION repeated)
URL https://stackoverflow.com/questions/72677445
so basically this is a declarative DSL for expressing dictionaries: the first word in each line is the key, and the rest is the value; key and value are separated by an arbitrary amount of whitespace.
now, i would like these files to gain some basic syntax-highlighting (so that the "key" part is shown differently from the "value" part), mostly for github/gitlab/...
unfortunately I have no idea which "language" i should use.
the syntax is so minimal, that autodetection just ends up with "plain text".
modern syntax highlighters support a plethora of languages (e.g. linguist lists about 650 different languages), which makes it impractical to try each one.
any ideas how this "language" is called (or which base-language i could abuse for syntax-highlighting)?

Related

How to create a centralized syntax file that be able to recognize multiple parts with different syntaxes?

For i.e: I'd like to have a custom syntax file, may be called sugar.vim that includes multiple other syntax files(?) to have the ability to highlight, maybe a paragraph as python.vim and another paragraph as javascript.vim, may be separated by newline (paragraphs often distinct by newline)
The real case that I often catch myself writing a document (non-extension file) other than real config a specific filetype (specific extension file), but for clear readability in the document filetype (we called sugar above). I'm thinking about a mechanism to recognize and highlight different parts of a filetype as different syntaxes.
To narrow down this case. How would it be to have a syntax file called sugar.vim that would be able to recognize python syntax and javascript syntax in files that have an extension of .sugar then the recognized python text should have highlights applied as a normal python file, same for javascript part. All recognized text must be separated by newline (at least one before and one after that text)
Sample:
# this is a sample text for this question
# i'm writing a document that has an extension of `.sugar`
def py_func1(arg1, arg2) # python.vim and its highlights applied here.
print("bello world!")
square = function(x) { # javascript.vim and its highlights applied here.
return x * x;
};
System: gvim 8.1 / windows10
Thanks in advances.
Vim supports that with the :help :syn-include command. As it's intended for syntax script writers leveraging other syntaxes, its use is somewhat complicated, and it's not really suited for interactive, on-demand use.
My SyntaxRange plugin provides commands and functions to set up regions in the current buffer that either use a syntax different from the buffer's 'filetype', or completely ignore the syntax. With it, it's trivial to dynamically add a particular syntax highlighting for a range of lines, and public API functions also make the programmatic definition easier.
You're looking for :help :syn-include.
Excerpt from vim help :
If top-level syntax items in the included syntax file are to be
contained within a region in the including syntax, you can use the
":syntax include" command:
:sy[ntax] include [#{grouplist-name}] {file-name}
All syntax items declared in the included file will have the
"contained" flag added. In addition, if a group list is specified,
all top-level syntax items in the included file will be added to
that list. >
" In perl.vim:
:syntax include #Pod :p:h/pod.vim
:syntax region perlPOD start="^=head" end="^=cut" contains=#Pod
When {file-name} is an absolute path (starts with "/", "c:", "$VAR"
or "") that file is sourced. When it is a relative path
(e.g., "syntax/pod.vim") the file is searched for in 'runtimepath'.
All matching files are loaded. Using a relative path is
recommended, because it allows a user to replace the included file
with his own version, without replacing the file that does the ":syn
include".
As long as you can clearly define boundaries for your embedded language regions it is fairly straight forward to achieve this.
You can also refer to https://github.com/tpope/vim-markdown/blob/master/syntax/markdown.vim for reference on how tpope embeds other syntax definitions within the markdown syntax, driven by configuration to minimise the number of language syntax's that need embedding for optimal performance.

Usable Unicode Ranges for Custom Text Process

I am working on a processor that parts texts into blocks with marks:
LOREM IPSUM SED AMED
will be parsed like:
{word:1}LOREM{/word:1}{space:2}
{word:3}IPSUM{/word:3}{space:4}
{word:5}SED{/word:5}{space:6}
{word:7}AMED{/word:7}
But I dont want to use "{word}" etc, because it causes processor down, because it is an string again... I need to mark like these:
\E002\0001 LOREM \E003\0001 \E004\0002
\E002\0003 IPSUM \E003\0004 \E004\0005
\E002\0006 SED \E003\0006 \E004\0007
\E002\0008 AMED \E003\0008
First \E002 means element type number, its last bit represent element's close. So element number increments with +2.
Second \0001 means element index for stacking.
I am just used \E002 irrelevantly for this example.
But \0001 also using in Unicode Range, and this leads me to where I start again...
So which unicode range can I use? \ff0000? or how can I solve this?
Thanks!
The Unicode Consortium thought of this. There is a range of Unicode code points that are meant to never represent a displayable character, but meta-codes instead:
Noncharacters are code points that are permanently reserved and will never have characters
assigned to them.
...
Tag characters were intended to support a general scheme for the internal tagging of text
streams in the absence of other mechanisms, such as markup languages. The use of tag
characters for language tagging is deprecated.
(http://www.unicode.org/versions/Unicode9.0.0/ch23.pdf)
You should be able to use regular control characters as "private" tags, because these should never occur in proper strings. This would be the range from U+0000 to U+001F, excluding tab (U+0009), the common "returns" (U+000A and U+000D), and, for safety, U+0000 itself (some libraries do not like Null characters in the middle of strings).
Non-characters
Noncharacters are code points that are permanently reserved in the Unicode Standard for
internal use. They are not recommended for use in open interchange of Unicode text data.
You can use U+FEFF (which is currently officially defined as Not-A-Character), or U+FFFE and U+FFFF. There are several more "officially not-a-characters" defined, and you can be fairly sure they would not occur in regular text strings.
A few random sequences with predefined definitions, and so highly unlikely to occur in plain text strings are:
Specials: U+FFF0–U+FFF8
The nine unassigned Unicode code points in the range U+FFF0..U+FFF8 are reserved for
special character definitions.
Annotation Characters: U+FFF9–U+FFFB
An interlinear annotation consists of annotating text that is related to a sequence of annotated
characters. For all regular editing and text-processing algorithms, the annotated characters
are treated as part of the text stream. The annotating text is also part of the content,
but for all or some text processing, it does not form part of the main text stream.
Tag Characters: U+E0000–U+E007F
This block encodes a set of 95 special-use tag characters to enable the spelling out of ASCIIbased
string tags using characters that can be strictly separated from ordinary text content
characters in Unicode.
(all quotations from the chapter as above)
Staying within conventions, you can also use U+2028 (line separator) and/or U+2029 paragraph separator.
Technically, your use of U+E000–U+F8FF (the "Private Use Area") is okay-ish, because these code points only can define an unambiguous character in combination with a certain font. However, it is possible these codes may pop up if you get your plain text from a source where the font was included.
As for how to encode this into your strings: it doesn't really matter if the numerical code immediately following your private tag marker is a valid Unicode character or not. If you see one of your own tag markers, then the value immediately following is always your own private sequence number.
As you see, there are lots of possibilities. I guess the most important criterium is whether you want to use other functions on these strings. If you create a string that is technically invalid Unicode (for instance, because it includes not-a-character values), some external functions may choose to fail to work on them, or silently remove the bad values. In such a case, you'd need to rigorously stick to a system in which you only use 'valid' code points.

VIM Delete Standard Scripting Word Groups

How would you use VIM to delete a word group, which includes white space characters, but is a standard grouping you would want to access when scripting? Specifically, when you have your cursor over some part of the following text, how would delete help="initialize, lines, h2, derivs, tt, history", from below. Maybe one would need to create specific mappings. But on the other hand, it seems pretty natural to want to access text like this if you are using VIM to edit scripting programs.
parser = argparse.ArgumentParser()
parser.add_argument("task", help="initialize, lines, h2, derivs, tt, history", default='yes')
Vim has a variety of text objects built-in, e.g. da" deletes quoted text (including the quotes; di" keeps the quotes). See :help text-objects for more information.
There are some plugins, e.g. textobj-user - Support for user-defined text objects and my own CountJump plugin that make it easy to define your own, "special" text objects. Also, you'll find many such text objects on vim.org. Based on your example, argtextobj.vim - Text-object like motion for arguments may be exactly what you need here.
If you are inside the " you want to delete, I would use:
di"diW
If you were above help=, I would use something like:
d/defEnter
to remove everything until you encounter default, followed by a few x, and left-wise motion, to remove the remaining characters.
I don't really think a new mapping is needed, but your experience may vary.
What makes sense from Vim's perspective and according to its design goals is to provide small and generic elements and a few rules to combine them in order to achieve higher level tasks. It does quite a good job, I'd say, with its numerous text-objects and motions but we always have to repeat domain-specific tasks and that's exactly where Vim's extensibility comes into play. It is where users and plugin authors fill the gap with custom mappings/object/functions and… plugins.
It is fairly easy, for example, to record a macro and map it for later reuse. Or create a quick and dirty custom text-object…
The following snippet should work with your sample.
xnoremap aa /\v["'][,)]/e<CR>o?\v\s+\w+\=<CR>
onoremap aa :normal vaa<CR>
With it, you can do daa, caa, yaa and vaa from anywhere within that argument.
Obviously, this solution is extremely specific and making it more generic would most certainly involve a bit more thought but there are already relatively smart solutions floating around, as in Ingo's answer.

Vim Visual Select around Variables

I'm wondering if there's a way to select variables intelligently in the same way that one can select blocks using commands like va}. There's some language-specific parsing going on to differentiate php and ruby, for example. For future reference, It'd be nice to tap into that - ideally selecting around various syntactic elements.
For example. I'd like to select around $array['testing'] in the following line of php:
$array['testing'] = 'whatever'
Or, lets say I want to select the block parameter list |item, index| here:
hash.each_with_index { |item, index| print item }
EDIT:
Specific regexps might address the various questions individually, but I have a sense that there ought to be a way to leverage syntactic analysis to get something far more robust here.
Though your given examples are quick to select with built-in Vim text objects (the first is just viW, for the second I would use F|v,), I acknowledge that Vim's syntax highlighting could be a good source for motions and text objects.
I've seen the first implementation of this idea in the SyntaxMotion plugin, and I've recently implemented a similar plugin: SameSyntaxMotion. The first defines motions for normal and visual mode, but no operator-pending and text objects. It does not skip over contained sub-syntax items and uses same color as the distinguishing property, whereas mine uses syntax (which can be more precise, but also more difficult to grasp), and has text objects (ay and iy), too.
You can define your own arbitrary text objects in Vim.
The simplest way to do custom text objects is defining a :vmap (or :xmap) for the Visual mode part and an :omap for the Operator-pending mode part. For example, the following mappings
xnoremap aC F:o,
onoremap aC :normal! F:v,<CR>
let you select a colon-enclosed bit of text. Try doing vaP or daP on the word "colon" below:
Some text :in-colon-text: more of the same.
See :h omap-info for another short example of :omap.
If you don't mind depending on a plugin, however, there is textobj-user. This is a general purpose framework for custom text objects written by Kana Natsuno. There are already some excellent text objects written for that framework like textobj-indent which I find indispensable.
Using this you can easily implement filetype-dependent text objects for variables. And make it available for everybody!

Book translation data format

I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to the original.
I could basically create a simple XML-based markup language, that'd look something like
<book>
<chapter>
<paragraph>
<sentence>
<original>This is an example sentence.</original>
<translation lang="fi">Tämä on esimerkkilause.</translation>
</sentence>
</paragraph>
</chapter>
</book>
Now, that would probably have its benefits but I don't think editing that would be very fun.
Another possibility that I can think of would be to keep the original and translation in separate files. If I add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.
original.txt:
This is an example sentence.
In this format editing is easy.
translation-fi.txt:
Tämä on esimerkkilause.
Tässä muodossa muokkaaminen on helppoa.
However, this doesn't seem very robust. It would be easy to mess up. Probably someone has better ideas. Thus the question:
What would be the best data format for making a book translation with a text editor?
EDIT: added tag vim, since I'd prefer to do this with vim and believe that some vim guru might have ideas.
EDIT2: started a bounty on this. I'm currently leaning to the second idea I describe, but I hope to get something about as easy to edit (and quite easy to implement) but more robust.
One thought: if you keep each translatable chunk (one or more sentences) in its own line, vim's option scrollbind, cursorbind and a simple vertical split would help you keeping the chunks "synchronized". It looks very much like to what vimdiff does by default. The files should then have the same amount of lines and you don't even need to switch windows!
But, this isn't quite perfect because wrapped lines tend to mess up a little bit. If your translation wraps over two or three more virtual lines than the original text, the visual correlation fades as the lines aren't one-on-one anymore. I couldn't find a solution or a script for fixing that behavior.
Other suggestion I would propose is to interlace the translation into the original. This approaches the diff method of Benoit's suggestion. After the original is split up into chunks (one chunk per line), I would prepend a >> or similar on every line. A translation of one chunk would begin by o. The file would look like this:
>> This is an example sentence.
Tämä on esimerkkilause.
>> In this format editing is easy.
Tässä muodossa muokkaaminen on helppoa.
And I would enhance the readability by doing a :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. Probably it would be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd bind <C-j> to do 2j and <C-k> to 2k to allow easy jumping between the parts that matter.
Pros for this latter approach also include that you could wrap things in 80 columns if you feel like I do :) It would still be trivial to write <C-j/k> to jump between translations.
Cons: buffer-completion suffers as now it completes both original and translated words. English words don't hopefully occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.
Why not use a simplified diff format?
it is linewise which is suitable for whole sentences.
The first character is significant (space, special, + or -)
It will be quite compact
Maybe you needn't those ## parts
Vim will support it and color the English sentence and the Finnish sentence in distinct colors.
Assuming you want to keep the 1 - 1 relationship between the original text and the translated text, a database table makes the most sense.
You'd have one table with the following columns:
id - Integer - Autonum
original_text - Text - Not null
translated_text - Text - Nullable
You'd need a process to load the original text, and a process to show you one line of the original text and allow you to type the translated text. Perhaps the second process could show you 5 lines (2 before, the line you want to translate, and 2 after) to give you context.

Resources