How to disable space when using pandoc with Chinese characters?

When we use Vim, we usually configure it to limit the number of characters per line, like this:
set cc=80
set fo+=tMn
So when I convert a Markdown file to a docx file, pandoc automatically puts a space at the end of every line, which is fine for English documents.
But for Chinese text, a space in the middle of a sentence looks weird. Is there any way to avoid this?

This is the solution I found in the pandoc MANUAL.
Extension: east_asian_line_breaks
Causes newlines within a paragraph to be ignored, rather than being treated as spaces or as hard line breaks, when they occur between two East Asian wide characters. This is a better choice than ignore_line_breaks for texts that include a mix of East Asian wide characters and other characters.
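
To make the rule concrete, here is a rough Python sketch of what the manual describes. It is not pandoc's implementation; the helper names `is_wide` and `join_soft_breaks` are made up for illustration, and "wide" is assumed to mean the Unicode East Asian Width classes W and F. It also ignores Markdown details such as hard line breaks. In pandoc itself the extension is switched on through the format string, for instance `-f markdown+east_asian_line_breaks`.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True for East Asian Wide (W) or Fullwidth (F) characters (assumed definition)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def join_soft_breaks(paragraph: str) -> str:
    """Join the lines of one paragraph, dropping the break between two wide characters."""
    lines = [ln.strip() for ln in paragraph.splitlines() if ln.strip()]
    if not lines:
        return ""
    joined = lines[0]
    for line in lines[1:]:
        # Between two wide characters the newline vanishes; otherwise it becomes a space.
        separator = "" if is_wide(joined[-1]) and is_wide(line[0]) else " "
        joined += separator + line
    return joined


print(join_soft_breaks("今天天气\n很好"))      # wide + wide: no space inserted
print(join_soft_breaks("hard wrapped\ntext"))  # ASCII: newline becomes a space
```

A line break between a Chinese character and an ASCII word still turns into a space, which matches the "mix of East Asian wide characters and other characters" case mentioned in the manual.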

Related

Vim treats Unicode characters as a separate text object from the English alphabet

I use both Vim and Neovim, and I just found that both editors treat Unicode characters (such as Korean characters) as a separate text object from the English alphabet.
For instance, given the text my_str = 'ABC가나다', Vim only deletes ABC if I type dw on A.
Of course I could use di' instead, but it gets annoying with the surround.vim plugin.
Typing ysw[ only surrounds ABC because Vim recognizes 가나다 as a separate text object even though there is no space between the two.
Is there any solution for this?
Thank you so much.
June

Vim matches implicit newline characters instead of the visible \n sequences I am trying to strip

Here's some text from which I'm trying to strip the literal \n sequences that show up visibly in Vim, replacing them with actual newline characters (the kind that are not displayed).
But when I search for a newline character using /[\n]/, the matches are not the visible \n sequences but the implicit line ends. So I can't do a search and replace.
How should I address this? Here is the text:
The Reason that can be reasoned\n is not the eternal Reason.The name that can\n be namedis not the eternal Name. The Unnamable is of heaven and earth the beginning.\n The Namable becomes of the\n ten thousand things the mother.Therefore it is said:\n '\n\n He\n who desireless is found\n The spiritual of the world will sound.\n But he who by desire is bound\n Sees the mere shell of things around.' These two things are the same in sour ce but different in name.\n Their sameness\n is called a mystery.Indeed
it is the mystery\n

You need to search for \\n, not [\n].
Running:
%s/\\n/\r/g
should solve your problem. Vim needs \r rather than \n in the replacement: there, \r breaks the line, while \n inserts a NUL character (shown as ^@).
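
If the file can also be processed outside Vim, the same replacement can be sketched in Python. This is only an illustration; `unescape_newlines` is a made-up helper name, and the sample string is the start of the text from the question.

```python
def unescape_newlines(text: str) -> str:
    # "\\n" is the two-character sequence backslash + n; "\n" is a real line feed.
    return text.replace("\\n", "\n")


# A raw string keeps the backslash and the n as two separate characters.
sample = r"The Reason that can be reasoned\n is not the eternal Reason."
print(unescape_newlines(sample))
```

As in the Vim command, the thing being searched for is a literal backslash followed by n, not the line end itself.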

How can I find the character code of a special character in my text editor?

When pasting text from outside sources into a plain-text editor (e.g. TextMate or Sublime Text 2) a common problem is that special characters are often pasted in as well. Some of these characters render fine, but depending on the source, some might not display correctly (usually showing up as a question mark with a box around it).
So this is actually 2 questions:
1. Given a special character (e.g., ’ or ♥) can I determine the UTF-8 character code used to display that character from inside my text editor, and/or convert those characters to their character codes?
2. For those "extra-special" characters that come in as garbage, is there any way to figure out what encoding was used to display that character in the source text, and can those characters somehow be converted to UTF-8?

My favorite site for looking up characters is fileformat.info. They have a great Unicode character search that includes a lot of useful information about each character and its various encodings.
If you see the question mark with a box, that means you pasted something that can't be interpreted, often because it's not legal UTF-8 (not every byte sequence is legal UTF-8). One possibility is that it's UTF-16 with an endian mode that your editor isn't expecting. If you can get the full original source into a file, the file command is often the best tool for determining the encoding.
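
For characters that did paste correctly, a few lines of Python can report the code point and the UTF-8 bytes. This is a minimal sketch that runs outside the editor; `describe` is a made-up helper name, and the two sample characters are the ones from the question.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8_bytes": " ".join(f"{b:02x}" for b in ch.encode("utf-8")),
    }


for ch in "’♥":
    print(ch, describe(ch))
```

The code point is what a Unicode lookup site expects as a search term; the byte sequence is what actually ends up in a UTF-8 file.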

At &what I built a tool to focus on searching for characters. It indexes all the Unicode and HTML entity tables, but also supplements with hacker dictionaries and a database of keywords I've collected, so you can search for words like heart, quot, weather, umlaut, hash, cloverleaf and get what you want. By focusing on search, it avoids having to hunt around the Unicode pages, which can be frustrating. Give it a try.

Find non-LaTeX characters (e.g. acute accents) with regex in Vim

I was pasting BibTeX references into bibex. Some names contain characters that LaTeX skips, for example á. Is there a way in Vim, or with a regex, to search for all characters that LaTeX skips? One approach I can think of is a regex that matches anything other than 0-9, a-z, A-Z and a few characters like / \ $

I am not familiar with which characters LaTeX ignores, but if the file you are editing is encoded in UTF-8, you might try searching for characters outside the ASCII repertoire (0–127; or 32–127).
As a search command in Vim:
/[^\d0-\d127]
/[^\d32-\d127]
You can also use hex or octal instead of decimal; see :help /[]. This requires that l and \ not be present in the value of cpoptions (they are not present in the default state).
This should work for any encoding that is “the same as ASCII (where it is defined)” (i.e. UTF-8 and most “latin” encodings). If you are dealing with an encoding that clashes with ASCII, then you will need to refine the range specification.
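
The same idea can be sketched outside Vim to list every offending character with its position. This assumes the text has already been decoded correctly (for example from UTF-8); `find_non_ascii` and the sample entries are made up for illustration.

```python
import re

# Anything outside the ASCII range 0-127, mirroring the first Vim search above.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text: str) -> list:
    """Return (line, column, character) for every non-ASCII character, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in NON_ASCII.finditer(line):
            hits.append((lineno, match.start() + 1, match.group()))
    return hits


# Hypothetical BibTeX-like lines; only the second contains accented letters.
sample = "author = {Garcia, Jose}\nauthor = {García, José}\n"
for lineno, col, ch in find_non_ascii(sample):
    print(f"line {lineno}, column {col}: {ch!r} (U+{ord(ch):04X})")
```

Whether a reported character is really a problem depends on the LaTeX setup, so the output is a list of candidates to check rather than a list of errors.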

How to have a carriage return without bringing about a linebreak in VIM?

Is it possible to have a carriage return without bringing about a line break?
For instance, I want to write the following sentences in 2 lines and not 4 (and I do not want to type spaces, of course):
On a ship at sea: a tempestuous noise of thunder and lightning heard.
Enter a Master and a Boatswain
Master : Boatswain!
Boatswain : Here, master: what cheer?
Thanks in advance for your help
Thierry

In a text file, the expected line-end character or character sequence is platform dependent. On Windows, the sequence "carriage return (CR, \r) + line feed (LF, \n)" is used, while Unix systems use newline only (LF \n). Macintoshes traditionally used \r only, but these days on OS X I see them dealing with just about any version. Text editors on any system are often able to support all three versions, and to convert between them.
For Vim, see this article for tips on how to convert or set line-ending sequences.
However, I'm not exactly sure what advantage the change would have for you: Whichever sequence or character you use, it is just the marker for the end of the line (so there should be one of these, at the end of the first line and you'd have a 2 line text file in any event). However, if your application expects a certain character, you can either change the application -- many programming languages support some form of "universal" newline -- or change the data.
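
As an illustration of the "change the data" option, here is a small Python sketch that counts the three line-ending styles in raw bytes and converts everything to LF. The function names are made up for this example, and the data is read as bytes so that nothing is translated behind the scenes.

```python
def detect_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR sequences in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # line feeds not preceded by CR
        "CR": data.count(b"\r") - crlf,  # carriage returns not followed by LF
    }


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and bare CR to LF; CRLF is handled first so it is not doubled."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"Windows line\r\nUnix line\nOld Mac line\rlast line"
print(detect_line_endings(mixed))
print(normalize_to_lf(mixed))
```

Reading a real file with `open(path, "rb")` would feed these functions the same kind of bytes as the `mixed` sample.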

Just in case this is what you're looking for:
:set wrap
:set linebreak
The first tells vim to wrap long lines, and the second tells it to only break lines at word breaks, instead of in the middle of words when it reaches the window size.
