Browsers, when resized, word-wrap text on the fly, right?
Which characters, besides normal spaces, allow a line to be broken?
I know soft hyphens and zero-width spaces also do this. But which others?
e.g.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
When resized:
Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure
dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit
anim id est laborum.
The following is from the Line Breaking and Word Boundaries section in the latest W3C CSS3 Draft: http://www.w3.org/TR/css3-text/#line-breaking
In most writing systems, in the absence of hyphenation a line break occurs only at word boundaries. Many writing systems use spaces or punctuation to explicitly separate words, and line break opportunities can be identified by these characters. Scripts such as Thai, Lao, and Khmer, however, do not use spaces or punctuation to separate words. Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts, this practice is not common. As a result, a lexical resource is needed to correctly identify break points in such texts.
In several other writing systems, (including Chinese, Japanese, Yi, and sometimes also Korean) a line break opportunity is based on character boundaries, not word boundaries. In these systems a line can break anywhere except between certain character combinations. Additionally the level of strictness in these restrictions can vary with the typesetting style.
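To make the idea of a "break opportunity" concrete, here is a toy sketch in Python. It only knows the three characters mentioned in the question (space, soft hyphen, zero-width space); the function name break_opportunities is made up for illustration. Real browsers follow much richer rules (the Unicode line breaking algorithm, UAX #14, plus the language-specific behaviour quoted above), so treat this as an illustration, not as what a browser does.

```python
SPACE = " "
ZWSP = "\u200b"  # zero width space: invisible break opportunity
SHY = "\u00ad"   # soft hyphen: break opportunity, rendered as "-" only when the break is taken

BREAK_CHARS = (SPACE, ZWSP, SHY)


def break_opportunities(text):
    """Return the indexes of characters after which this toy model allows a break."""
    return [i for i, ch in enumerate(text) if ch in BREAK_CHARS]


if __name__ == "__main__":
    sample = "super" + SHY + "cali" + ZWSP + "fragilistic expialidocious"
    for index in break_opportunities(sample):
        # Show the code point of each character that permits a break.
        print(index, "U+%04X" % ord(sample[index]))
```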
I'm trying to extract a multi line text from a text file where values are separated by delimiters and save it into a string or an array. Most of the values are extracted and saved to a variable by awk but the problem occurs when I need to extract a multi line description of a specific product into a variable/array.
The simplified input file syntax looks like this:
ID;Name;value1;value2;DESCRIPTION;valueX;valueY;
I'm extracting the first values with awk -F ";" '{print $1}' and assigning them to variables for future manipulation, which works fine. The problem occurs at the "DESCRIPTION" part, since it is multi-line and contains HTML tags. An example of what the DESCRIPTION looks like:
value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY
Can you suggest a way to do this so that I can assign the DESCRIPTION to some kind of variable or array within the bash script and manipulate it further?
You (originally) asked for an awk-based solution. As others mentioned in the comments there are better tools for the job. That said, based on 4.9 Multiple-Line Records and 4.7 Defining Fields by Content you can try something like:
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
[...]
$ awk 'BEGIN {RS = ";\n"; FPAT = "([^;]+)|(\"<p.+p>\")" } { print "NF = ", NF; for (i = 1; i <= NF; i++) { printf("$%d = %s\n", i, $i) } }' testfile
RS = ";\n" is here assuming that your input file has multiple ID;Name;value1;value2;DESCRIPTION;valueX;valueY; records and that the records are separated with a ; (this is the ; after valueY in your example) followed by a newline.
FPAT = "([^;]+)|(\"<p.+p>\")" is a "best-effort" approach to tell (g)awk what the fields of your records look like. You may need to modify it according to your needs. What it actually says is that there are two field formats (see (...)|(...)). The first field format captures strings that do not contain ; and is used to capture all the fields except DESCRIPTION. The second field format captures strings that start with "<p and end with p>".
Against a file with 2 ID;Name;value1;value2;DESCRIPTION;valueX;valueY; records:
$ cat testfile
ID;Name;value1;value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY;
ID;Name;value1;value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY;
$ awk 'BEGIN {RS = ";\n"; FPAT = "([^;]+)|(\"<p.+p>\")" } { print "NF = ", NF; for (i = 1; i <= NF; i++) { printf("$%d = %s\n", i, $i) } }' testfile
NF = 7
$1 = ID
$2 = Name
$3 = value1
$4 = value2
$5 = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"
$6 = valueX
$7 = valueY
NF = 7
$1 = ID
$2 = Name
$3 = value1
$4 = value2
$5 = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"
$6 = valueX
$7 = valueY
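As for the "better tools" mentioned above, one option is a real CSV parser. Below is a minimal sketch with Python's csv module. It assumes the file uses ; as the delimiter, " as the quote character, and "" as an escaped quote inside a quoted field, which is what the sample looks like. The function name parse_records and the shortened sample data are made up for illustration.

```python
import csv
import io


def parse_records(text):
    """Parse ;-separated records whose quoted fields may span several lines."""
    # newline="" keeps embedded line breaks inside quoted fields untouched.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar='"')
    return [row for row in reader if row]


if __name__ == "__main__":
    sample = (
        'ID;Name;value1;value2;"<p>first line</p>\n'
        '<p style=""text-align: center;"">second line</p>";valueX;valueY;\n'
    )
    for row in parse_records(sample):
        # The trailing ; of each record yields one extra empty field at the end.
        for number, field in enumerate(row, start=1):
            print("$%d = %s" % (number, field))
```

Note that the parser removes the surrounding quotes and turns "" back into a single ", so the DESCRIPTION field comes out as plain HTML.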
According to the YAML syntax the > symbol should allow the value to span multiple lines:
- name: coming-soon
  teaser: >
  “Dolor eiusmod cupidatat duis qui consectetur.
  Mollit nulla consectetur id excepteur do.
  Anim ut dolor quis sit consequat.
  Mollit laboris proident sunt incididunt mollit consequat officia.
  Ad deserunt eu veniam qui eiusmod ex proident
  pariatur tempor mollit laborum enim laboris elit.”
But it results in the following error in Jekyll:
could not find expected ':' while scanning a simple key at line 11 column 3
The : is clearly there so I don't know what's causing it to fail. Has anyone come across this before?
I've tried putting all the lines in double quotes and in single quotes. I've tried removing the quotes altogether. I've tried using >- instead of >, but all of them produce the same error.
What is introduced by the > is a folded style block scalar. As indicated in the spec, it is similar to the literal style scalar, so you can rewrite the literal style's description to match the folded style:
Inside folded scalars, all (indented) characters are considered to be content, including white space characters. Note that all line break characters are normalized.
What is clearly missing is the indentation, which determines what lines belong to this value for the key teaser. If there were a following key, it would have to be the first thing aligned with teaser again. But your whole folded scalar is aligned, and that confuses the YAML parser.
I am not sure if you want the double quotes to be part of the value. If you do, you should use:
- name: coming-soon
  teaser: >
    “Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.”
(the number of spaces is not important, but make sure every line is aligned, otherwise you have to specify the indent after the >).
If your double quotes are not part of the value, you can use the folded scalar:
- name: coming-soon
  teaser: >
    Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.
Or leave out the folding and use a multi-line plain scalar:
- name: coming-soon
  teaser: Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.
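To give a feeling for what the folded style produces once the block is indented properly, here is a rough Python sketch. It is not a YAML parser: it assumes every content line has the same indentation, and no blank lines, more-indented lines or trailing whitespace occur (YAML treats those specially). The name fold_simple and its strip_final_newline parameter are made up for this illustration; the parameter imitates the difference between > and >-.

```python
import textwrap


def fold_simple(block, strip_final_newline=False):
    """Imitate '>' folding for plain, evenly indented text without blank lines."""
    # Drop the common indentation, then join the lines with single spaces.
    lines = textwrap.dedent(block).splitlines()
    folded = " ".join(lines)
    # '>' keeps one final newline, '>-' strips it.
    return folded if strip_final_newline else folded + "\n"


if __name__ == "__main__":
    teaser = (
        "    Dolor eiusmod cupidatat duis qui consectetur.\n"
        "    Mollit nulla consectetur id excepteur do.\n"
    )
    print(repr(fold_simple(teaser)))
    print(repr(fold_simple(teaser, strip_final_newline=True)))
```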
When in doubt...
Indent!
- name: coming-soon
  teaser: >
    Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.
In Vim, I am trying to delete all lines in a file except the lines containing a pattern and the 2 lines above each of them. I tried:
:g!/pattern/.-2 d
But it says: invalid range...
What to do?
The command below looks for lines that don't match pattern and deletes them and the two lines above:
:g!/pattern/-2,.d
The command below looks for lines that don't match pattern and deletes the line located two lines above:
:g!/pattern/-2d
Ranges always go downwards, so we use the upper address (-2) first and the lower one (.) second.
That said, you'll most likely get an error if a matching line doesn't have two lines above it.
Then how should I delete all the lines except lines 4, 5 and 6 in the following file: line 1 line 2 line 3 line 4 line 5 line containing pattern line 7 ?
Like this:
:v/\v(.*\n){,2}.*pattern.*/d
This matches if:
the line contains the pattern, or
next line contains the pattern, or
the 2nd next line contains the pattern.
These lines are kept. All other lines (:v) are deleted.
Example:
"Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim
id est laborum."
Run:
:v/\v(.*\n){,2}.*labor.*/d
Result:
consectetur adipiscing elit, -2
sed do eiusmod tempor incididunt -1
ut labore et dolore magna aliqua. <-0 labor(e)
Ut enim ad minim veniam, quis nostrud -1
exercitation ullamco laboris nisi ut <-0 labor(is)
occaecat cupidatat non proident, sunt in -2
culpa qui officia deserunt mollit anim -1
id est laborum." <-0 labor(um)
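The same selection rule (keep a line if it, the next line, or the 2nd next line contains the pattern) can be sketched outside Vim. Here is a small Python version for comparison; the function name keep_with_context and its above parameter are made up for illustration, and Python's re syntax is not identical to Vim's, so only simple patterns carry over unchanged.

```python
import re


def keep_with_context(lines, pattern, above=2):
    """Keep lines matching pattern plus up to `above` lines before each match."""
    rx = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            # Mark the matching line and the lines directly above it.
            keep.update(range(max(0, i - above), i + 1))
    return [line for i, line in enumerate(lines) if i in keep]


if __name__ == "__main__":
    text = ["line 1", "line 2", "line 3", "line 4", "line 5",
            "line containing pattern", "line 7"]
    for line in keep_with_context(text, "pattern"):
        print(line)
```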
What's the fastest way to turn this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.
Into this:
> Lorem ipsum dolor sit amet, consectetur adipiscing elit,
>
> sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit
>
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
> sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
> mollit anim id est laborum.
With Gvim?
Go to the beginning of the first line and press ctrl+v to enter visual block mode.
Move down to the last line and then press shift+i.
Now type the text you want to insert and then press esc.
This should do the job. :)
One solution, as already mentioned, is search and replace, but it forces you to think in command-line mode instead of normal mode.
The most natural way for me is to think about how I would change one line in normal mode (I>) and apply that to the whole file by using :%norm
This would be
:%norm I>
Perhaps not the shortest possible sequence, but it doesn't interrupt my train of thought too much.
I don't know about gvim but in vim you could mark the top and bottom lines, let's say with a and b respectively, and then execute the following command:
:'a,'bs/^/> /gc
Well, you don't need visual mode to accomplish this task. Using the global replace command is super easy.
Try executing this command and press enter:
:%s/^/>/g
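For comparison, if the text is processed outside the editor, the same "put a marker at the start of every line" idea is a one-liner. A minimal Python sketch follows; the function name quote_lines is made up, and the default prefix "> " (with a trailing space, as in the desired output in the question) is an assumption.

```python
def quote_lines(text, prefix="> "):
    """Prepend prefix to every line of text, keeping the original line endings."""
    return "".join(prefix + line for line in text.splitlines(keepends=True))


if __name__ == "__main__":
    sample = "Lorem ipsum dolor sit amet,\nsed do eiusmod tempor.\n"
    print(quote_lines(sample), end="")
```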
I'm a long-time Kate user switching to Vim.
I wonder whether Vim has an easily activated option (or whether a plugin has been written for it) to 'smartly' apply static word wrap to long strings when coding in major languages: C/C++, Java, Python, PHP, and so on.
This should work not only while writing, but also when changing the indentation of a visual text block, or when commenting or uncommenting it. Take a pseudo-Java situation like:
String loremIpsum = "Lorem ipsum dolor sit amet, consectetur adipi" +
"sicing elit, sed do eiusmod tempor incididunt ut " +
"labore et dolore magna aliqua. Ut enim ad minim v" +
"eniam, quis nostrud exercitation ullamco laboris " +
"nisi ut aliquip ex ea commodo consequat. Duis aut" +
"e irure dolor in reprehenderit in voluptate velit" +
" esse cillum dolore eu fugiat nulla pariatur. Exc" +
"epteur sint occaecat cupidatat non proident, sunt" +
" in culpa qui officia deserunt mollit anim id est" +
" laborum.";
At some point I would want to add or remove some indentation levels, relying on the editor to rebuild the whole string literal according to our static word wrap rules. Suppose that, for some reason, it is desirable to remove two spaces of indentation; the desired output would be:
String loremIpsum = "Lorem ipsum dolor sit amet, consectetur adipisi" +
"cing elit, sed do eiusmod tempor incididunt ut labo" +
"re et dolore magna aliqua. Ut enim ad minim veniam," +
" quis nostrud exercitation ullamco laboris nisi ut " +
"aliquip ex ea commodo consequat. Duis aute irure do" +
"lor in reprehenderit in voluptate velit esse cillum" +
" dolore eu fugiat nulla pariatur. Excepteur sint oc" +
"caecat cupidatat non proident, sunt in culpa qui of" +
"ficia deserunt mollit anim id est laborum.";
Which tool in Vim can be used to achieve this?
With Vim, the gq command reformats lines; this can even be done as-you-type with :set formatoptions+=a.
Unfortunately, Vim's built-in capabilities are limited to basic stuff (see :help fo-table); elaborate and language-specific formatters are meant to be provided by external programs ('formatprg'), or Vimscript ('formatexpr'), the latter one I haven't actually seen used yet.
So, if you're lucky you'll find an external code formatter program that can be integrated, or you'll have to write such a thing yourself.
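If you end up writing such a thing yourself, the core of it could look roughly like the Python sketch below. It only covers the re-chunking step: it takes the already joined string content and a chunk width and emits Java-style concatenated pieces. The function name rewrap_string_literal is made up, the sketch assumes the text contains no quotes or backslashes that need escaping, and parsing the existing literal out of the source and hooking the script into 'formatprg' are left out.

```python
def rewrap_string_literal(text, width):
    """Split text into chunks of at most `width` characters, each as a quoted piece."""
    if width < 1:
        raise ValueError("width must be at least 1")
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    # Join the quoted pieces with the Java string concatenation operator.
    return " +\n".join('"%s"' % chunk for chunk in chunks)


if __name__ == "__main__":
    content = "Lorem ipsum dolor sit amet, consectetur adipisicing elit."
    print("String loremIpsum = " + rewrap_string_literal(content, 20) + ";")
```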