YAML multi-line signifier not working in Jekyll data file - string

According to the YAML syntax the > symbol should allow the value to span multiple lines:
- name: coming-soon
  teaser: >
  “Dolor eiusmod cupidatat duis qui consectetur.
  Mollit nulla consectetur id excepteur do.
  Anim ut dolor quis sit consequat.
  Mollit laboris proident sunt incididunt mollit consequat officia.
  Ad deserunt eu veniam qui eiusmod ex proident
  pariatur tempor mollit laborum enim laboris elit.”
But it results in the following error in Jekyll:
could not find expected ':' while scanning a simple key at line 11 column 3
The : is clearly there so I don't know what's causing it to fail. Has anyone come across this before?
I've tried putting all the lines in double quotes and in single quotes. I've tried removing the quotes altogether. I've tried using >- instead of >, but all of them produce the same error.

The > introduces a folded style block scalar. As indicated in the spec, it is similar to the literal style scalar, so you can rewrite the literal style's description to match the folded style:
Inside folded scalars, all (indented) characters are considered to be content, including white space characters. Note that all line break characters are normalized.
What is clearly missing is the indentation, which determines what lines belong to this value for the key teaser. If there were a following key, it would have to be the first thing aligned with teaser again. But your whole folded scalar is aligned, and that confuses the YAML parser.
I am not sure if you want the double quotes to be part of the value. If you do, you should use:
- name: coming-soon
  teaser: >
    “Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.”
(The number of spaces is not important, but make sure every line is aligned; otherwise you have to specify the indent after the >.)
If your double quotes are not part of the value, you can use the folded scalar:
- name: coming-soon
  teaser: >
    Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.
Or leave out the folding and use a multi-line plain scalar:
- name: coming-soon
  teaser: Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.
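To make it clearer what folding does to the indented lines, here is a rough Python sketch. It is not a YAML parser: fold_simple is a hypothetical helper that only approximates the > style for the simple case where the block has no blank lines and no more-indented lines.

```python
import textwrap


def fold_simple(block: str) -> str:
    """Approximate '>' folding for a block without blank or more-indented lines."""
    # Strip the common indentation, which is what marks the lines as content.
    lines = textwrap.dedent(block).splitlines()
    # Single line breaks become spaces; one trailing newline is kept.
    return " ".join(lines) + "\n"


block = (
    "    Dolor eiusmod cupidatat duis qui consectetur.\n"
    "    Mollit nulla consectetur id excepteur do.\n"
)
print(fold_simple(block), end="")
```

The point is that the indentation is what delimits the value; once it is removed, the lines are joined into one string.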

When in doubt...
Indent!
- name: coming-soon
  teaser: >
    Dolor eiusmod cupidatat duis qui consectetur.
    Mollit nulla consectetur id excepteur do.
    Anim ut dolor quis sit consequat.
    Mollit laboris proident sunt incididunt mollit consequat officia.
    Ad deserunt eu veniam qui eiusmod ex proident
    pariatur tempor mollit laborum enim laboris elit.

Related

Bash replace string with another in determined pattern

I have a file that contains this string requires="ua-common/2.7.25#aep/stable" (the version number is variable) and I need to replace it with requires="ua-common/2.7.26#aep/stable" (the version number is variable). So I need to change the version number without knowing what its value is.
I need to make a little script like so:
#!/bin/bash
echo "Insert version number: "
read version
# replace the old version in the file with $version
Thank you
This is how I understood your question.
The file file.txt
fjlakjflajkflkajfjakjfalkjfoairujnasncv
O
aljflajflja ljfaljflakjflakjf
ia;jflajfjaljfajflajfoiuqoruaf
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
requires="ua-common/2.7.25#aep-stable"
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
jalfjalkfjaoeiurjasnvafnaojf
jvjvg]
iajfiaufurva ajfaj
The script myscript
#!/usr/bin/env bash
current_version=$(grep -oP 'requires="ua-common\/\K.*?(?=#)' file.txt)
printf 'The current version is: %s\n\n' "$current_version"
read -rp "Insert a new version number: " version
if [[ -n $version ]]; then
  sed "s|\(requires=.*ua-common/\).*\(#.*\)\$|\1$version\2|" file.txt
fi
Then run
bash ./myscript
Output
The current version is: 2.7.25
Insert a new version number:
Key in the new version number:
The current version is: 2.7.25
Insert a new version number: 2.7.26
Output
fjlakjflajkflkajfjakjfalkjfoairujnasncv
O
aljflajflja ljfaljflakjflakjf
ia;jflajfjaljfajflajfoiuqoruaf
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
requires="ua-common/2.7.26#aep-stable"
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
jalfjalkfjaoeiurjasnvafnaojf
jvjvg]
iajfiaufurva ajfaj
There is more to do with the current script, such as error checking and exiting when something goes wrong, but that is a start.
1st solution: With your shown samples (this will work with any version in the format shown), please try the following awk code. It gives the output ua-common/2.7.26#aep/stable for the shown samples.
echo "$requires" |
awk '
match($0,/([0-9]+\.)*[0-9]+/){
  value=substr($0,RSTART,RLENGTH)
  num=split(value,arr,".")
  arr[num]+=1
  value=""
  for(i=1;i<=num;i++){
    value=(value?value ".":"")arr[i]
  }
  print substr($0,1,RSTART-1) value substr($0,RSTART+RLENGTH)
}
'
2nd solution: If your input always has the same format and you want to increment the number just before the #, then try the following.
echo "$requires" |
awk '
match($0,/[0-9]+#/){
  # The match is the digits plus "#": take the digits only, add 1, then append from "#" onward.
  print substr($0,1,RSTART-1) substr($0,RSTART,RLENGTH-1)+1 substr($0,RSTART+RLENGTH-1)
}
'
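For comparison, the same "increment the last component" idea can be sketched in Python. This is only an illustration under the assumption that the first dotted number in the string is the version; bump_last_component is a made-up name, not something from the question.

```python
import re


def bump_last_component(requires: str) -> str:
    """Increment the last numeric component of the first dotted version found."""

    def bump(match: re.Match) -> str:
        parts = match.group(0).split(".")
        parts[-1] = str(int(parts[-1]) + 1)
        return ".".join(parts)

    # count=1: only the first version-like run is touched.
    return re.sub(r"\d+(?:\.\d+)*", bump, requires, count=1)


print(bump_last_component("ua-common/2.7.25#aep/stable"))
```

As with the awk version, everything before and after the matched version is left untouched.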
The most obvious solution, since you ask for bash (it doesn't work with dash, which /bin/sh is often linked to):
#!/bin/bash
oldVersion=2.7.25
newVersion=2.7.26
requires="ua-common/2.7.25#aep/stable"
echo "${requires//$oldVersion/$newVersion}"
and the output:
ua-common/2.7.26#aep/stable
Reference page: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
You might want to use sed for that job. Depending on the file contents, you may need to adjust the sed search pattern:
#: cat test.file
requires="ua-common/2.7.25#aep/stable"
#: sed -i 's/^requires="ua-common\/[0-9]*\.[0-9]*\.[0-9]*#/requires="ua-common\/1.2.3#/' test.file
#: cat test.file
requires="ua-common/1.2.3#aep/stable"
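If a scripting language is an option, the "replace whatever version is there" step might look like this in Python. It is a sketch, assuming the line always has the form requires="ua-common/<version>#..." as in the question; VERSION_RE and set_version are hypothetical names.

```python
import re

# Assumed line format: requires="ua-common/<version>#...".
VERSION_RE = re.compile(r'(requires="ua-common/)[^#"]*(#)')


def set_version(text: str, new_version: str) -> str:
    """Replace the current version, whatever it is, with new_version."""
    # A function replacement avoids any backslash handling in new_version.
    return VERSION_RE.sub(lambda m: m.group(1) + new_version + m.group(2), text)


line = 'requires="ua-common/2.7.25#aep/stable"'
print(set_version(line, "2.7.26"))
```

To use it on a file, read the file's text, pass it through set_version, and write the result back.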

Extracting multi line text from a file between delimiters in bash [duplicate]

I'm trying to extract a multi line text from a text file where values are separated by delimiters and save it into a string or an array. Most of the values are extracted and saved to a variable by awk but the problem occurs when I need to extract a multi line description of a specific product into a variable/array.
The simplified input file syntax looks like this:
ID;Name;value1;value2;DESCRIPTION;valueX;valueY;
I'm extracting the first values with awk -F ";" '{print $1}' and assigning them to variables for future manipulation, which works fine, but the problem occurs at the "DESCRIPTION" part since it's multi-line with HTML tags. An example of what the DESCRIPTION looks like:
value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY
Can you suggest a way to do this so that I can assign the DESCRIPTION to some kind of variable or array within the bash script and manipulate it further?
You (originally) asked for an awk-based solution. As others mentioned in the comments, there are better tools for the job. That said, based on the GNU Awk manual sections 4.9 Multiple-Line Records and 4.7 Defining Fields by Content, you can try something like:
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
[...]
$ awk 'BEGIN {RS = ";\n"; FPAT = "([^;]+)|(\"<p.+p>\")" } { print "NF = ", NF; for (i = 1; i <= NF; i++) { printf("$%d = %s\n", i, $i) } }' testfile
RS = ";\n" assumes that your input file has multiple ID;Name;value1;value2;DESCRIPTION;valueX;valueY; records and that the records are separated by a ; (this is the ; after valueY in your example) followed by a newline.
FPAT = "([^;]+)|(\"<p.+p>\")" is a "best-effort" approach to tell (g)awk what the fields of your records look like. You may need to modify it according to your needs. What it actually says is that there are two field formats (see (...)|(...)). The first field format captures strings that do not contain ; and is used to capture all the fields except DESCRIPTION. The second field format captures strings that start with "<p and end with p>".
Against a file with two ID;Name;value1;value2;DESCRIPTION;valueX;valueY; records:
$ cat testfile
ID;Name;value1;value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY;
ID;Name;value1;value2;"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>";valueX;valueY;
$ awk 'BEGIN {RS = ";\n"; FPAT = "([^;]+)|(\"<p.+p>\")" } { print "NF = ", NF; for (i = 1; i <= NF; i++) { printf("$%d = %s\n", i, $i) } }' testfile
NF = 7
$1 = ID
$2 = Name
$3 = value1
$4 = value2
$5 = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"
$6 = valueX
$7 = valueY
NF = 7
$1 = ID
$2 = Name
$3 = value1
$4 = value2
$5 = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<strong>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</strong>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </p>
<p style=""text-align: center;"">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"
$6 = valueX
$7 = valueY
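As an example of the "better tools" mentioned above, Python's csv module understands quoted fields that span lines and doubled quotes (""), so a sketch could look like the following. The sample string is a shortened stand-in for the real file, and treating the fifth field as the description is an assumption taken from the question's layout.

```python
import csv
import io

# Shortened, made-up sample in the same layout as the question's file.
sample = (
    'ID;Name;value1;value2;"<p>Lorem ipsum</p>\n'
    '<p style=""text-align: center;"">\n'
    'Excepteur sint.</p>";valueX;valueY;\n'
)

# newline="" lets the csv module see embedded line breaks itself.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter=";", quotechar='"')
rows = list(reader)

# Field 5 (index 4) is the multi-line DESCRIPTION; "" is unescaped to ".
description = rows[0][4]
print(description)
# The trailing ";" on each record yields an empty last field.
print(len(rows[0]))
```

For a real file, open it with open(path, newline="") and pass the file object to csv.reader instead of the StringIO.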

delete lines not (containing pattern and 2 lines above pattern)

In Vim I'm trying to delete all lines in a file except those containing a pattern and the 2 lines above each match. I tried:
:g!/pattern/.-2 d
But it says: invalid range...
What to do?
The command below looks for lines that don't match pattern and deletes them and the two lines above:
:g!/pattern/-2,.d
The command below looks for lines that don't match pattern and deletes the line located two lines above:
:g!/pattern/-2d
Ranges always go downwards so we use the upper address first — -2 — and the lower one second — . —.
That said, you'll most likely get an error if a matching line doesn't have two lines above it.
Then how should I delete all the lines except lines 4, 5 and 6 in the following file: line 1, line 2, line 3, line 4, line 5, line containing pattern, line 7?
Like this:
:v/\v(.*\n){,2}.*pattern.*/d
This matches if:
the line contains the pattern, or
next line contains the pattern, or
the 2nd next line contains the pattern.
These lines are kept. All other lines (:v) are deleted.
Example:
"Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim
id est laborum."
Run:
:v/\v(.*\n){,2}.*labor.*/d
Result:
consectetur adipiscing elit, -2
sed do eiusmod tempor incididunt -1
ut labore et dolore magna aliqua. <-0 labor(e)
Ut enim ad minim veniam, quis nostrud -1
exercitation ullamco laboris nisi ut <-0 labor(is)
occaecat cupidatat non proident, sunt in -2
culpa qui officia deserunt mollit anim -1
id est laborum." <-0 labor(um)
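The keep/delete logic of that :v command can also be written out as a small Python sketch, which may make the rule easier to check. keep_with_context is a hypothetical helper, and it uses a plain substring test rather than a Vim regex.

```python
def keep_with_context(lines, pattern, before=2):
    """Keep lines containing pattern plus up to `before` lines above each match."""
    keep = set()
    for i, line in enumerate(lines):
        if pattern in line:
            # Mark the matching line and the lines just above it.
            keep.update(range(max(0, i - before), i + 1))
    return [line for i, line in enumerate(lines) if i in keep]


lines = [
    "line 1", "line 2", "line 3", "line 4", "line 5",
    "line containing pattern", "line 7",
]
print(keep_with_context(lines, "pattern"))
```

A match near the top of the file simply keeps fewer lines above it instead of raising an error.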

Adding a character at the beginning of each line in Gvim?

What's the fastest way to turn this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.
Into this:
> Lorem ipsum dolor sit amet, consectetur adipiscing elit,
>
> sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit
>
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
> sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
> mollit anim id est laborum.
With Gvim?
Go to the beginning of the first line and press ctrl+v to enter visual block mode.
Move down to the last line and then press shift+i.
Now type the text you want and then press esc.
This should do the job. :)
One solution, as already mentioned, is search and replace, but it forces you to think in command-line mode instead of normal mode.
The most natural way for me is to think about how I would change one line in normal mode (I>) and apply that to the whole file by using :%norm.
This would be
:%norm I>
Perhaps not the shortest possible sequence, but it doesn't interrupt my train of thought too much.
I don't know about gvim but in vim you could mark the top and bottom lines, let's say with a and b respectively, and then execute the following command:
:'a,'bs/^/> /gc
Well, you don't need to use visual mode to accomplish this task. Using the global substitute command is super easy.
Try executing this command and pressing enter:
:%s/^/>/g
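Outside the editor, the same "prefix every line" operation is a one-liner in most languages. A minimal Python sketch, where quote_lines is a made-up name and the "> " prefix (with a space) is assumed from the desired output in the question:

```python
def quote_lines(text: str, prefix: str = "> ") -> str:
    """Put prefix at the start of every line, keeping the line endings."""
    return "".join(prefix + line for line in text.splitlines(keepends=True))


print(quote_lines("Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit,\n"), end="")
```

Note that empty lines get the prefix too, including its trailing space.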

Characters prone to word-wrapping

Browsers, when resized, word-wrap text on the fly, right?
What characters, besides normal spaces, allow a line to be broken?
I know soft hyphens and zero-width spaces also do this. But what others?
e.g.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
When resized:
Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure
dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit
anim id est laborum.
The following is from the Line Breaking and Word Boundaries section in the latest W3C CSS3 Draft: http://www.w3.org/TR/css3-text/#line-breaking
In most writing systems, in the absence of hyphenation a line break occurs only at word boundaries. Many writing systems use spaces or punctuation to explicitly separate words, and line break opportunities can be identified by these characters. Scripts such as Thai, Lao, and Khmer, however, do not use spaces or punctuation to separate words. Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts, this practice is not common. As a result, a lexical resource is needed to correctly identify break points in such texts.
In several other writing systems, (including Chinese, Japanese, Yi, and sometimes also Korean) a line break opportunity is based on character boundaries, not word boundaries. In these systems a line can break anywhere except between certain character combinations. Additionally the level of strictness in these restrictions can vary with the typesetting style.

Resources