How do I break a string in YAML over multiple lines?

In YAML, I have a string that's very long. I want to keep this within the 80-column (or so) view of my editor, so I'd like to break the string. What's the syntax for this?
In other words, I have this:
Key: 'this is my very very very very very very long string'
and I'd like to have this (or something to this effect):
Key: 'this is my very very very ' +
'long string'
I'd like to use quotes as above, so I don't need to escape anything within the string.

There are NINE (or 63*, depending how you count) different ways to write multi-line strings in YAML.
TL;DR
Use > most of the time: interior line breaks are folded into spaces, although you get one line break at the end:
key: >
  Your long
  string here.
Use | if you want those linebreaks to be preserved as \n (for instance, embedded markdown with paragraphs).
key: |
  ### Heading
  * Bullet
  * Points
Use >- or |- instead if you don't want a linebreak appended at the end.
Use "..." if you need to split lines in the middle of words or want to literally type linebreaks as \n:
key: "Antidisestab\
  lishmentarianism.\n\nGet on it."
YAML is crazy.
Block scalar styles (>, |)
These allow characters such as \ and " without escaping, and add a new line (\n) to the end of your string.
> Folded style replaces single newlines within the string with spaces (but adds a newline at the end, and converts double newlines to singles):
Key: >
  this is my very very very
  long string
→ this is my very very very long string\n
Extra leading space is retained and causes extra newlines. See note below.
Advice: Use this. Usually this is what you want.
| Literal style
turns every newline within the string into a literal newline, and adds one at the end:
Key: |
  this is my very very very
  long string
→ this is my very very very\nlong string\n
Here's the official definition from the YAML Spec 1.2
Scalar content can be written in block notation, using a literal style (indicated by “|”) where all line breaks are significant. Alternatively, they can be written with the folded style (denoted by “>”) where each line break is folded to a space unless it ends an empty or a more-indented line.
Advice: Use this for inserting formatted text (especially Markdown) as a value.
Block styles with block chomping indicator (>-, |-, >+, |+)
You can control the handling of the final new line in the string, and any trailing blank lines (\n\n) by adding a block chomping indicator character:
>, |: "clip": keep the line feed, remove the trailing blank lines.
>-, |-: "strip": remove the line feed, remove the trailing blank lines.
>+, |+: "keep": keep the line feed, keep trailing blank lines.
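The three chomping modes above can be sketched in a few lines of Python. This is only an illustration of the rule, not a YAML parser: the function name chomp is hypothetical, and it assumes the block's content has already been extracted and de-indented, with any trailing blank lines still attached.

```python
def chomp(block_text: str, indicator: str = "") -> str:
    """Apply YAML-style block chomping to already-extracted block content.

    indicator: "" = clip (default), "-" = strip, "+" = keep.
    Hypothetical helper for illustration only.
    """
    if indicator not in ("", "-", "+"):
        raise ValueError("indicator must be '', '-' or '+'")
    body = block_text.rstrip("\n")        # content without trailing line breaks
    trailing = block_text[len(body):]     # the final line break plus blank lines
    if indicator == "+":                  # keep: leave everything in place
        return body + trailing
    if indicator == "-" or not body:      # strip (or nothing left to clip)
        return body
    return body + "\n" if trailing else body   # clip: at most one line break


text = "long string\n\n\n"
print(repr(chomp(text)))        # clip
print(repr(chomp(text, "-")))   # strip
print(repr(chomp(text, "+")))   # keep
```

The same content therefore ends in one, zero, or all of its original line breaks, depending only on the indicator.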
"Flow" scalar styles ( , ", ')
These have limited escaping, and construct a single-line string with no new line characters. They can begin on the same line as the key, or with additional newlines first, which are stripped. Doubled newline characters become one newline.
plain style (no escaping, no # or : combinations, first character can't be ", ' or many other punctuation characters ):
Key: this is my very very very
  long string
Advice: Avoid. May look convenient, but you're liable to shoot yourself in the foot by accidentally using forbidden punctuation and triggering a syntax error.
double-quoted style (\ and " must be escaped by \, newlines can be inserted with a literal \n sequence, lines can be concatenated without spaces with trailing \):
Key: "this is my very very \"very\" loooo\
  ng string.\n\nLove, YAML."
→ "this is my very very \"very\" loooong string.\n\nLove, YAML."
Advice: Use in very specific situations. This is the only way you can break a very long token (like a URL) across lines without adding spaces. And maybe adding newlines mid-line is conceivably useful.
single-quoted style (literal ' must be doubled, no special characters, possibly useful for expressing strings starting with double quotes):
Key: 'this is my very very "very"
  long string, isn''t it.'
→ "this is my very very \"very\" long string, isn't it."
Advice: Avoid. Very few benefits, mostly inconvenience.
Block styles with indentation indicators
Just in case the above isn't enough for you, you can add a "block indentation indicator" (after your block chomping indicator, if you have one):
- >8
        My long string
        starts over here
- |+1
 This one
 starts here
Note: Leading spaces in Folded style (>)
If you insert extra spaces at the start of not-the-first lines in Folded style, they will be kept, with a bonus newline. (This doesn't happen with flow styles.) Section 6.5:
In addition, folding does not apply to line breaks surrounding text lines that contain leading white space. Note that such a more-indented line may consist only of such leading white space.
- >
    my long
      string
          
    many spaces above
- my long
      string
          
  many spaces above
→ ["my long\n  string\n      \nmany spaces above\n","my long string\nmany spaces above"]
The third line of each entry consists only of spaces.
Summary
In this table, _ means space character. \n means "newline character" (\n in JavaScript) except under "Other features". "Leading space" applies after the first line (which establishes the indent)
                                  >      |      plain  "      '      >-     >+     |-     |+
Spaces/newlines converted as:
Trailing space →                  _      _                           _      _      _      _
Leading space →                   \n_    \n_                         \n_    \n_    \n_    \n_
Single newline →                  _      \n     _      _      _      _      _      \n     \n
Double newline →                  \n     \n\n   \n     \n     \n     \n     \n     \n\n   \n\n
Final newline →                   \n     \n                                 \n            \n
Final double newline →                                                      \n\n          \n\n
How to create a literal:
Single quote                      '      '      '      '      ''     '      '      '      '
Double quote                      "      "      "      \"     "      "      "      "      "
Backslash                         \      \      \      \\     \      \      \      \      \
Other features
In-line newlines with \n          🚫     🚫     🚫     ✅     🚫     🚫     🚫     🚫     🚫
Spaceless newlines with \         🚫     🚫     🚫     ✅     🚫     🚫     🚫     🚫     🚫
# or : in value                   ✅     ✅     🚫     ✅     ✅     ✅     ✅     ✅     ✅
Can start on same line as key     🚫     🚫     ✅     ✅     ✅     🚫     🚫     🚫     🚫
Examples
Note the trailing spaces on the line before "spaces."
- >
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- |
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- "very \"long\"
  'string' with

  paragraph gap, \n and
  s\
  p\
  a\
  c\
  e\
  s."
- 'very "long"
  ''string'' with

  paragraph gap, \n and
  spaces.'
- >-
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.

[
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.\n",
  "very \"long\"\n'string' with\n\nparagraph gap, \\n and \nspaces.\n",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces."
]
*2 block styles, each with 2 possible block chomping indicators (or none), and with 9 possible indentation indicators (or none), 1 plain style and 2 quoted styles: 2 x (2 + 1) x (9 + 1) + 1 + 2 = 63
Some of this information has also been summarised here.

Use YAML folded style (>). The indentation of each line is ignored, and a line break is inserted at the end.
Key: >
  This is a very long sentence
  that spans several lines in the YAML
  but which will be rendered as a string
  with only a single carriage return appended to the end.
http://symfony.com/doc/current/components/yaml/yaml_format.html
You can use the "block chomping indicator" to eliminate the trailing line break, as follows:
Key: >-
  This is a very long sentence
  that spans several lines in the YAML
  but which will be rendered as a string
  with NO carriage returns.
In either case, each line break is replaced by a space.
There are other control tools available as well (for controlling indentation for example).
See https://yaml-multiline.info/

To preserve newlines use |, for example:
|
  This is a very long sentence
  that spans several lines in the YAML
  but which will be rendered as a string
  with newlines preserved.
is translated to "This is a very long sentence\nthat spans several lines in the YAML\nbut which will be rendered as a string\nwith newlines preserved.\n"

1. Block notation (plain, flow-style scalar): newlines become spaces, and extra newlines after the block are removed.
---
# Note: It has 1 new line after the string
content:
    Arbitrary free text
    over multiple lines stopping
    after indentation changes...

...
Equivalent JSON
{
    "content": "Arbitrary free text over multiple lines stopping after indentation changes..."
}
2. Literal Block Scalar: a Literal Block Scalar (|) keeps the newlines and any trailing spaces, but removes extra newlines after the block.
---
# After string we have 2 spaces and 2 new lines
content1: |
    Arbitrary free text
    over "multiple lines" stopping
    after indentation changes...  


...
Equivalent JSON
{
    "content1": "Arbitrary free text\nover \"multiple lines\" stopping\nafter indentation changes...  \n"
}
3. + indicator with Literal Block Scalar: keeps extra newlines after the block.
---
# After string we have 2 new lines
plain: |+
    This unquoted scalar
    spans many lines.


...
Equivalent JSON
{
    "plain": "This unquoted scalar\nspans many lines.\n\n\n"
}
4. - indicator with Literal Block Scalar: - means that the newline at the end of the string is removed.
---
# After string we have 2 new lines
plain: |-
    This unquoted scalar
    spans many lines.


...
Equivalent JSON
{
    "plain": "This unquoted scalar\nspans many lines."
}
5. Folded Block Scalar (>): folds newlines to spaces, and removes extra newlines after the block.
---
folded_newlines: >
    this is really a
    single line of text
    despite appearances

...
Equivalent JSON
{
    "folded_newlines": "this is really a single line of text despite appearances\n"
}

To concatenate long lines without whitespace, use double quotes and escape the newlines with backslashes:
key: "Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtemp\
  orincididuntutlaboreetdoloremagnaaliqua."

You might not believe it, but YAML can do multi-line keys too:
?
 >
 multi
 line
 key
:
  value

In case you're using YAML and Twig for translations in Symfony, and want to use multi-line translations in Javascript, a carriage return is added right after the translation. So even the following code:
var javascriptVariable = "{{- 'key'|trans -}}";
which has the following yml translation:
key: >
  This is a
  multi line
  translation.
will still result in the following code in html:
var javascriptVariable = "This is a multi line translation.
";
So, the minus sign in Twig does not solve this. The solution is to add this minus sign after the greater than sign in yml:
key: >-
  This is a
  multi line
  translation.
This gives the proper result, with the multi-line translation on one line in Twig:
var javascriptVariable = "This is a multi line translation.";

For situations where the string might or might not contain spaces, I prefer double quotes and line continuation with backslashes:
key: "String \
  with long c\
  ontent"
But note the pitfall: if a continuation line begins with a space, that space must be escaped (otherwise it is stripped):
key: "String\
  \ with lon\
  g content"
If the string contains line breaks, this needs to be written in C style \n.
See also this question.
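The two examples above can be checked against a rough Python approximation of the joining rule. This is a sketch under stated assumptions, not a YAML parser: the function name join_double_quoted is hypothetical, and it models only the two behaviours described in this answer (a trailing backslash removes the line break plus the next line's leading whitespace, and backslash + space is a preserved space). Other escapes, such as an escaped backslash at the end of a line, are not handled.

```python
import re


def join_double_quoted(raw: str) -> str:
    """Approximate line joining inside a YAML double-quoted scalar.

    Hypothetical helper: covers only trailing-backslash continuation
    and the escaped (backslash + space) leading space.
    """
    # A backslash at end of line removes the line break and the
    # leading whitespace of the following line.
    joined = re.sub(r"\\\n[ \t]*", "", raw)
    # An escaped space survives as an ordinary space.
    return joined.replace("\\ ", " ")


first = 'String \\\n  with long c\\\n  ontent'
second = 'String\\\n  \\ with lon\\\n  g content'
unescaped = 'String\\\n   with lon\\\n  g content'

print(join_double_quoted(first))
print(join_double_quoted(second))
print(join_double_quoted(unescaped))  # the leading space is lost
```

The third call shows the pitfall: without the escape, the space at the start of the continuation line disappears along with the indentation.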

None of the above solutions worked for me, in a YAML file within a Jekyll project. After trying many options, I realized that an HTML injection with <br> might do as well, since in the end everything is rendered to HTML:
name: |
  In a village of La Mancha <br> whose name I don't <br> want to remember.
At least it works for me. I don't know what problems this approach might cause.

Related

Using Pest.rs how can I manage a multi-line syntax where a line ends in "\"?

A common idiom in bash is to use \ to escape the newline at the end of a line:
If a \<newline> pair appears, and the backslash is not itself quoted, the \<newline> is treated as a line continuation (that is, it is removed from the input stream and effectively ignored).
Such that
FOO \
BAR
is the same as,
FOO BAR
How would I write this grammar into pest.rs? Note this means that NEWLINE is significant in my grammar, and I can't merely ignore it.
One method is to set your
WHITESPACE = { ( " "* ~ "\\" ~ NEWLINE ~ " "* ) }
This keeps regular newlines significant unless they're prefixed by \.
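Independent of pest, the continuation rule quoted from bash can be sketched as a preprocessing step in Python, which may help when checking what a grammar should accept. The function name logical_lines is hypothetical, and the sketch ignores the "backslash is not itself quoted" condition from the quoted rule.

```python
def logical_lines(text: str) -> list[str]:
    """Join backslash-newline continuations, then split on real newlines.

    Hypothetical helper; quoting of the backslash itself is not considered.
    """
    # Remove each backslash + newline pair from the input stream.
    joined = text.replace("\\\n", "")
    return joined.split("\n")


print(logical_lines("FOO \\\nBAR\nBAZ"))
```

Only the newline after BAR remains significant, so the input yields two logical lines.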

How do you count and replace a string in a text file that starts at the end of one line and continues on the next using Linux commands?

I have a large (4 GB) Windows .csv text file (each line ends in "\r\n") in a Linux environment that was supposed to have been a csv delimited file (delimiter = '|', text qualifier = '"') with each field separated by a pipe and enclosed in double quotes. Any narrative text field with embedded double quotes was supposed to have the double quote escaped with a second double quote (i.e. "the quick "brown" fox" was supposed to have been represented as "the quick ""brown"" fox"). Unfortunately escaping the embedded double quotes did not occur. Further, the text fields may include embedded new lines (i.e. Windows line endings, \r\n) which need to be retained.
Sample lines might look as follows:
"1234567890123456"|"2016-07-30"|"2016-08-01"|"123"|"456"|"789"|"text narrative field starts\r\n
with text lines that may have embedded double quotes "For example"\r\n
and may include measurements such as 1/2" x 2" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote"\r\n
"9876543210654321"|"2017-01-31"|"2018-08-01"|"123"|"456"|"789"|"text narrative field"\r\n
"2345678901234567"|"...."\r\n
with the objective to have the output appear as follows:
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts\r\n
with text lines that may have embedded double quotes ""For example""\r\n
and may include measurements such as 1/2"" x 2"" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote~\r\n
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~\r\n
~2345678901234567~|~....~\r\n
The solution I was attempting to implement was to:
SUCCESSFUL: change all the "|" sequences to ~|~
SUCCESSFUL: change the double quote (")at the start of the first line and end of the last line to a tilde (~)
change the ending and starting double quotes to tildes for any lines ending in a double quote at the end of the first line and terminated with a CR (\r\n) (eg. ..."\r\n) and the next line begins with a double quote, followed by 16 digit number and a tilde (eg. "1234567890123456~...) (i.e. it is the start of a new record)
convert all remaining double quote characters to two successive double quotes (change " to "")
then reverse the first 3 steps above changing all ~ back to double quotes.
I started by using sed to replace all strings with double quote, followed by a pipe, followed by a double quote (i.e. "|") with a tilde, pipe, tilde (i.e. ~|~). I then manually replaced the first and last doublequote in the file with a tilde.
This is where I ran into issues as I tried to count the number of occurrences where a line ends with a doublequote(") and the start of the next line begins with a doublequote followed by a 16 digit number and a "~" which will tell me the actual number of csv records in the file (minus one) as opposed to the number of lines. I attempted to do this using grep: grep '"\r\n"\d{16}~' | wc -l but that didn't work
I then need to replace those double quotes wherein a double quote ends a record and the succeeding record begins with a double quote followed by a 16 digit number and a "~" leaving everything else intact.
I tried to use sed: sed 's/"\r\n"(\d{16}~)/~\r\n~\1' windows_file.txt but it is not working as hoped.
I would welcome any recommendations as to how to accomplish the above.
The script below does what you expect using awk, except for the very last line in the file since it does not know where that record ends.
It could be fixed counting lines in the file but would be impractical since it's a big file.
Looking at data structure records are separated by "\r\n" and fields by "|" let's use that with awk.
gawk 'BEGIN{
    RS="\"\r\n\""   # input record separator RS: two double quotes with a DOS line ending in the middle
    FS="\"\\|\""    # input field separator FS: two double quotes with a pipe in the middle
    ORS="~\r\n~"    # your record separator
    OFS="~|~"       # your field separator
} {
    $1=$1           # trick awk into believing something has changed
    if (NR == 1) {  # first record, replace first character
        print "~" substr($0,2)
    } else {
        print $0
    }
} ' test.txt
Result (assuming lines end with \r\n):
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts
with text lines that may have embedded double quotes "For example"
and may include measurements such as 1/2" x 2" with
the text continuing and includes embedded line breaks
which will finally be terminated with a double quote~
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~
~10654321~|~2018-09-31~|~2018-08-01~|~123~|~456~|~789~|~asdasdasdasdad asasda"
~
~
PS: this will break if a field contains a line that starts with " and the preceding line within the same field ends with "\r\n, since that pattern matches the proposed RS. For example:
"10654321"|"2018-09-31"|"2018-08-01"|"123"|"456"|"789"|"asdasdasdasdad asasda"\r\n
"some more"\r\n
"22222"|".... (another record)
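The five steps listed in the question can also be sketched in Python with a regular expression that spans the line ending. The grep and sed attempts in the question cannot do this as written, because both tools match one line at a time by default and \d is not part of their basic or extended regex syntax. The function names requote and count_records are hypothetical, and the sketch assumes the text contains no literal ~, that each record starts with a 16-digit field, and that the data fits in memory (a 4 GB file would need to be processed in chunks).

```python
import re


def count_records(text: str) -> int:
    """Count records by counting quote, CRLF, quote boundaries before a 16-digit id."""
    boundaries = re.findall(r'"\r\n"(?=\d{16}"\|")', text)
    return len(boundaries) + 1


def requote(text: str) -> str:
    """Double embedded quotes while leaving the field delimiters untouched."""
    text = text.replace('"|"', "~|~")                            # step 1: field separators
    text = re.sub(r'\A"', "~", text)                             # step 2: first quote in the file
    text = re.sub(r'"(\r\n)?\Z', r"~\1", text)                   # step 2: last quote in the file
    text = re.sub(r'"\r\n"(?=\d{16}~)', "~\r\n~", text)          # step 3: record boundaries
    text = text.replace('"', '""')                               # step 4: escape embedded quotes
    return text.replace("~", '"')                                # step 5: restore delimiters


sample = (
    '"1234567890123456"|"a "b" c\r\n'
    'more 2" x"\r\n'
    '"9876543210654321"|"t"\r\n'
)
print(count_records(sample))
print(repr(requote(sample)))
```

When reading the real file, opening it with newline="" keeps the \r\n sequences intact so the patterns above can see them. The caveat from the PS above still applies: a narrative line that ends in a quote and is followed by a line that looks like the start of a record will be treated as a record boundary.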

Create a string in golang with characters that require escaping

I am trying to make a string variable containing :
"C:\Program Files\Sublime Text 3\sublime_text.exe" C:\Users\User\Desktop\Guess.py
Unfortunately I am not succeeding in doing so. Is there a way to put the text above as is into a variable, double quotes and all?
In your example string you have characters that need escaping: " and \
fmt.Println("\"C:\\Program Files\\Sublime Text 3\\sublime_text.exe\" C:\\Users\\User\\Desktop\\Guess.py")
You can also use back quotes to create what is called a raw string which doesn't require escaping those characters.
fmt.Println(`"C:\Program Files\Sublime Text 3\sublime_text.exe" C:\Users\User\Desktop\Guess.py`)
List of escapes:
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote (valid escape only within rune literals)
\"   U+0022 double quote (valid escape only within string literals)
See the official docs.

How to use backslash escape char for new line in JavaCC?

I have an assignment to create a lexical analyser and I've got everything working except for one bit.
I need to create a string that will accept a new line, and the string is delimited by double quotes.
The string accepts any number, letter, some specified punctuation, backslashes and double quotes within the delimiters.
I can't seem to figure out how to escape a new line character.
Is there a certain way of escaping characters like new line and tab?
Here's some of my code that might help
< STRING : ( < QUOTE> (< QUOTE > | < BACKSLASH > | < ID > | < NUM > | " " )* <QUOTE>) >
< #QUOTE : "\"" >
< #BACKSLASH : "\\" >
So my string should allow for a quote, then any of the following characters like a backslash, a whitespace, a number etc, and then followed by another quote.
The newline char like "\n" is what's not working.
Thanks in advance!
For string literals, JavaCC borrows the syntax of Java. So, a single-character literal comprising a carriage return is escaped as "\r", and a single-character literal comprising a line feed is escaped as "\n".
However, the processed string value is just a single character; it is not the escape itself. So, suppose you define a token for line feed:
< LF : "\n" >
A match of the token <LF> will be a single line-feed character. When substituting the token in the definition of another token, the single character is effectively substituted. So, suppose you have the higher-level definition:
< STRING : "\"" ( <LF> ) "\"" >
A match of the token <STRING> will be three characters: a quotation mark, followed by a line feed, followed by a quotation mark. What you seem to want instead is for the escape sequence to be recognized:
< STRING : "\"" ( "\\n" ) "\"" >
Now a match of the token <STRING> will be four characters: a quotation mark, followed by an escape sequence representing a line feed, followed by a quotation mark.
In your current definition, I see that other often-escaped metacharacters like quotation mark and backslash are also being recognized literally, rather than as escape sequences.
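The difference between the two definitions comes down to how many characters each literal denotes. Python string literals treat \n and \\ the same way as the Java-style literals discussed here, so the character counts can be checked with a short sketch (the variable names are illustrative only):

```python
line_feed = "\n"      # one character: U+000A
escape_text = "\\n"   # two characters: a backslash, then the letter n

quoted_line_feed = '"' + line_feed + '"'   # what the <LF>-based token matches
quoted_escape = '"' + escape_text + '"'    # what the "\\n"-based token matches

print(len(quoted_line_feed))
print(len(quoted_escape))
```

The first string has three characters and the second has four, matching the counts given for the two STRING definitions.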

How to strip whitespace in string in TCL?

I need to strip leading and trailing whitespace from a string in TCL. How?
Try this -
      string trim string ?chars?
Returns a value equal to string except that any leading or trailing characters from the set given by chars are removed. If chars is not specified then white space is removed (spaces, tabs, newlines, and carriage returns).
Original source: http://wiki.tcl.tk/10174
Alternatively, try this. It removes every space character in the string, not only the leading and trailing ones:
[string map {" " ""} $a];
where a is the variable holding your string.

Resources