String literals and escape characters in PostgreSQL

Attempting to insert an escape character into a table results in a warning.
For example:
create table EscapeTest (text varchar(50));
insert into EscapeTest (text) values ('This is the first part \n And this is the second');
Produces the warning:
WARNING: nonstandard use of escape in a string literal
(Using PSQL 8.2)
Anyone know how to get around this?

Partially. The text is inserted, but the warning is still generated.
I found a discussion that indicated the text needed to be preceded with 'E', as such:
insert into EscapeTest (text) values (E'This is the first part \n And this is the second');
This suppressed the warning, but the text was still not being returned correctly. When I added the additional slash as Michael suggested, it worked.
As such:
insert into EscapeTest (text) values (E'This is the first part \\n And this is the second');

Cool.
I also found the documentation regarding the E:
http://www.postgresql.org/docs/8.3/interactive/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS
PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g. E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represents a special byte value. \b is a backspace, \f is a form feed, \n is a newline, \r is a carriage return, \t is a tab. Also supported are \digits, where digits represents an octal byte value, and \xhexdigits, where hexdigits represents a hexadecimal byte value. (It is your responsibility that the byte sequences you create are valid characters in the server character set encoding.) Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.

The warning is issued because you are using backslashes in your strings. If you want to avoid the message, run the command "set standard_conforming_strings=on;". Then put "E" before any string containing backslashes that you want PostgreSQL to interpret.

I find it highly unlikely for Postgres to truncate your data on input - it either rejects it or stores it as is.
milen@dev:~$ psql
Welcome to psql 8.2.7, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
milen=> create table EscapeTest (text varchar(50));
CREATE TABLE
milen=> insert into EscapeTest (text) values ('This will be inserted \n This will not be');
WARNING: nonstandard use of escape in a string literal
LINE 1: insert into EscapeTest (text) values ('This will be inserted...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
INSERT 0 1
milen=> select * from EscapeTest;
text
------------------------
This will be inserted
This will not be
(1 row)
milen=>

Really stupid question: are you sure the string is being truncated, and not just broken at the line break you specified (and possibly not showing in your interface)? I.e., do you expect the field to show as
This will be inserted \n This will not be
or
This will be inserted
This will not be
Also, what interface are you using? Is it possible that something along the way is eating your backslashes?

Related

how to replace string (with special characters) to normal string in vim

Hi, I am trying to replace a string that has a special character at the end with a new string. For example, I want to replace
qwerty_CRS_abc\
to
qwerty_CRS_abc
I tried with this:
:%s/qwerty_CRS_abc\/qwerty_CRS_abc/g
but I'm getting this error:
Pattern not found: padring_CRS_CAN\/padring_CRS_CAN\g
Basically, I just want to remove that backslash in whole file. It should be just
qwerty_CRS_abc
Use:
:%s/qwerty_CRS_abc\\/qwerty_CRS_abc/g
Certain characters such as /&!.^*$\? carry a special significance to the search process and must be escaped using the \ character when they are used in a search. Hence the \\ used to escape the backslash in your example.

Escape dollar sign in dollar-quoted strings query

Hello, I am working on a Cassandra query that has an explanation field of data type text. I am using dollar-quoted strings to escape special characters, but I run into a problem when the string in my explanation field ends with a dollar sign.
For example
INSERT INTO Users (name, explanation) VALUES ($$Tom$$, $$Some'text$$$);
The last two dollar-quoted strings are the end quotes but the third last is a part of the explanation, how can I escape that? Or is there any other way through which I can escape all special characters including dollar sign?
Thanks in advance
Looking at the grammar, it looks like the lexer simply searches for the next occurrence of $$ and doesn't distinguish between double and triple dollar signs. If any character follows the $, it is handled correctly (for example, the string $$dwewdewe'adqdq$'$$ works just fine); the $ just must not be the last character before the closing $$. If you want to insert a string with a ' character inside, you can escape it (see doc) with another ' character (for example, 'this is string with '' inside' works fine and produces this is string with ' inside, as expected).
That covers inserting a statement once. If you're inserting the data from your program, it's better to use prepared statements instead; they should be supported in all existing drivers. With prepared statements you don't need to take care of escaping, as that is the job of the driver (really, no escaping happens, because the string is sent as-is, not as part of the statement). Besides not needing escaping, you should also get better performance, because the statement is parsed only once, when it's prepared, and after that only the statement ID plus parameters are sent to the Cassandra node.
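A minimal sketch of the parameter-binding idea, using Python's built-in sqlite3 module purely as a stand-in (an assumption made so the example runs without a Cassandra cluster; a Cassandra driver has its own prepare and bind calls, which are not shown here). The table and column names mirror the question:

```python
import sqlite3

# sqlite3 is only a stand-in for illustration; it is not a Cassandra driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, explanation TEXT)")

name = "Tom"
explanation = "Some'text$"  # contains a quote and ends with a dollar sign

# The values are passed separately from the statement text, so neither the
# quote nor the trailing dollar sign needs any escaping.
conn.execute(
    "INSERT INTO users (name, explanation) VALUES (?, ?)",
    (name, explanation),
)

stored = conn.execute(
    "SELECT explanation FROM users WHERE name = ?", (name,)
).fetchone()[0]
print(stored)
```

The point is only that the awkward value never appears inside the statement text, so the question of how to quote it does not arise.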

How to split strings separated by comma with escapes?

I have a string that looks like this:
(The whole code block is a string, aka, this string contains quotation marks.)
"he\"llo", "world\n", "fro,m"
[update] Aka, the "actual" string is this:
"\"he\\\"llo\", \"world\\n\", \"fro,m\""
I want to get an array of strings like this:
[ "\"he\\\"llo\"", "\"world\\n\"", "\"fro,m\"" ]
[update] Commas inside quotation marks should be kept.
In my opinion, there are several ways to solve this:
build an automaton (DFA or NFA) for this syntax
use several status flags like inQuote and handle the logic with lots of if/else
write a complex but clever regular expression for this
Are there any general solutions to this problem? Or how should I actually implement one of the ideas above?
P.S. It would be even better if syntax errors like "unclosed quotation mark" could be detected.
You need to first define your grammar. This is a simple grammar for your case:
document = *WS [string *WS *(',' *WS string *WS)]
string = %x22 *char %x22
char = %x20-21 / %x23-5B / escape / %x5D-10FFFF
escape = %x5C (%x5C / %x22 / 't' / 'n' / 'r')
WS = %x9 / %x20
You can read it as:
A document may begin/end with a white space, then may have one or more strings separated by commas. Before and after each comma there may be some white space.
A string is made of characters and begins and ends with double quotes Unicode/ASCII hex code 22.
Each character (char) may be: 1) any non-control Unicode character before the double quote, i.e. hex 20 (space) or hex 21 (exclamation mark); 2) any character after the double quote and before the escape backslash \ (hex 5C); 3) an escape sequence; 4) any other Unicode character after the backslash (hex 5C).
The escape sequence (rule escape) begins with the escape backslash \ and is followed by another backslash, a double quote, or one of the characters t for tab, n for line feed and r for carriage return. You may add other escapable characters if you want; for the C++ string syntax, see https://en.cppreference.com/w/cpp/language/escape .
A white space (WS) is a tab or a space; you may also add %xA and %xD for line feed and carriage return respectively.
By the use of this grammar you will get this tree for your input:
The screenshot is from the Tunnel Grammar Studio online laboratory, which can run ABNF grammars (such as the one above) and which I work on.
After you have the grammar, you may use tools to generate a parser, or you can write one yourself. If you want to do it by hand (preferable for such a small and simple grammar), you may have one function per grammar rule that reads one character and checks whether it is the expected one. If your input ends while you are parsing the string rule, then you have an input with a started but unfinished string.
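As a rough illustration of the hand-written approach, here is a small Python sketch that follows the grammar above rule by rule. The function name split_quoted and the error messages are made up for this example; it returns each string with its quotes and escapes untouched, as the question asks, and raises ValueError for an unclosed quotation mark.

```python
def split_quoted(text):
    """Split a comma-separated list of double-quoted strings (sketch)."""
    escapable = '\\"tnr'  # characters allowed after a backslash
    ws = " \t"            # rule WS: tab or space
    n = len(text)

    def skip_ws(pos):
        while pos < n and text[pos] in ws:
            pos += 1
        return pos

    def read_string(pos):
        # rule: string = %x22 *char %x22
        if pos >= n or text[pos] != '"':
            raise ValueError(f"expected opening quote at offset {pos}")
        start = pos
        pos += 1
        while pos < n:
            ch = text[pos]
            if ch == '"':
                return text[start:pos + 1], pos + 1
            if ch == "\\":
                # rule: escape = %x5C (%x5C / %x22 / 't' / 'n' / 'r')
                if pos + 1 >= n or text[pos + 1] not in escapable:
                    raise ValueError(f"bad escape at offset {pos}")
                pos += 2
            elif ch < " ":
                raise ValueError(f"control character at offset {pos}")
            else:
                pos += 1
        raise ValueError(f"unclosed quotation mark starting at offset {start}")

    # rule: document = *WS [string *WS *(',' *WS string *WS)]
    items = []
    pos = skip_ws(0)
    if pos == n:
        return items
    item, pos = read_string(pos)
    items.append(item)
    pos = skip_ws(pos)
    while pos < n:
        if text[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos = skip_ws(pos + 1)
        item, pos = read_string(pos)
        items.append(item)
        pos = skip_ws(pos)
    return items


parts = split_quoted(r'"he\"llo", "world\n", "fro,m"')
print(parts)
```

Because a comma is only treated as a separator between strings, the comma inside "fro,m" is kept as part of that item.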
Your actual string syntax tree will look like that:

How to fine-tune a macro in Vim after recording it?

Specific question
Description
After recording the desired action to register o, I pasted the whole macro into my ~/.vimrc and assigned it as follows (the mappings are not displayed properly when pasted directly)
Expected behavior
I would like to use this macro to get myself a new "comment line" that leads a new section of script, formatted such that the name of the section is centered. After populating the "section title", I would like to enter insert mode in a new line.
In the following screen recording, I tested both @o and @p on the word "time". The second attempt, with @p, worked as desired.
The problem (on Windows machine specifically)
As you can see, the @o mapping gives me junk phrases that were part of my macro definition. Does this have to do with the ^M operator? And how can I fix the @o mapping, which uses * to populate the line?
The two mappings worked just fine on a Linux system. (I don't know why, as I recorded and pasted the macro definition on a Windows machine.) This also does not appear to be a problem on Mac with MacVim.
Generalized question
Is there a way to properly substitute the ^M operator (for <CR>, or "Enter"-key)?
Is there a way to properly substitute the ^[ operator (for <ESC>, or the "Escape"-key)?
Is there a systematic list of mappings for these odd representations of keystrokes, as recorded by the "recording" function through q?
Solution
Substitute the ^M marks in the macro definition with \r, and substitute ^[ with \x1b for the ESC key. The mappings are fixed as follows:
let @o = ":center\ri\r\x1bkV:s/ /\*/g\rJx50A\*\x1b80d|o"
let @p = ":center\ri\r\x1bkV:s/ /\"/g\rJx50A\"\x1b80d|o"
Complete list of key-codes/mappings? Approach 1: through hex code.
Thanks to Zbynek Vyskovsky, the picture is clear. For whatever key one may think of, Vim takes its ASCII value at "face value". (The trick is to use an escape sequence starting with \x, where \x introduces the hex value.) Thus, the correspondence list (still incomplete) goes as follows:
Enter --- \x0d --- \r
ESC --- \x1b --- \e
Solution native to Vim
By chance, :help expr-quote gives the following list of special characters. This should serve as the definitive answer to the original question in its general form.
string *string* *String* *expr-string* *E114*
------
"string" string constant *expr-quote*
Note that double quotes are used.
A string constant accepts these special characters:
\... three-digit octal number (e.g., "\316")
\.. two-digit octal number (must be followed by non-digit)
\. one-digit octal number (must be followed by non-digit)
\x.. byte specified with two hex numbers (e.g., "\x1f")
\x. byte specified with one hex number (must be followed by non-hex char)
\X.. same as \x..
\X. same as \x.
\u.... character specified with up to 4 hex numbers, stored according to the
current value of 'encoding' (e.g., "\u02a4")
\U.... same as \u but allows up to 8 hex numbers.
\b backspace <BS>
\e escape <Esc>
\f formfeed <FF>
\n newline <NL>
\r return <CR>
\t tab <Tab>
\\ backslash
\" double quote
\<xxx> Special key named "xxx". e.g. "\<C-W>" for CTRL-W. This is for use
in mappings, the 0x80 byte is escaped.
To use the double quote character it must be escaped: "<M-\">".
Don't use <Char-xxxx> to get a utf-8 character, use \uxxxx as
mentioned above.
Note that "\xff" is stored as the byte 255, which may be invalid in some
encodings. Use "\u00ff" to store character 255 according to the current value
of 'encoding'.
Note that "\000" and "\x00" force the end of the string.
Since you are assigning to a register using the Vim expression language, this is definitely possible in a platform-independent way. Strings in Vim expressions understand the standard escape sequences, so it's best to replace ^M with \r and Esc with \x1b:
let @o = ":center\riSomeInsertedString\x1b"
There is no list of special characters to be translated as far as I know, but you can simply take all control characters (ASCII below 32) and translate them to the corresponding escape sequence "\xHexValue", where HexValue is the value of the character. Even \r (or ^M) can be translated to \x0d, as its ASCII value is 13 (0x0d hex).
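The translation described above is mechanical, so it can be scripted outside Vim. Below is a small Python sketch; the helper name to_vim_string is invented for this example. It turns raw recorded keystrokes into a Vim double-quoted string, writing every control character as a two-digit \x escape. It also escapes backslashes and double quotes, which is an addition beyond the answer but is needed for the result to be a valid double-quoted string.

```python
def to_vim_string(raw):
    """Render raw macro keystrokes as a Vim double-quoted string literal."""
    out = []
    for ch in raw:
        code = ord(ch)
        if ch in ("\\", '"'):
            # Backslash and double quote must themselves be escaped.
            out.append("\\" + ch)
        elif code < 32 or code == 127:
            # Always emit two hex digits so that a following character such
            # as "a" or "d" is not read as part of the escape.
            out.append(f"\\x{code:02x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# The Python literal below contains a real carriage return and a real Esc.
macro = ":center\riSomeInsertedString\x1b"
print("let @o = " + to_vim_string(macro))
```

This prints let @o = ":center\x0diSomeInsertedString\x1b", which is equivalent to the \r form used above.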

Characters to separate value

I need to create a string to store key/value pairs, for example:
key1::value1||key2::value2||key3::value3
When deserializing it, I may encounter an error if the key or the value happens to contain || or ::
What are common techniques to deal with such a situation? Thanks
A common way to deal with this is called an escape character or qualifier. Consider this Comma-Separated line:
Name,City,State
John Doe, Jr.,Anytown,CA
Because the name field contains a comma, it of course gets split improperly and so on.
If you enclose each data value by qualifiers, the parser knows when to ignore the delimiter, as in this example:
Name,City,State
"John Doe, Jr.",Anytown,CA
Qualifiers can be optional, used only on data fields that need it. Many implementations will use qualifiers on every field, needed or not.
You may want to implement something similar for your data encoding.
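For the CSV case specifically, Python's standard csv module already implements optional qualifiers, so a short sketch can show the round trip with the example row above. With its default settings the writer quotes only the fields that need it:

```python
import csv
import io

rows = [["Name", "City", "State"], ["John Doe, Jr.", "Anytown", "CA"]]

# Write the rows; only the field containing a comma gets quoted.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
encoded = buf.getvalue()
print(encoded)

# Reading it back recovers the original fields, comma included.
decoded = list(csv.reader(io.StringIO(encoded)))
print(decoded)
```

The same idea carries over to a custom key/value format: wrap a value in qualifiers whenever it contains a delimiter, and have the parser ignore delimiters while it is inside qualifiers.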
Escape || when serializing, and unescape it when deserializing. A common C-like way to escape is to prepend \. For example:
{ "a:b:c": "foo||bar", "asdf": "\\|||x||||:" }
serialize => "a\:b\:c:foo\|\|bar||asdf:\\\\\|\|\|x\|\|\|\|\:"
Note that \ needs to be escaped (and double escaped due to being placed in a C-style string).
If we assume that you have total control over the input string, then the common way of dealing with this problem is to use an escape character.
Typically, the backslash character \ is used as an escape to say that "the next character is a special character", so in this case it should not be treated as a delimiter. So the parser would see || and :: as delimiters, but would see \|\| as two pipe characters || in either the key or the value.
The next problem is that we have overloaded the backslash: how do I represent a backslash itself? This is solved by saying that the backslash is also escaped, so to represent a \, you would have to write \\. So the parser would see \\ as \.
Note that if you use escape characters, you can use a single character for the delimiters, which might make things simpler.
Alternatively, you may have to restrict the input and say that || and :: are simply banned, and fail or remove them when the string is encoded.
A simple solution is to escape a separator (with a backslash, for instance) any time it occurs in data:
Name,City,State
John Doe\, Jr.,Anytown,CA
Of course, the escape character will need to be escaped when it occurs in data as well; in this case, a backslash would become \\.
You can use a non-printable character as the separator (e.g. vertical tab :-) ).
You can escape separator character in your data during serialization. For example: if you use one character as separator (key1:value1|key2:value2|...) and your data is:
this:is:key1 this|is|data1
this:is:key2 this|is|data2
you double every colon and pipe character in your data when you serialize it. So you will get:
this::is::key1:this||is||data1|this::is::key2:this||is||data2|...
During deserialization, whenever you come across two colons or two pipe characters, you know that this is not your separator but part of your data, and that you have to change it to one character. On the other hand, every single colon or pipe character is your separator.
Use a prefix (say "a") for your special characters (say "b") present in the key and values to store them. This is called escaping.
Then decode the key and values by simply replacing any "ab" sequence with "b". Bear in mind that the prefix is also a special character. An example:
Prefix: \
Special characters: :, |, \
Encoded:
title:Slashdot\: News for Nerds. Stuff that Matters.|shortTitle:\\.
Decoded:
title=Slashdot: News for Nerds. Stuff that Matters.
shortTitle=\.
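The encode/decode steps above can be sketched in Python as follows. The function names are illustrative, and the format (":" between key and value, "|" between pairs, "\" as the prefix) is the one from this example rather than the "::" and "||" of the question:

```python
ESC, PAIR_SEP, KV_SEP = "\\", "|", ":"
SPECIAL = {ESC, PAIR_SEP, KV_SEP}


def escape(text):
    # Prefix every special character, including the prefix itself.
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in text)


def serialize(pairs):
    return PAIR_SEP.join(escape(k) + KV_SEP + escape(v) for k, v in pairs.items())


def deserialize(data):
    pairs, key, current = {}, None, []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESC:
            if i + 1 >= len(data):
                raise ValueError("dangling escape character")
            current.append(data[i + 1])  # escaped: take next char literally
            i += 2
            continue
        if ch == KV_SEP:
            if key is not None:
                raise ValueError("unescaped ':' inside a value")
            key, current = "".join(current), []
        elif ch == PAIR_SEP:
            if key is None:
                raise ValueError("pair without a key/value separator")
            pairs[key] = "".join(current)
            key, current = None, []
        else:
            current.append(ch)
        i += 1
    if key is not None:
        pairs[key] = "".join(current)
    elif current:
        raise ValueError("trailing data without a key/value separator")
    return pairs


record = {
    "title": "Slashdot: News for Nerds. Stuff that Matters.",
    "shortTitle": "\\.",
}
encoded = serialize(record)
print(encoded)
print(deserialize(encoded) == record)
```

Because every special character in the data is prefixed, keys and values that contain "::" or "||" also survive the round trip.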
The common technique is escaping reserved characters, for example:
In URLs you escape some characters using %HEX representation:
http://example.com?aa=a%20b
In programming languages you escape some characters with a backslash prefix:
"\"hello\""
