Azure CosmosDB: illegal characters in Document Id

Azure CosmosDB: illegal characters in Document Id - azure

I have the issue that the Ids that are being generated based on certain input contain the character "/". This leads to an error during the Upsert operation as "/" is not allowed in the Document id.
Which characters are not allowed beside that?
What are ways to handle such a situation?

The illegal characters are /, \\, ?, # (see https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.resource.id?view=azure-dotnet)
Ways to deal with such a situation:
Remove the character already in the input used for generating the id
Replace the character in the id with another character / set of characters
Use Hashing / Encoding (e.g. Base64)
If you know a better way please share. Thanks

I'm base64 encoding the plaintext. Then replacing the '/' and '=' that can still be there from base64 with '-' and '_' respectively. This seems to be working well. I ran into other issues when I just tried UrlEncode the value.
Psuedo:
var encoded = String.ConvertToBase64(plainTextId);
var preppedId = encoded.Replace('/','-').Replace('=','_');

Related

Acceptable encoding for Cosmos DB IDs to replace illegal characters?

I'm trying to store data in Cosmos DB where the IDs use a slash (/). However slash is an illegal character in Cosmos IDs. I initially tried to resolve this by URL encoding slashes (%2F) as that's the form I'd generally receive them in through API requests. However, though percent (%) is not an illegal character for IDs, Cosmos still chokes on them being unable to retrieve many documents with a percent in the ID (it works for some, but it appears if the % is followed by certain characters it fails).
Is there a encoding that is suitable for Cosmos DB IDs that will replace illegal characters in the original ID text without introducing illegal or unhandled characters (like %) in the encoded ID text? I'd prefer to stay away from things like Base64 which makes the IDs hard to decipher for people. And I'd also like to avoid simple character replacement (/ becomes -) in case an ID uses the replacement character.

I ended up doing simple character replacement, swapping out slashes (/) with pipes (|).
The key thing to make this livable is adding a value converter with EntityFramework.
Expression<Func<string?, string>> toDB = v => v!.Replace("/", "|");
Expression<Func<string, string?>> fromDB = v => v!.Replace("|", "/");
builder.Property(p => p.Id).HasConversion(toDB, fromDB);
This allows the character replacement to happen automatically when reading & writing to the database. The only time you need to worry about the difference is if you're accessing the database directly or from other code without the converter. Or possibly doing custom searches. I manually do the translation for a filtering framework we use, and I suspect that other id search solutions would need the same manual translation.
Ultimately I decided this was acceptable as we are unlikely to have other characters that need translation for our case, the translation is easy to do visually, and it's transparent in most cases with ValueConverters. But it isn't a general solution that would work for any possible string id.
Edit:
On second thought, this solution is deficient. Cosmos does actually allow creating documents with illegal characters in the ID, it just doesn't allow accessing or deleting them easily. An ideal solution would prevent all illegal characters across the board, whether expected or not.

encode variables with special characters for azure

Very Similar problem to AADSTS50012: Invalid client secret is provided when moving from a Test App to Production
The top answer says to Encode your secret e.g. replace + by %2B and = by %3D, etc how would I replace the special character Tilde ~

As Suggested by juunas, and as per the document yes, you can replace the special character.
URL encoding converts characters into a format that can be transmitted over the Internet.
Here is the link for complete information regarding Encoding Techniques.

is there special characters or strings cannot be used as key in rapidjson?

can special characters be used as a part of key?
For example :
{
"+new":"addnew.png"
"":"empty.png"
}
Is this format of rapidjson valid?
Also, is there any special strings that is not valid to use as key?
(I think the earlier question cannot fully answer my question because it does not cover the case of empty string, e.g.:"":"empty.png")

Yes. Any valid string can be used as key.
But the shown JSON misses a comma between two members.

By http://www.json.org/:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.
You can use JSONLint to validate the json string.

Azure Table Storage RowKey restricted Character Patterns?

Are there restricted character patterns within Azure TableStorage RowKeys? I've not been able to find any documented via numerous searches. However, I'm getting behavior that implies such in some performance testing.
I've got some odd behavior with RowKeys consisting on random characters (the test driver does prevent the restricted characters (/ \ # ?) plus blocking single quotes from occurring in the RowKey). The result is I've got a RowKey that will insert fine into the table, but cannot be queried (the result is InvalidInput). For example:
RowKey: 9}5O0J=5Z,4,D,{!IKPE,~M]%54+9G0ZQ&G34!G+
Attempting to query by this RowKwy (equality) will result in an error (both within our app, using Azure Storage Explorer, and Cloud Storage Studio 2). I took a look at the request being sent via Fiddler:
GET /foo()?$filter=RowKey%20eq%20'9%7D5O0J=5Z,4,D,%7B!IKPE,~M%5D%54+9G0ZQ&G34!G+' HTTP/1.1
It appears the %54 in the RowKey is not escaped in the filter. Interestingly, I get similar behavior for batch requests to table storage with URIs in the batch XML that include this RowKey. I've also seen similar behavior for RowKeys with embedded double quotes, though I have not isolated that pattern yet.
Has anyone co me across this behavior? I can easily restrict additional characters from occurring in RowKeys, but would really like to know the 'rules'.

The following characters are not allowed in PartitionKey and RowKey fields:
The forward slash (/) character
The backslash (\) character
The number sign (#) character
The question mark (?) character
Further Reading: Azure Docs > Understanding the Table service data model

public static readonly Regex DisallowedCharsInTableKeys = new Regex(#"[\\\\#%+/?\u0000-\u001F\u007F-\u009F]");
Detection of Invalid Table Partition and Row Keys:
bool invalidKey = DisallowedCharsInTableKeys.IsMatch(tableKey);
Sanitizing the Invalid Partition or Row Key:
string sanitizedKey = DisallowedCharsInTableKeys.Replace(tableKey, disallowedCharReplacement);
At this stage you may also want to prefix the sanitized key (Partition Key or Row Key) with the hash of the original key to avoid false collisions of different invalid keys having the same sanitized value.
Do not use the string.GetHashCode() though since it may produce different hash code for the same string and shall not be used to identify uniqueness and shall not be persisted.
I use SHA256: https://msdn.microsoft.com/en-us/library/s02tk69a(v=vs.110).aspx
to create the byte array hash of the invalid key, convert the byte array to hex string and prefix the sanitized table key with that.
Also see related MSDN Documentation:
https://msdn.microsoft.com/en-us/library/azure/dd179338.aspx
Related Section from the link:
Characters Disallowed in Key Fields
The following characters are not allowed in values for the PartitionKey and RowKey properties:
The forward slash (/) character
The backslash (\) character
The number sign (#) character
The question mark (?) character
Control characters from U+0000 to U+001F, including:
The horizontal tab (\t) character
The linefeed (\n) character
The carriage return (\r) character
Control characters from U+007F to U+009F
Note that in addition to the mentioned chars in the MSDN article, I also added the % char to the pattern since I saw in a few places where people mention it being problematic. I guess some of this also depends on the language and the tech you are using to access the table storage.
If you detect additional problematic chars in your case, then you can add those to the regex pattern, nothing else needs to change.

I just found out (the hard way) that the '+' sign is allowed, but not possible to query in PartitionKey.

I found that in addition to the characters listed in Igorek's answer, these also can cause problems (e.g. inserts will fail):
|
[]
{}
<>
$^&
Tested with the Azure Node.js SDK.

I transform the key using this function:
private static string EncodeKey(string key)
{
return HttpUtility.UrlEncode(key);
}
This needs to be done for the insert and for the retrieve of course.

Delimiter to use within a query string value

I need to accept a list of file names in a query string. ie:
http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc
Do you have any recommendations on what delimiter to use?

Having query parameters multiple times is legal, and the only way to guarantee no parsing problems in all cases:
http://someSite/someApp/myUtil.ashx?file=file1.txt&file=file2.bmp&file=file3.doc
The semicolon ; must be URI encoded if part of a filename (turned to %3B), yet not if it is separating query parameters which is its reserved use.
See section 2.2 of this rfc:
2.2. Reserved Characters
URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "#"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="

If they're filenames, a good choice would be a character which is disallowed in filenames. Suggestions so far included , | & which are generally allowed in filenames and therefore might lead to ambiguities. / on the other hand is generally not allowed, not even on Windows. It is allowed in URIs, and it has no special meaning in query strings.
Example:
http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc is bad because it may refer to the valid file file1.txt|file2.bmp.
http://someSite/someApp/myUtil.ashx?files=file1.txt/file2.bmp/file3.doc unambiguously refers to 3 files.

I would recommend making each file its own query parameter, i.e.
myUtil.ashx?file1=file1.txt&file2=file2.bmp&file3=file3.doc
This way you can just use standard query parsing and loop

Do you need to list the filenames as a string?
Most languages accepts arrays in the querystring so you could write it like
http://someSite/someApp/myUtil.ashx?files[]=file1.txt&files[]=file2.bmp&files[]=file3.doc
If it doesn't, or you can't use for some other reason, you should stick to a delimiter that is either not allowed or unusual in a filename. Pipe (|) is a good one, otherwise you could urlencode an invisible character since they are quite easy to use in coding, but harder to actually include in a filename.
I usually use arrays when possible and pipe otherwise.

I've always used double pipes "||". I don't have any good evidence to back up why this is a good choice other than 10 years of web programming and it's never been an issue.

This is one common problem. How i handled it was: I created a method which accepted a list of strings, then found a character that was not in any of the strings. (I did this by a simple concatenation of the strings, then testing for various characters.) Once a character was found, concatenated all the strings together but also prepended the string with the separation character. So in the given question, one example wud be:
http://someSite/someApp/myUtil.ashx?files=|file1.txt|file2.bmp|file3.doc
and another wud be:
http://someSite/someApp/myUtil.ashx?files=,file1.txt,file2.bmp,file3.doc
But since i actually use a method that guarantees my separator character is not in the rest of the strings, it is safe. It was a bit of work to create the first time, but i've used it MANY times in various applications.

I think I would consider using commas or semicolons.

I would build on MSalters answer by saying, to generalize, the best delimiter is one that is invalid to the items in the list. For example, if your list is prices, a comma is a bad delimiter because it can be confused with the values. For that reason, as most these answers suggest, I think a good general purpose delimiter is probably "|" as it is rarely a valid value. "/" is maybe not the best delimiter generally as it is valid for paths sometimes.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Azure CosmosDB: illegal characters in Document Id - azure

I have the issue that the Ids that are being generated based on certain input contain the character "/". This leads to an error during the Upsert operation as "/" is not allowed in the Document id. Which characters are not allowed beside that? What are ways to handle such a situation?

Related

Acceptable encoding for Cosmos DB IDs to replace illegal characters?

encode variables with special characters for azure

is there special characters or strings cannot be used as key in rapidjson?

Azure Table Storage RowKey restricted Character Patterns?

Delimiter to use within a query string value

Categories

Resources