I see there is a urlencode in Terraform but no urldecode. Is there any reason why it is not there? What would be the workaround to achieve that in Terraform?
Thanks,
Ram
The functions built into the Terraform language tend (with some notable historical exceptions) to focus on solving problems that arise very commonly in typical definitions of infrastructure. URL encoding arises in situations such as building URLs for API calls, whereas URL decoding seems to come up less frequently in the typical scope where Terraform is used, so there haven't been sufficient real-world examples of the need to justify making it a built-in.
The Terraform language does include some features that allow for basic manual string tokenization and transformation, but a key missing piece for fully general URL decoding is that Terraform has no function which can take a number and return the corresponding character as defined by a specific character lookup table, such as ASCII or Unicode.
If you know that in practice your inputs will only use URL escape sequences for specific reserved symbols, then you can approximate URL decoding with a lookup table combined with a tokenizing regular expression:
locals {
  input = "foo%3Fx%3Dtest"

  tokens = regexall("(?:%[0-9a-fA-F]{2}|[^%]+)", local.input)

  replacements = tomap({
    "%3f" = "?"
    "%3d" = "="
    "%25" = "%"
  })

  result = join("", [
    for token in local.tokens : (
      substr(token, 0, 1) == "%" ?
      local.replacements[lower(token)] :
      token
    )
  ])
}
The above works for the limited encoding vocabulary of only ?, =, and %, but it will fail if there are any encoded characters other than those. You can of course expand this vocabulary with any additional characters you need, and you could potentially expand the table to cover all 128 ASCII characters.
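If you did want the full printable-ASCII table, a throwaway script can generate it rather than typing 95 entries by hand. A minimal Python sketch (a hypothetical helper, limited to the printable range; json.dumps happens to produce string literals that are also valid HCL for these characters):

import json

# Print one map entry per printable ASCII character, in the same
# lowercase-hex form the lookup above expects, ready to paste into
# the Terraform "replacements" map.
for b in range(0x20, 0x7F):
    print(f'"%{b:02x}" = {json.dumps(chr(b))}')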
It would not be possible to decode non-ASCII (i.e. Unicode-only) characters with this simplistic strategy, because URL encoding of those involves first encoding the character as UTF-8 and then percent-encoding each resulting byte, and the Terraform language has no facilities for working with raw bytes: it works only with Unicode strings.
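For illustration, here is that multi-byte behaviour seen from Python, whose urllib.parse handles the UTF-8 round trip:

from urllib.parse import quote, unquote

quote("é")         # '%C3%A9' (one escape per UTF-8 byte, not per character)
unquote("%C3%A9")  # 'é' (decoding must reassemble the bytes first)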
I'm trying to store data in Cosmos DB where the IDs use a slash (/). However, slash is an illegal character in Cosmos IDs. I initially tried to resolve this by URL-encoding slashes (%2F), as that's the form I'd generally receive them in through API requests. However, though percent (%) is not an illegal character for IDs, Cosmos still chokes on it: it is unable to retrieve many documents with a percent in the ID (it works for some, but it appears to fail when the % is followed by certain characters).
Is there an encoding suitable for Cosmos DB IDs that will replace illegal characters in the original ID text without introducing illegal or unhandled characters (like %) into the encoded ID text? I'd prefer to stay away from things like Base64, which makes the IDs hard to decipher for people. And I'd also like to avoid simple character replacement (/ becomes -) in case an ID uses the replacement character.
I ended up doing simple character replacement, swapping out slashes (/) with pipes (|).
The key thing to make this livable is adding a value converter with Entity Framework.
Expression<Func<string?, string>> toDB = v => v!.Replace("/", "|");
Expression<Func<string, string?>> fromDB = v => v!.Replace("|", "/");
builder.Property(p => p.Id).HasConversion(toDB, fromDB);
This allows the character replacement to happen automatically when reading from and writing to the database. The only time you need to worry about the difference is if you're accessing the database directly or from other code without the converter, or possibly when doing custom searches. I manually do the translation for a filtering framework we use, and I suspect that other ID search solutions would need the same manual translation.
Ultimately I decided this was acceptable as we are unlikely to have other characters that need translation for our case, the translation is easy to do visually, and it's transparent in most cases with ValueConverters. But it isn't a general solution that would work for any possible string id.
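For comparison only, the same transparent convert-on-write/convert-on-read pattern can be sketched in Python with SQLAlchemy's TypeDecorator (hypothetical class name; the EF Core converter above is the actual solution used):

from sqlalchemy import String
from sqlalchemy.types import TypeDecorator

class PipeEscapedId(TypeDecorator):
    # Swap '/' for '|' on the way into the database and back on the
    # way out, mirroring the EF Core value converter above.
    impl = String
    cache_ok = True

    def process_bind_param(self, value, dialect):
        return value.replace('/', '|') if value is not None else None

    def process_result_value(self, value, dialect):
        return value.replace('|', '/') if value is not None else None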
Edit:
On second thought, this solution is deficient. Cosmos does actually allow creating documents with illegal characters in the ID, it just doesn't allow accessing or deleting them easily. An ideal solution would prevent all illegal characters across the board, whether expected or not.
I am writing a POST request for a game I am trying to make scripts for. For this POST, I am using the common req = urllib.request.Request(url=url, data=params, headers=headers). First, though, I have a dictionary of the data needed for the request, and I must encode it with params = urllib.parse.urlencode(data), where data is an OrderedDict.
However, this gives me a string, but not the proper one! It will give me:
&x=_1&y_=2&_z_=3
But, the way the game encodes things, it should be:
&x=%5F1&y%5F=2&%5Fz%5F=3
So mine doesn't encode the underscores as "%5F". How do I fix this? If it helps, I have the params that the game uses (in URL, pre-encoded form); would I be able to use that in the data field of the request?
Underscores don't need to be encoded, because they are valid characters in URLs.
As per RFC 1738:
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters < and > are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (") is used to
delimit URLs in some systems. The character # is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character % is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are {, }, |, \, ^, ~,
[, ], and `.
All unsafe characters must always be encoded within a URL.
So the reason _ is not replaced by %5F is the same reason that a is not replaced by %61: it's just not necessary. Web servers don't (or shouldn't) care either way.
In case the web server you're trying to use does care (but please check first whether that's the case), you'll have to do some manual work, as urllib's quoting does not support quoting _:
urllib.parse.quote(string, safe='/', encoding=None, errors=None)
Replace special characters in string using the %xx escape. Letters, digits, and the characters _.- are never quoted.
You can probably wrap quote() with your own function and pass that to urlencode(). Something like this (fully untested):
import urllib.parse

def extra_quote(*args, **kwargs):
    quoted = urllib.parse.quote(*args, **kwargs)
    return quoted.replace('_', '%5F')

urllib.parse.urlencode(query, quote_via=extra_quote)
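Using the question's own example data, that would give (again, untested):

from collections import OrderedDict

query = OrderedDict([('x', '_1'), ('y_', '2'), ('_z_', '3')])
urllib.parse.urlencode(query, quote_via=extra_quote)
# 'x=%5F1&y%5F=2&%5Fz%5F=3'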
ServiceStack's JsonSerializer does not seem to encode control characters correctly.
For example, this C# expression....
JsonSerializer.SerializeToString(new { Text = "\u0010" })
... evaluates to this...
{"Text":"?"}
... where the "?" is the literal control character.
Instead, according to http://www.json.org it should evaluate to this:
{"Text":"\u0010"}
Is this a known bug or am I missing something?
The bad JSON output by my services is causing errors during deserialization by my service consumers.
You need to tell the serializer to escape Unicode characters.
JsConfig.EscapeUnicode = true;
JsonSerializer.SerializeToString(new{Text = "\u0010"});
The above evaluates to this:
{"Text":"\u0010"}
Thanks Mike, that works. But I think this approach escapes ALL non-ASCII Unicode characters in addition to control characters.
I'm expecting to have a lot of foreign language characters in my data (Arabic, for example) so this will cause significant size bloat versus just including those unescaped unicode characters in the JSON (which is still standard-compliant).
I imagine the purpose of EscapeUnicode = true is to produce JSON that can be stored or transmitted with simple ASCII encoding, which is certainly useful. And it apparently also encodes ASCII control characters as a side effect, which does solve my problem.
But in my opinion, JsonSerializer should escape control characters regardless of the EscapeUnicode setting since the standard requires it. I consider this a bug.
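For comparison, Python's json module behaves the way the standard requires: control characters are always escaped, while the ensure_ascii flag independently controls whether non-ASCII text gets escaped (the analogue of EscapeUnicode):

import json

json.dumps({"Text": "\u0010"})
# '{"Text": "\\u0010"}' (control characters are escaped unconditionally)

json.dumps({"Text": "é"}, ensure_ascii=True)
# '{"Text": "\\u00e9"}' (ASCII-safe but bloated for non-ASCII text)

json.dumps({"Text": "é"}, ensure_ascii=False)
# '{"Text": "é"}' (compact UTF-8, still standard-compliant)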
Since this is primarily a problem for me within my ServiceStack services, I also found this solution:
SetConfig(new EndpointHostConfig
{
    UseBclJsonSerializers = true
});
This tells ServiceStack to use .NET's built-in DataContractJsonSerializer instead of ServiceStack's JsonSerializer. I have verified that DataContractJsonSerializer does escape control characters correctly.
So it appears that I need to choose between JsonSerializer with EscapeUnicode = true (faster but with bloated output) and DataContractJsonSerializer (slower but with compact Unicode output).
Are there restricted character patterns within Azure Table Storage RowKeys? I've not been able to find any documented via numerous searches. However, in some performance testing I'm getting behavior that implies there are.
I've got some odd behavior with RowKeys consisting of random characters (the test driver does prevent the restricted characters / \ # ? from occurring in the RowKey, and also blocks single quotes). The result is a RowKey that will insert fine into the table but cannot be queried (the result is InvalidInput). For example:
RowKey: 9}5O0J=5Z,4,D,{!IKPE,~M]%54+9G0ZQ&G34!G+
Attempting to query by this RowKey (equality) will result in an error (within our app, using Azure Storage Explorer, and in Cloud Storage Studio 2). I took a look at the request being sent via Fiddler:
GET /foo()?$filter=RowKey%20eq%20'9%7D5O0J=5Z,4,D,%7B!IKPE,~M%5D%54+9G0ZQ&G34!G+' HTTP/1.1
It appears the %54 in the RowKey is not escaped in the filter. Interestingly, I get similar behavior for batch requests to table storage with URIs in the batch XML that include this RowKey. I've also seen similar behavior for RowKeys with embedded double quotes, though I have not isolated that pattern yet.
Has anyone come across this behavior? I can easily restrict additional characters from occurring in RowKeys, but would really like to know the 'rules'.
The following characters are not allowed in PartitionKey and RowKey fields:
The forward slash (/) character
The backslash (\) character
The number sign (#) character
The question mark (?) character
Further Reading: Azure Docs > Understanding the Table service data model
public static readonly Regex DisallowedCharsInTableKeys = new Regex(@"[\\\\#%+/?\u0000-\u001F\u007F-\u009F]");
Detection of Invalid Table Partition and Row Keys:
bool invalidKey = DisallowedCharsInTableKeys.IsMatch(tableKey);
Sanitizing the Invalid Partition or Row Key:
string sanitizedKey = DisallowedCharsInTableKeys.Replace(tableKey, disallowedCharReplacement);
At this stage you may also want to prefix the sanitized key (PartitionKey or RowKey) with a hash of the original key, to avoid collisions where different invalid keys end up with the same sanitized value.
Do not use string.GetHashCode() for this, though, since it may produce a different hash code for the same string across runs; it shall not be used to identify uniqueness and shall not be persisted.
I use SHA256 (https://msdn.microsoft.com/en-us/library/s02tk69a(v=vs.110).aspx) to create the byte-array hash of the invalid key, convert the byte array to a hex string, and prefix the sanitized table key with that.
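A minimal Python sketch of that sanitize-and-prefix scheme (hypothetical names; the character class mirrors the C# regex above):

import hashlib
import re

# Disallowed characters: backslash, #, %, +, /, ?, and the two
# control-character ranges from the MSDN documentation.
DISALLOWED = re.compile(r'[\\#%+/?\u0000-\u001f\u007f-\u009f]')

def sanitize_key(key: str, replacement: str = '-') -> str:
    # Prefix with a hash of the original key so two different invalid
    # keys cannot collide after their bad characters are replaced.
    prefix = hashlib.sha256(key.encode('utf-8')).hexdigest()
    return prefix + DISALLOWED.sub(replacement, key)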
Also see related MSDN Documentation:
https://msdn.microsoft.com/en-us/library/azure/dd179338.aspx
Related Section from the link:
Characters Disallowed in Key Fields
The following characters are not allowed in values for the PartitionKey and RowKey properties:
The forward slash (/) character
The backslash (\) character
The number sign (#) character
The question mark (?) character
Control characters from U+0000 to U+001F, including:
The horizontal tab (\t) character
The linefeed (\n) character
The carriage return (\r) character
Control characters from U+007F to U+009F
Note that in addition to the chars mentioned in the MSDN article, I also added the % char to the pattern, since I saw a few places where people mention it being problematic. I guess some of this also depends on the language and the technology you are using to access Table Storage.
If you detect additional problematic chars in your case, then you can add those to the regex pattern; nothing else needs to change.
I just found out (the hard way) that the '+' sign is allowed, but not possible to query in PartitionKey.
I found that in addition to the characters listed in Igorek's answer, these can also cause problems (e.g. inserts will fail):
|
[]
{}
<>
$^&
Tested with the Azure Node.js SDK.
I transform the key using this function:
private static string EncodeKey(string key)
{
    return HttpUtility.UrlEncode(key);
}
This needs to be done for the insert and for the retrieve of course.
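The same idea in Python, for reference (a sketch; quote with safe='' percent-encodes everything outside the unreserved set, which covers all of the characters listed above):

from urllib.parse import quote, unquote

def encode_key(key: str) -> str:
    # Percent-encode '/', '\', '#', '?', control characters, and
    # anything else outside A-Z a-z 0-9 _ . - ~
    return quote(key, safe='')

def decode_key(key: str) -> str:
    return unquote(key)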
I need to accept a list of file names in a query string, e.g.:
http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc
Do you have any recommendations on what delimiter to use?
Having a query parameter appear multiple times is legal, and it is the only way to guarantee no parsing problems in all cases:
http://someSite/someApp/myUtil.ashx?file=file1.txt&file=file2.bmp&file=file3.doc
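In Python, for instance, both building this form and parsing it back are one-liners (illustrative sketch; parse_qs collects repeated names into a list):

from urllib.parse import urlencode, parse_qs

params = [('file', 'file1.txt'), ('file', 'file2.bmp'), ('file', 'file3.doc')]
query = urlencode(params)  # 'file=file1.txt&file=file2.bmp&file=file3.doc'
parse_qs(query)['file']    # ['file1.txt', 'file2.bmp', 'file3.doc']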
The semicolon ; must be URI-encoded if it is part of a filename (turned into %3B), yet not when it is separating query parameters, which is its reserved use.
See section 2.2 of RFC 3986:
2.2. Reserved Characters
URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
If they're filenames, a good choice would be a character which is disallowed in filenames. Suggestions so far included , | & which are generally allowed in filenames and therefore might lead to ambiguities. / on the other hand is generally not allowed, not even on Windows. It is allowed in URIs, and it has no special meaning in query strings.
Example:
http://someSite/someApp/myUtil.ashx?files=file1.txt|file2.bmp|file3.doc is bad because it may refer to the valid file file1.txt|file2.bmp.
http://someSite/someApp/myUtil.ashx?files=file1.txt/file2.bmp/file3.doc unambiguously refers to 3 files.
I would recommend making each file its own query parameter, i.e.
myUtil.ashx?file1=file1.txt&file2=file2.bmp&file3=file3.doc
This way you can just use standard query parsing and loop over the values.
Do you need to list the filenames as a single string?
Most languages accept arrays in the query string, so you could write it like:
http://someSite/someApp/myUtil.ashx?files[]=file1.txt&files[]=file2.bmp&files[]=file3.doc
If it doesn't, or you can't use that for some other reason, you should stick to a delimiter that is either not allowed or unusual in a filename. Pipe (|) is a good one; otherwise you could URL-encode an invisible character, since they are quite easy to use in code but harder to actually include in a filename.
I usually use arrays when possible and pipe otherwise.
I've always used double pipes "||". I don't have any good evidence to back up why this is a good choice other than 10 years of web programming and it's never been an issue.
This is one common problem. How I handled it: I created a method which accepts a list of strings, then finds a character that is not in any of the strings. (I did this by simply concatenating the strings, then testing for various characters.) Once a character is found, all the strings are concatenated together, with the separation character also prepended to the front. So in the given question, one example would be:
http://someSite/someApp/myUtil.ashx?files=|file1.txt|file2.bmp|file3.doc
and another would be:
http://someSite/someApp/myUtil.ashx?files=,file1.txt,file2.bmp,file3.doc
But since I actually use a method that guarantees the separator character is not in the rest of the strings, it is safe. It was a bit of work to create the first time, but I've used it MANY times in various applications.
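Here is a minimal Python sketch of that approach (hypothetical function names; the separator is prepended exactly as described):

def pack(strings, candidates=',|;:~^*'):
    # Pick the first candidate character that appears in none of the
    # strings, and prepend it so the receiver knows which one was used.
    for sep in candidates:
        if all(sep not in s for s in strings):
            return sep + sep.join(strings)
    raise ValueError('no safe separator found among candidates')

def unpack(packed):
    sep, rest = packed[0], packed[1:]
    return rest.split(sep)

pack(['file1.txt', 'file2.bmp', 'file3.doc'])
# ',file1.txt,file2.bmp,file3.doc'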
I think I would consider using commas or semicolons.
I would build on MSalters' answer by saying that, to generalize, the best delimiter is one that is invalid in the items of the list. For example, if your list is prices, a comma is a bad delimiter because it can be confused with the values. For that reason, as most of these answers suggest, I think a good general-purpose delimiter is probably "|", as it is rarely a valid value. "/" may not be the best delimiter in general, as it is sometimes valid in paths.