unescape alternative in node.js

I'm using the deprecated unescape function in one of my programs.
The protocol I'm working with sends escaped binary strings via the query string. So on their side they are doing something along the lines of the following (everything except 0-9, a-z, A-Z, '.', '-', '_' and '~' is encoded using the "%nn" format):
var source = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a";
var encoded = escape(source);
// encoded is now "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
So I'm receiving this string on my end and I have to decode it. decodeURIComponent is not working in this case so I rely on unescape:
var received = "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A";
var binaryString = unescape(received);
Since unescape is deprecated, any pointers on how I should decode these binary strings?
Note: querystring.unescape doesn't work either...

I had a similar issue here. After doing some research I found this package, which may still be of use to someone:
https://www.npmjs.com/package/unescape

The unescape() function was deprecated in JavaScript version 1.5. Use decodeURI() or decodeURIComponent() instead.
The unescape() function decodes an encoded string.

Related

Python - Convert unicode entity into unicode symbol

When I loop through the dict that request.json() returns from an API, I get "v\u00F6lk" (without the quotes).
But I want "völk" (without the quotes), which is how it is raw in the API.
How do I convert?
request = requests.post(get_sites_url, headers=api_header, params=search_sites_params, timeout=http_timeout_seconds)
return_search_results = request.json()
for site_object in return_search_results['data']:
    site_name = str(site_object['name'])
    site_name_fixed=str(site_name.encode("utf-8").decode())
    print("fixed site_name: " + site_name_fixed)
My guess is that the API is actually returning the literal escape sequence, so you are really getting:
"v\\u00F6lk"
Printing that gives what we think we are getting from the api:
print("v\\u00F6lk")
v\u00F6lk
I am not sure if there is a better way to do this, but encoding it with "utf-8", then using "unicode_escape" to decode seemed to work:
>>> print(bytes("v\\u00F6lk", "utf-8").decode("unicode_escape"))
völk
>>> print("v\\u00F6lk".encode("utf-8").decode("unicode_escape"))
völk
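One caveat with the approach above: "unicode_escape" treats its input bytes as Latin-1, so any real non-ASCII characters already in the string get mangled. A minimal sketch of an alternative, assuming the only escapes present are literal \uXXXX sequences (the helper name decode_u_escapes is made up for illustration, and surrogate pairs are not handled):

```python
import re

def decode_u_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# Pure-ASCII input with a literal escape: same result as unicode_escape.
print(decode_u_escapes("v\\u00F6lk"))

# Mixed input: the existing "ö" is left alone, the escape is converted.
print(decode_u_escapes("ö \\u00F6"))

# For comparison, unicode_escape corrupts the existing "ö".
print("ö \\u00F6".encode("utf-8").decode("unicode_escape"))
```

If the string is a complete JSON fragment, parsing it with json.loads is another option worth considering.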

Parsing a non-Unicode string with Flask-RESTful

I have a webhook developed with Flask-RESTful which gets several parameters with POST.
One of the parameters is a non-Unicode string, encoded in cp1251.
Can't find a way to correctly parse this argument using reqparse.
Here is the fragment of my code:
parser = reqparse.RequestParser()
parser.add_argument('text')
msg = parser.parse_args()
Then, I write msg to a text file, and it looks like this:
{"text": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd !\n\n\ufffd\ufffd\ufffd\ufffd\ufffd\n\n-- \n\ufffd \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd."}
As you can see, Flask somehow replaces all Cyrillic characters with \ufffd. At the same time, non-Cyrillic characters, like ! or \n are processed correctly.
Is there anything I can do to tell RequestParser which string encoding to use?
Here is my code for writing the text to disk:
f = open('log_msg.txt', 'w+')
f.write(json.dumps(msg))
f.close()
I tried f = open('log_msg.txt', 'w+', encoding='cp1251') with the same result.
Then, I tried
f = open('log_msg_ascii.txt', 'w+')
f.write(ascii(json.dumps(msg)))
Also, no difference.
So, I'm pretty sure it's RequestParser() that tries to be too smart and can't understand the non-Unicode input.
Thanks!
Okay, I finally found a workaround. Thanks to @lenz for helping me with this issue. It seems that reqparse wrongly assumes that every string parameter comes as UTF-8. So when it sees a non-Unicode input field (among other Unicode fields!), it tries to decode it as UTF-8 and fails. As a result, every Cyrillic character becomes U+FFFD (the replacement character).
So, to access that non-Unicode field, I did the following trick.
First, I load raw data using get_data(), decode it using cp1251 and parse with a simple regexp.
raw_data = request.get_data()
contents = raw_data.decode('windows-1251')
match = re.search(r'(?P<delim>--\w+\r?\n)Content-Disposition: form-data; name=\"text\"\r?\n(.*?)(?P=delim)', contents, re.MULTILINE | re.DOTALL)
text = match.group(2)
Not the most beautiful solution, but it works.
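The effect described above can be reproduced without Flask. This is only a sketch of the decoding step, not of what reqparse does internally; the sample text is a made-up stand-in for the real form field:

```python
# Hypothetical sample text standing in for the cp1251-encoded form field.
raw = "Привет !".encode("cp1251")

# Decoding cp1251 bytes as UTF-8: every Cyrillic byte is invalid UTF-8 and
# is replaced with U+FFFD, while the ASCII characters survive.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the sender actually used recovers the text.
right = raw.decode("cp1251")

print(ascii(wrong))
print(right)
```

This matches the symptom in the log file: ASCII characters such as ! and \n come through intact, and only the Cyrillic ones are replaced.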

python3 uuid to base64.urlsafe encode and decode mismatch

I'm having a problem getting a base64-encoded uuid to match the original uuid.
Here is the code:
import base64, uuid
def uuid2slug(uuidstring):
    return base64.urlsafe_b64encode(uuid.uuid1().bytes).decode("utf-8").rstrip('=\n').replace('/', '_')
def slug2uuid(slug):
    return uuid.UUID(bytes=base64.urlsafe_b64decode((slug + '==').replace('_', '/')))
uid = uuid.uuid1()
urlslug = uuid2slug(uid)
urluid = slug2uuid(urlslug)
print(uid)
print(urlslug)
print(urluid)
This returns a mismatch in the uuid's first column:
cfe71fa2-7d39-11e7-9264-000c29023711
z-cg7H05EeeSZAAMKQI3EQ
cfe720ec-7d39-11e7-9264-000c29023711
Any thoughts?
This is using Python 3.5.3
As mentioned in the comments, the problem in your code was that you were not using the argument you passed to the function, uuidstring.
Also note that you are using the urlsafe encoding and decoding libraries, so you don't need to replace the slashes yourself.
For reference, a standard Base64 value can be matched with the regex ^[A-Za-z0-9+/]+={0,2}$, where + and / are the only non-alphanumeric symbols, and = is only used for padding. The URL-safe variant is explained in the Base64 (Wikipedia) article:
the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary
Long story short, the correct version of your functions, without the redundant calls to replace, is:
def uuid2slug(uuidstring):
    return base64.urlsafe_b64encode(uuidstring.bytes).decode("utf-8").strip('=')
def slug2uuid(slug):
    return uuid.UUID(bytes=base64.urlsafe_b64decode(slug+'=='))
If you run your code a couple of times, you will find hyphens and underscores, and no slashes.
E.g.
471f8fc4-5ec5-11ed-9645-06ca5f5b4308
Rx-PxF7FEe2WRQbKX1tDCA
471f8fc4-5ec5-11ed-9645-06ca5f5b4308
ac74e9fe-5ec6-11ed-b5e7-06ca5f5b4308
rHTp_l7GEe215wbKX1tDCA
ac74e9fe-5ec6-11ed-b5e7-06ca5f5b4308
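To make the alphabet difference and the round trip concrete, here is a small sketch. The function names follow the answer above; computing the padding from the slug length and the two-byte sample are my own additions for illustration:

```python
import base64
import uuid

def uuid2slug(uid):
    """Encode a UUID as a URL-safe Base64 slug with the padding stripped."""
    return base64.urlsafe_b64encode(uid.bytes).decode("ascii").rstrip("=")

def slug2uuid(slug):
    """Restore the stripped padding, then decode back to a UUID."""
    padding = "=" * (-len(slug) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(slug + padding))

# Bytes chosen so that the standard alphabet produces '+' and '/'.
sample = b"\xfb\xff"
print(base64.b64encode(sample))          # b'+/8='
print(base64.urlsafe_b64encode(sample))  # b'-_8='

# Round trip using the first UUID from the example output above.
uid = uuid.UUID("471f8fc4-5ec5-11ed-9645-06ca5f5b4308")
slug = uuid2slug(uid)
print(slug)
print(slug2uuid(slug) == uid)
```

Appending a fixed '==' also works for 16-byte UUIDs, since their slugs are always 22 characters long; deriving the padding from the length just keeps the helper valid for other input sizes.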

Encode to Unicode UTF-8 not working

My Code -
var utf8 = require('utf8');
var y = utf8.encode('एस एम एस गपशप');
console.log(y);
Input -
एस एम एस गपशप
Expecting Output - \xE0\xA4\x8F\xE0\xA4\xB8\x20\xE0\xA4\x8F\xE0\xA4\xAE\x20\xE0\xA4\x8F\xE0\xA4\xB8\x20\xE0\xA4\x97\xE0\xA4\xAA\xE0\xA4\xB6\xE0\xA4\xAA
Example Encoding using utf8.js
Output -
à¤à¤¸ à¤à¤® à¤à¤¸ à¤à¤ªà¤¶à¤ª
What am I doing wrong? Please help!
That code appears to be working. That output looks like UTF-8 bytes interpreted as some 8-bit character set, most likely ISO-8859-1, which is easily recognisable by the repeating patterns.
That example output is just how you would represent that string in source code.
Try this:
var utf8 = require('utf8');
var y = utf8.encode('एस');
console.log(y);
console.log('\xE0\xA4\x8F\xE0\xA4\xB8');
You will probably see the same output twice.
You can easily write some code to get those hexadecimal forms back using a lookup table and the charCodeAt function, but it is a rather unusual way to represent a string in JavaScript. JSON, for example, either just uses the literal characters or '\uXXXX' escapes.

Comparing Strings in MATLAB Problem

I've been fiddling around with my program and I've been using a modified version of urlread that allows for BASIC authentication. The problem is that I have to add the following line of code to the base urlread function:
urlConnection.setRequestProperty('Authorization', 'Basic passphrase');
...where passphrase is a base64-encoded string of 'user:pass'. If I place the passphrase directly into the string on that line, the program works just fine; the trouble starts when I try to concatenate to get the resulting 'Basic passphrase' string. Initially I just had:
['Basic', ' ', passphrase]
After that did not work, I did some exploring and experimenting in the command window:
passphrase = 'somerandompassphrase';
teststr1 = ['Basic', ' ', passphrase];
teststr2 = ['Basic', ' ', 'somerandompassphrase'];
teststr3 = 'Basic somerandompassphrase';
strcmp(teststr1, teststr2)
strcmp(teststr1, teststr3)
strcmp(teststr2, teststr3)
The output is 1, or true for each one (as expected). However if I take the base64encode of 'somerandompassphrase' (which is 'c29tZXJhbmRvbXBhc3NwaHJhc2U='):
encoded = base64encode(passphrase);
teststr1 = ['Basic', ' ', encoded];
teststr2 = ['Basic', ' ', 'c29tZXJhbmRvbXBhc3NwaHJhc2U='];
strcmp(teststr1, teststr2)
The output is 0, or false. Shouldn't it be true though? The base64encode function can be found here.
Even from a quick test of:
strcmp(encoded, 'c29tZXJhbmRvbXBhc3NwaHJhc2U=')
The output is still 0.
Please help, I have no idea what's going on.
As shown here, you can also use the base64 encoder from the Apache Commons Codec Java library, which comes bundled with MATLAB and is available on the classpath:
encoder = org.apache.commons.codec.binary.Base64();
b64str = char( encoder.encode(passphrase-0) )';
I actually figured this out right before I posted the question, but I figured I'd go ahead and leave it up in case people run into the same problem as I did.
The problem comes from the base64encode function. It automatically appends a newline character to the end of the string, which causes strcmp to return false. To fix this, pass the function's optional second parameter: if you pass an empty string, it won't append the newline, and the comparison works.
encoded = base64encode(passphrase, '');
