Comparing Strings in MATLAB Problem

I've been fiddling around with my program and I've been using a modified version of urlread that allows for BASIC authentication. The problem is that I have to add the following line of code to the base urlread function:
urlConnection.setRequestProperty('Authorization', 'Basic passphrase');
...where passphrase is a base64-encoded string of 'user:pass'. If I place the passphrase directly into the string on that line, the program works just fine; the trouble starts when I try to concatenate to get that resulting 'Basic passphrase' string. Initially I just had:
['Basic', ' ', passphrase]
After that did not work, I did some exploring and experimenting in the command window:
passphrase = 'somerandompassphrase';
teststr1 = ['Basic', ' ', passphrase];
teststr2 = ['Basic', ' ', 'somerandompassphrase'];
teststr3 = 'Basic somerandompassphrase';
strcmp(teststr1, teststr2)
strcmp(teststr1, teststr3)
strcmp(teststr2, teststr3)
The output is 1, or true for each one (as expected). However if I take the base64encode of 'somerandompassphrase' (which is 'c29tZXJhbmRvbXBhc3NwaHJhc2U='):
encoded = base64encode(passphrase);
teststr1 = ['Basic', ' ', encoded];
teststr2 = ['Basic', ' ', 'c29tZXJhbmRvbXBhc3NwaHJhc2U='];
strcmp(teststr1, teststr2)
The output is 0, or false. Shouldn't it be true though? The base64encode function can be found here.
Even from a quick test of:
strcmp(encoded, 'c29tZXJhbmRvbXBhc3NwaHJhc2U=')
The output is still 0.
Please help, I have no idea what's going on.

As shown here, you can also use the base64 encoder from the Apache Commons Codec Java library, which comes bundled with MATLAB and is available on the classpath:
encoder = org.apache.commons.codec.binary.Base64();
b64str = char( encoder.encode(passphrase-0) )';

I actually figured this out right before I posted the question, but I figured I'd go ahead and leave it up in case people run into the same problem as I did.
The problem is with the base64encode function: it automatically adds a newline character to the end of the string, which causes strcmp to return false. To fix this, pass the optional second argument to base64encode; giving it an empty string stops it from appending the newline, and the comparison then works.
encoded = base64encode(passphrase, '');
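The same trailing-newline gotcha exists outside MATLAB. As a purely illustrative sketch in Python (not the poster's MATLAB fix), binascii.b2a_base64 appends a newline while base64.b64encode does not, so an equality check fails for exactly the same reason:
import base64, binascii
passphrase = b'somerandompassphrase'
with_newline = binascii.b2a_base64(passphrase)    # encoded bytes ending in b'\n'
without_newline = base64.b64encode(passphrase)    # encoded bytes, no newline
print(with_newline == without_newline)                   # False
print(with_newline.rstrip(b'\n') == without_newline)     # True once the newline is stripped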

Related

Python - Convert unicode entity into unicode symbol

My request.json(), when I loop through the dict it returns from an API, returns "v\u00F6lk" (without the quotes).
But I want "völk" (without the quotes), which is how it is raw in the API.
How do I convert?
request = requests.post(get_sites_url, headers=api_header, params=search_sites_params, timeout=http_timeout_seconds)
return_search_results = request.json()
for site_object in return_search_results['data']:
    site_name = str(site_object['name'])
    site_name_fixed = str(site_name.encode("utf-8").decode())
    print("fixed site_name: " + site_name_fixed)
My guess: the API is actually returning the literal version, so he is really getting:
"v\\u00F6lk"
Printing that gives what we think we are getting from the api:
print("v\\u00F6lk")
v\u00F6lk
I am not sure if there is a better way to do this, but encoding it with "utf-8", then using "unicode_escape" to decode seemed to work:
>>> print(bytes("v\\u00F6lk", "utf-8").decode("unicode_escape"))
völk
>>> print("v\\u00F6lk".encode("utf-8").decode("unicode_escape"))
völk
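Applied back to the loop from the question, a minimal sketch; get_sites_url, api_header, search_sites_params and http_timeout_seconds are the question's own placeholder names and are assumed to be defined, and the fix only makes sense if the API really does return the literal escaped form:
import requests

# Assumes the API returns the literal backslash-escaped form such as "v\\u00F6lk".
request = requests.post(get_sites_url, headers=api_header,
                        params=search_sites_params, timeout=http_timeout_seconds)
for site_object in request.json()['data']:
    site_name = site_object['name']
    # Re-encode to bytes, then interpret the \uXXXX escapes.
    site_name_fixed = site_name.encode("utf-8").decode("unicode_escape")
    print("fixed site_name: " + site_name_fixed)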

Python 3 String Formatting Issues

I have run into an issue where I can't format a string to be printed.
The function is supposed to convert binary into text, which it does brilliantly, but the printed result is formatted all the way to the right and not the left.
I have tried resolving this by looking up how to format the string but I'm getting no luck. I'm hoping someone can resolve this issue for me.
Here's the code:
elif Converter_Choice2 == str(3):
    def Bin_to_Txt():
        print("\nYour Message in Binary:")
        bin_input = input("")
        binary_int = int(bin_input, 2)
        byte_number = binary_int.bit_length() + 7 // 8
        binary_array = binary_int.to_bytes(byte_number, "big")
        ascii_text = binary_array.decode()
        clear()
        print("\nYour Message in Text:")
        print(ascii_text)
    Bin_to_Txt()
I tried different ways to format it but I'm still new to Python 3. I tried putting "ascii_text" into another string to format it, so I could print that string, but it didn't work.
ascii_text_formatted = ("{:<15}".format(ascii_text))
print(ascii_text_formatted)
Some advice for this would be great.
Here's a quick binary code that can be used: 0100100001100101011011000110110001101111
The decoded version should say "Hello".
I managed to find the answer. If anyone else has this issue or something similar, try this:
The issue was that the variable "binary_array" was printing out invisible numbers before the printed answer, in this case "Hello". Due to this it would print "Hello" all the way to the right, as the invisible numbers were in front of it.
To fix this issue I added [34:] at the end of the "binary_array" string to remove those invisible numbers from the print. By adding [34:], the first 34 characters/numbers won't be printed even if they are invisible. So this can be any number that you need it to be. For example, if I changed 34 to 35 it would remove the "H" from "Hello" and print "ello".
Here's some screenshots of the function block and the printed responses from before and after adding [34:].
https://imgur.com/a/W25G1FZ
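For reference, the "invisible numbers" are leading zero bytes that come from operator precedence in the byte_number line: // binds tighter than +, so byte_number ends up equal to bit_length(). A minimal sketch of an alternative, root-cause fix, separate from the slicing workaround above:
# Assumption: same Bin_to_Txt conversion as above; only the byte count
# calculation is changed by adding parentheses.
bin_input = "0100100001100101011011000110110001101111"
binary_int = int(bin_input, 2)
byte_number = (binary_int.bit_length() + 7) // 8   # 5 bytes, no zero padding
ascii_text = binary_int.to_bytes(byte_number, "big").decode()
print(ascii_text)   # Hello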

Parsing a non-Unicode string with Flask-RESTful

I have a webhook developed with Flask-RESTful which gets several parameters with POST.
One of the parameters is a non-Unicode string, encoded in cp1251.
Can't find a way to correctly parse this argument using reqparse.
Here is the fragment of my code:
parser = reqparse.RequestParser()
parser.add_argument('text')
msg = parser.parse_args()
Then, I write msg to a text file, and it looks like this:
{"text": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd !\n\n\ufffd\ufffd\ufffd\ufffd\ufffd\n\n-- \n\ufffd \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd."}
As you can see, Flask somehow replaces all Cyrillic characters with \ufffd. At the same time, non-Cyrillic characters, like ! or \n are processed correctly.
Anything I can do to advise RequestParser with the string encoding?
Here is my code for writing the text to disk:
f = open('log_msg.txt', 'w+')
f.write(json.dumps(msg))
f.close()
I tried f = open('log_msg.txt', 'w+', encoding='cp1251') with the same result.
Then, I tried
f = open('log_msg_ascii.txt', 'w+')
f.write(ascii(json.dumps(msg)))
Also, no difference.
So, I'm pretty sure it's RequestParser() that tries to be too smart and can't understand the non-Unicode input.
Thanks!
Okay, I finally found a workaround. Thanks to #lenz for helping me with this issue. It seems that reqparse wrongly assumes that every string parameter comes as UTF-8. So when it sees a non-Unicode input field (among other Unicode fields!), it tries to load it as Unicode and fails. As a result, all characters are U+FFFD (replacement character).
So, to access that non-Unicode field, I did the following trick.
First, I load raw data using get_data(), decode it using cp1251 and parse with a simple regexp.
raw_data = request.get_data()
contents = raw_data.decode('windows-1251')
match = re.search(r'(?P<delim>--\w+\r?\n)Content-Disposition: form-data; name=\"text\"\r?\n(.*?)(?P=delim)', contents, re.MULTILINE | re.DOTALL)
text = match.group(2)
Not the most beautiful solution, but it works.
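To see what that regex actually captures, here is a small self-contained sketch with a made-up multipart body; the boundary string and the Cyrillic sample text are hypothetical, only the regex and the windows-1251 decoding are taken from the answer above:
import re

# Fabricated multipart/form-data body, encoded in cp1251, purely for illustration.
raw_data = ('--boundary123\r\n'
            'Content-Disposition: form-data; name="text"\r\n'
            '\r\n'
            'Привет!\r\n'
            '--boundary123\r\n').encode('windows-1251')
contents = raw_data.decode('windows-1251')
match = re.search(r'(?P<delim>--\w+\r?\n)Content-Disposition: form-data; name=\"text\"\r?\n(.*?)(?P=delim)',
                  contents, re.MULTILINE | re.DOTALL)
print(match.group(2).strip())   # Привет!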

python3 uuid to base64.urlsafe encode and decode mismatch

I'm having a problem getting a base64-encoded uuid to match the original uuid.
Here is the code:
import base64, uuid
def uuid2slug(uuidstring):
    return base64.urlsafe_b64encode(uuid.uuid1().bytes).decode("utf-8").rstrip('=\n').replace('/', '_')
def slug2uuid(slug):
    return uuid.UUID(bytes=base64.urlsafe_b64decode((slug + '==').replace('_', '/')))
uid = uuid.uuid1()
urlslug = uuid2slug(uid)
urluid = slug2uuid(urlslug)
print(uid)
print(urlslug)
print(urluid)
This returns a mismatch in the uuid's first column:
cfe71fa2-7d39-11e7-9264-000c29023711
z-cg7H05EeeSZAAMKQI3EQ
cfe720ec-7d39-11e7-9264-000c29023711
Any thoughts?
This is using Python 3.5.3
As mentioned in the comments, the problem in your code was that you were not using the argument you passed to the function, uuidstring.
Also note that you are using the urlsafe encoding and decoding libraries, so you don't need to replace the slashes yourself.
For reference, a Base64 value can be defined with the following regex, ^[A-Za-z0-9+/]+={0,2}$, where + and / are the only non-alphanumeric symbols, and = is only used for padding. The URL-safe encoding is explained in the Base64 (Wikipedia) article,
the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary
Long story short, the correct version of your functions, without the redundant calls to replace are:
def uuid2slug(uuidstring):
    return base64.urlsafe_b64encode(uuidstring.bytes).decode("utf-8").strip('=')
def slug2uuid(slug):
    return uuid.UUID(bytes=base64.urlsafe_b64decode(slug + '=='))
If you run your code a couple of times, you will find hyphens and underscores, and no slashes.
E.g.
471f8fc4-5ec5-11ed-9645-06ca5f5b4308
Rx-PxF7FEe2WRQbKX1tDCA
471f8fc4-5ec5-11ed-9645-06ca5f5b4308
ac74e9fe-5ec6-11ed-b5e7-06ca5f5b4308
rHTp_l7GEe215wbKX1tDCA
ac74e9fe-5ec6-11ed-b5e7-06ca5f5b4308
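Putting it together, a quick round-trip check; a minimal sketch that assumes the corrected uuid2slug/slug2uuid above and the same import base64, uuid as in the question:
# Round trip: slug2uuid(uuid2slug(u)) should give back the same UUID.
uid = uuid.uuid1()
slug = uuid2slug(uid)
assert slug2uuid(slug) == uid
print(uid)
print(slug)
print(slug2uuid(slug))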

Unable to remove string from text I am extracting from html

I am trying to extract the main article from a web page. I can accomplish the main text extraction using Python's readability module. However, the text I get back often contains several &#13 strings (there is a ; at the end of this string, but this editor won't allow the full string to be entered (strange!)). I have tried using the Python replace function, the regular expression replace function, and the unicode encode and decode functions. None of these approaches have worked. For the replace and regular-expression approaches I just get back my original text with the &#13 strings still present, and with the unicode encode/decode approach I get back the error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 2099: ordinal not in range(128)
Here is the code I am using that takes the initial URL and, using readability, extracts the main article. I have left in all my commented-out code that corresponds to the different approaches I have tried to remove the &#13 string. It appears as though &#13 is interpreted to be u'\xa9'.
from readability.readability import Document
import urllib

def find_main_article_text_2():
    #url = 'http://finance.yahoo.com/news/questcor-pharmaceuticals-closes-transaction-acquire-130000695.html'
    url = "http://us.rd.yahoo.com/finance/industry/news/latestnews/*http://us.rd.yahoo.com/finance/external/cbsm/SIG=11iiumket/*http://www.marketwatch.com/News/Story/Story.aspx?guid=4D9D3170-CE63-4570-B95B-9B16ABD0391C&siteid=yhoof2"
    html = urllib.urlopen(url).read()
    readable_article = Document(html).summary()
    readable_title = Document(html).short_title()
    #readable_article.replace("u'\xa9'"," ")
    #print re.sub("&#13;",'',readable_article)
    #unicodedata.normalize('NFKD', readable_article).encode('ascii','ignore')
    print readable_article
    #print readable_article.decode('latin9').encode('utf8'),
    print "There are ", readable_article.count("&#13;"), "&#13;'s"
    #print readable_article.encode( sys.stdout.encoding , '' )
    #sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    #sents = sent_tokenizer.tokenize(readable_article)
    #new_sents = []
    #for sent in sents:
    #    unicode_sent = sent.decode('utf-8')
    #    s1 = unicode_sent.encode('ascii', 'ignore')
    #    #s2 = s1.replace("\n","")
    #    new_sents.append(s1)
    #print new_sents
    # u'\xa9'
I have a URL that I have been testing the code with included inside the def. If anybody has any ideas on how to remove this &#13 I would appreciate the help. Thanks, George
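For what it's worth, &#13; is the HTML character reference for a carriage return (U+000D), so a hedged option, not tested against this particular page, is to remove both the literal entity and any raw carriage returns from the extracted text; the UnicodeEncodeError about u'\xa9' is a separate problem, since © simply cannot be encoded as ASCII. A standalone sketch, where "sample" stands in for readable_article from the code above:
# Remove both the literal &#13; entity and raw \r characters.
sample = u"Some article text&#13;\r more text&#13;"
cleaned = sample.replace(u"&#13;", u"").replace(u"\r", u"")
print(cleaned)   # Some article text more text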
