Perl CGI: get request not working unless URL is hard-coded - string

I'm trying to identify the differences between the two strings during a URL get request (using LWP::Simple).
I have a URL, say http://www.example.com?param1=x&param2=y&param3=z
I make sure any blank inputs are also taken care of, but that is irrelevant at this point, because I am making sure all parameters are exactly the same.
Also, the hard-coded URL is copied and pasted from the generated URL.
This URL works when I do the following:
my $url = "http://www.example.com?param1=x&param2=y&param3=z";
my $content = get($url);
Yet, when I build the URL from parameters provided by a user, the get request does not work (Error: 500 from the site).
I have compared the two URLs by printing them out, and see zero differences. I've tried removing all of the potential invisible characters.
The output for the generated code and static string, assuming user input is the same as the static string (which is what I'm making sure to do):
http://www.example.com?param1=x&param2=y&param3=z
http://www.example.com?param1=x&param2=y&param3=z
I'm assuming printing the outputs removes characters I can't see.
I've also followed a solution at http://www.perlmonks.org/?node_id=882590 and it is pointing out differences, but I don't know why, considering I see none at all.
Has anyone run into this problem before? Please let me know if I need to clarify anything or need to provide additional information.
EDIT: Problem and Solution
So, after using mob's suggestion to identify differences, I found there was a null character in the generated URL that was not getting printed in the output. That is: http://www.example.com?param1=x&param2=y&param3=z was actually http://www.example.com?param1=x&param2=y&param3=\000z.
I used a simple regex: $url =~ s/\000//g; to remove that (and any other) null value.

Use a data serialization function to inspect your strings for hidden characters.
$url1 = "http://www.example.com?param1=x&param2=y";
$url2 = "http://www.example.com?param1=x&param2=y\0";
$url3 = "http://www.example.com?param1=x&param2=y\n";
use JSON;
print JSON->new->pretty(1)->encode( [$url1,$url2,$url3] );
# Result:
# [
# "http://www.example.com?param1=x&param2=y",
# "http://www.example.com?param1=x&param2=y\u0000",
# "http://www.example.com?param1=x&param2=y\n"
# ]
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper($url1,$url2,$url3);
# Result:
# $VAR1 = "http://www.example.com?param1=x&param2=y";
# $VAR2 = "http://www.example.com?param1=x&param2=y\0";
# $VAR3 = "http://www.example.com?param1=x&param2=y\n";

Clearly the string you have built is different from the hard-coded one. If you write code like this
my $ss = 'http://www.example.com?param1=x&param2=y&param3=z';
print join(' ', map " $_", $ss =~ /./g), "\n";
print join(' ', map sprintf('%02X', ord), $ss =~ /./g), "\n";
then you will be able to see the hex value of each character in the string, and you can compare the two of them more accurately. For instance, the code above outputs
 h  t  t  p  :  /  /  w  w  w  .  e  x  a  m  p  l  e  .  c  o  m  ?  p  a  r  a  m  1  =  x  &  p  a  r  a  m  2  =  y  &  p  a  r  a  m  3  =  z
68 74 74 70 3A 2F 2F 77 77 77 2E 65 78 61 6D 70 6C 65 2E 63 6F 6D 3F 70 61 72 61 6D 31 3D 78 26 70 61 72 61 6D 32 3D 79 26 70 61 72 61 6D 33 3D 7A
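The same comparison can be sketched in Python rather than Perl, purely as an illustration. first_difference is a hypothetical helper, and the position of the null character is assumed to match the one reported in the question's edit.

```python
def first_difference(a, b):
    """Return (index, char from a, char from b) at the first mismatch, or None."""
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else None
        cb = b[i] if i < len(b) else None
        if ca != cb:
            return i, ca, cb
    return None

hard_coded = "http://www.example.com?param1=x&param2=y&param3=z"
# Assumption: the generated URL carries a null character before the last value
generated = "http://www.example.com?param1=x&param2=y&param3=\x00z"

# ascii() escapes every non-printable or non-ASCII character, so nothing stays hidden
print(ascii(generated))
print(first_difference(hard_coded, generated))

# Equivalent of the s/\000//g fix from the question
cleaned = generated.replace("\x00", "")
```

Printing the escaped form alongside the index of the first mismatch shows both what the hidden character is and where it sits.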


Converting surrogate pairs to emoji - python3

I found a solution to a similar question in another topic, but unfortunately it's not working for me.
Here is my problem:
I'm building a dataframe from escaped surrogate pairs, which I'd like to search for in another file (example: "\uD83C\uDFF3", "\u26F9", "\uD83C\uDDE6\uD83C\uDDE8"):
with open("unicodes.csv", "rt") as csvfile:
    emoticons = pd.read_csv(csvfile, names=["xy"])
emoticons = pd.DataFrame(emoticons)
emoticons = emoticons.astype(str)
Next I'm reading my text file, in which some lines contain escaped surrogate pairs:
for chunk in pd.read_csv(path, names=["xy"], encoding="utf-8", chunksize=chunksize):
    spam = pd.DataFrame(chunk)
    spam = spam.astype(str)
In this for loop I'm checking whether a line contains one of the surrogate pairs, and if it does, I'd like to print that surrogate pair as an emoji. That's why I'm encoding and decoding the value i, which is a str:
(solution from: How to work with surrogate pairs in Python?)
    for i in emoticons.xy:
        if spam["xy"].str.contains(i, regex=False).any():
            print(i.encode('utf-16', 'surrogatepass').decode('utf-16'))
#printing:
#\uD83C\uDFF3
#\u26F9
#\uD83C\uDDE6\uD83C\uDDE8
So when I run the program, it still prints the surrogate pair as an escaped str, not as an emoji, but when I type the surrogate pair into the print function myself, it works:
print("\uD83C\uDFF3".encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass"))
#printing:
#🏳
What am I doing wrong? I tried building a new string from i, and other solutions, but it still doesn't work.
EDIT:
hexdump -C file.csv
00004b70 5c 75 44 38 33 44 5c 75 44 45 45 39 0a 5c 75 44 |\uD83D\uDEE9.\uD|
00004b80 38 33 44 5c 75 44 45 45 42 0a 5c 75 44 38 33 44 |83D\uDEEB.\uD83D|
00004b90 5c 75 44 45 45 43 0a 5c 75 44 38 33 44 5c 75 44 |\uDEEC.\uD83D\uD|
00004ba0 43 42 41 0a 5c 75 44 38 33 44 5c 75 44 45 38 31 |CBA.\uD83D\uDE81|
EDIT2:
So I've found something that kind of works, but it still needs improvement:
https://stackoverflow.com/a/54918256/4789281
The text from my other file, which I want to convert, looks like this:
"O żółtku zapomniałaś \uD83D\uDE02"
"Piękny outfit \uD83D\uDE0D"
When I do what was recommended in the other topic:
print(codecs.decode(i,encoding='unicode_escape',errors='surrogateescape').encode('utf-16', 'surrogatepass').decode('utf-16'))
I get something like this:
O żóÅtku zapomniaÅaÅ 😂
PiÄkny outfit 😍
So my surrogate pairs are converted, but my Polish characters are replaced with something strange.
You are on the right track.
What you are trying to do breaks because what you have in your str after reading the file are not surrogate pairs. They are backslash-escaped codepoints for your surrogate pairs, stored as plain text.
That is: the sequence "5c 75 44 38 33 44" in your file is the actual ASCII characters "\uD83D" (6 characters in total), not the surrogate codepoint 0xD83D (which, when properly decoded along with the next surrogate, e.g. "\uDE0D", becomes a single character in your string).
Where you are on the right track: you really do have to encode this into a byte sequence and then decode it back. What is wrong is the choice of codecs. Encode using "latin1" (to try to preserve any other non-ASCII characters in the string; it will fail if you have codepoints that are not representable in latin1), and decode with the special "unicode escape" codec. At that point, both surrogates will be present as two characters in a Python string:
In [16]: "\\uD83D\\uDE0D".encode("latin1").decode("unicode escape", "surrogatepass")
Out[16]: '\ud83d\ude0d'
The bad news is that this is not a valid str: surrogate codepoints should not exist by themselves in the internal representation. Instead, they should be combined into the desired final character. So trying to print it will break:
In [19]: a = "\\uD83D\\uDE0D".encode("utf-8").decode("unicode escape")
In [20]: print(a)
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-20-bca0e2660b9f> in <module>
----> 1 print(a)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
Using the "surrogatepass" error policy here will be of no help: you will get an unprintable byte sequence.
Therefore, the text has to be encoded and decoded a second time. This time the characters in the text are actual surrogate codepoints, which form valid utf-16 once written out. So the path now is to encode this sequence to utf-16, forcing the surrogates through with "surrogatepass", and then decode it back from utf-16, which will finally interpret the surrogate pair as a single character:
In [30]: a = "\\uD83D\\uDE0D".encode("latin1").decode("unicode escape")
In [31]: a
Out[31]: '\ud83d\ude0d'
In [32]: b = a.encode("utf-16", "surrogatepass").decode("utf-16")
In [33]: print(b)
😍
Summarising:
You read your file as utf-8 text, so that any other non-ASCII characters are read correctly, then encode the result as "latin1" and decode it with "unicode escape". This converts the human-readable "\uXXXX" sequences in your file into surrogate codepoints. Then you encode the result to utf-16, telling Python to pass the surrogates through and copy them as is, and decode back from utf-16:
def decode_surrogate_ascii(txt):
    interm = txt.encode("latin1").decode("unicode escape")
    return interm.encode("utf-16", "surrogatepass").decode("utf-16")
All you have to do is apply the above function to the columns of interest in your data frame:
emoticons = emoticons.apply(lambda col: col.map(lambda item: decode_surrogate_ascii(item) if isinstance(item, str) else item))
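One caveat with the latin1 step in decode_surrogate_ascii: text such as "O żółtku zapomniałaś" contains characters outside latin1, so the encode call would raise UnicodeEncodeError. A possible alternative, sketched here under the assumption that only four-digit "\uXXXX" escapes need converting, is to replace just those escapes with a regular expression and leave every other character alone. The names U_ESCAPE and decode_u_escapes are made up for this illustration.

```python
import re

# A literal backslash, the letter u, and exactly four hex digits, e.g. \uD83D
U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text):
    # Replace each literal \uXXXX with the code point it names;
    # all other characters (including Polish letters) are left untouched
    interm = U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-trip through UTF-16 so each high/low surrogate pair
    # is merged into a single character
    return interm.encode("utf-16", "surrogatepass").decode("utf-16")

fixed = decode_u_escapes("O żółtku zapomniałaś \\uD83D\\uDE02")
# fixed is now "O żółtku zapomniałaś 😂"
```

A lone surrogate with no partner would still make the final decode raise, so input with unpaired escapes needs separate handling.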

How to interpret numbers with leading zeros as decimal?

Hi all!
I need to interpret numbers given with leading zeros in Python as decimal (the inputs are numbers, not strings!).
I'm using Python 2; in Python 3 a literal such as 0101 is a syntax error, so the problem does not arise there.
I have no idea how to do this.
Can anybody help me, please?
example:
id = 0101
print id
# will print 65 and I need 101
id = 65
print id
# will print 65 - ok
possible solution:
id = 0101
id = oct(id).lstrip('0')
print id
# will print 101 - ok
id = 65
id = oct(id).lstrip('0')
print id
# will print 101 - wrong, need 65
This is normal behaviour for Python 2. Numbers of this kind are specified in the language grammar:
octinteger ::= "0" ("o" | "O") octdigit+ | "0" octdigit+
"0" octdigit+ (numbers that start with "0") are octal by design. You can't change this behaviour.
If you want to interpret 077 as 77, the most you can do is an ugly transformation like this:
int(str(oct(077)).lstrip('0'))
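The reason the oct() round trip cannot tell 0101 from 65 is that both spellings produce the same integer by the time any code sees them. A small sketch in Python 3 syntax (where the Python 2 literal 0101 is written 0o101) illustrates this; the variable names are only for illustration.

```python
# Python 3 spelling of the Python 2 literal 0101 (leading-zero octal)
from_octal_literal = 0o101
from_decimal_literal = 65

# Both names hold the same integer; nothing records which notation was typed
same_value = from_octal_literal == from_decimal_literal

# With the original text still available, the base is chosen at parse time
text = "0101"
as_decimal = int(text, 10)  # leading zeros are ignored in base 10
as_octal = int(text, 8)     # what the Python 2 literal 0101 evaluates to
print(same_value, as_decimal, as_octal)
```

So a decimal reading is only recoverable when the value is still available as text, before it has been parsed as a literal.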
Can you get it as a string?
For example:
def func(rawNumber):
    # rawNumber must be the original text (e.g. "0101"): once Python 2 has
    # parsed the literal 0101 it is already the integer 65
    return int(str(rawNumber), 10)
# then use it like this:
print(func("0101")) # will print 101
print(func(65)) # will print 65

Convert a string to Hex in VB

In VB, I'm reading a file line by line using IO.File.Readline() Method. Each line of the file contains a string similar to the following
":1A2C003F4EDCFE3A2F5D66\r\n"
Now for each line I read, All I want to do is
1. Remove the ":" and "\r\n" from the line
2. pair the values as bytes e.g:"1A 2C 00"... (Now the line would be "1A 2C 00 3F 4E DC FE 3A 2F 5D 66")
3. Add all the bytes together and check whether the result is zero. e.g: (1A+2C+00+3F+4E+DC+FE+3A+2F+5D+66)=0?
How can I proceed?
So far I have done
While endofstream = False
    stringReader = fileReader.ReadLine()
    If stringReader.StartsWith(":") Then
        stringReader = stringReader.Replace(vbCr, "")
        stringReader = stringReader.Replace(":", "")
        MsgBox(stringReader)
Be careful, though: shouldn't the values be grouped in 4 characters? 1A2C 003F 4EDC ...
All you have to do is convert each pair of hex digits to an integer and sum them
Dim sum As Integer
For index As Integer = 0 To stringReader.Length - 2 Step 2
    ' take 2 chars and parse them as one hex byte
    ' using the ToInt32 method http://msdn.microsoft.com/en-us/library/f1cbtwff.aspx
    sum += Convert.ToInt32(stringReader.Substring(index, 2), 16)
Next
' use sum
In my case, the result is 985
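For comparison, the three steps from the question (strip the ":" and line ending, pair the digits into bytes, sum them) can be sketched in Python. hex_pairs_sum is a hypothetical helper, and the modulo-256 check at the end is an assumption about how the file format defines its checksum, not something stated in the question.

```python
def hex_pairs_sum(line):
    # Step 1: drop the trailing CR/LF and the leading ":"
    payload = line.strip()
    if payload.startswith(":"):
        payload = payload[1:]
    # Step 2: pair the hex digits into bytes (raises ValueError on bad input)
    data = bytes.fromhex(payload)
    # Step 3: add all the bytes together
    return sum(data)

total = hex_pairs_sum(":1A2C003F4EDCFE3A2F5D66\r\n")
print(total)  # 985, matching the VB result above

# Assumption: if the format uses an 8-bit checksum, only the low byte matters
checksum_ok = total % 256 == 0
```

For the sample line the plain sum is 985, so it is not zero either as a full integer or in its low byte.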

BBC basic variables

Background info for the problem: I am writing a text adventure game where the player has multiple paths to choose at each intersection/ problem.
Problem: I am attempting to use a variable from another path, which may not have been set yet. Is there any way to define this variable beforehand, or to skip a line of code?
This is the section of my code I am talking about
38 input "What do you do? 'A' to continue, 'B' to run away" , BAB$
39 if BAB$ == "A" then
40 if BCP$ == "B" then
41 print "The hunters see you return"
42 print "When they ask if you found the prisoner, you respond by saying that you havent seen him"
43 print "The hunters decide that this venture isnt worth it, and decide to leave, taking you with them"
44 wait 30
45 print "You escape shortly after the rest of the group leaves the area"
46 print "You are now a free man"
47 wait 200
48 clear
49 cls
50 goto 100
51 else
52 goto 55
53 endif
Have any questions about my wording? Just ask!
The simplest answer to this question is to just initialise the variable at the start of the program:
BAB$ = ""
BCP$ = ""
That way, when you hit line 40, either BCP$ will have a value of "" or have a value of something else.

Adding a hex value to a string of bits

I have an issue with increasing a string of bits which represents an IPv4 address.
The string looks like this "E8 00 00 64"
What I'm trying to do is this: when adding a value aSourceAddress to this string, the last byte should be increased, i.e. when adding 5, the string should look like this "E8 00 00 69". However, when I add 6, I get "E8 00 00 70" and what I am hoping to get is "E8 00 00 6A".
Logically it's simple, I need to convert the aSourceAddress variable to hex and add it to 64, but my output is, again, 70.
So I guess what I'm asking is, how can I get a result in hex.
This is kinda what I have so far.
proc dec2hex {dec_num} {return [format %04X $dec_num]}
set lEndOfAddress {format 0x%x[expr { 0x64 + 0x[dec2hex $aSourceAddress] }]}
set lCompareIpAddr "E8 00 00"
append lCompareIpAddr " $lEndOfAddress"
First, I think you should take advantage of the # flag, which prepends the 0x as necessary, instead of adding it on your own. Also, I'm not sure I understand the padding (the 4), but I'll leave that be:
proc dec2hex {dec_num} {return [format %0#4X $dec_num]}
I think your brackets and/or spacing was botched in your editing, but here's the next line, fixed:
set lEndOfAddress [format %02X [expr { 0x64 + [dec2hex $aSourceAddress] }]]
And simplifying your last line,
set lCompareIpAddr "E8 00 00 $lEndOfAddress"
I get the results,
% set aSourceAddress 6
6
% proc dec2hex {dec_num} {return [format %0#4X $dec_num]}
% set lEndOfAddress [format %02X [expr { 0x64 + [dec2hex $aSourceAddress] }]]
6A
% set lCompareIpAddr "E8 00 00 $lEndOfAddress"
E8 00 00 6A
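The underlying idea is to do the arithmetic on integers and format as hex only at the very end. A minimal sketch in Python, assuming the address is always written as space-separated hex bytes; add_to_hex_address is a hypothetical helper, and wrapping around on overflow is a design choice made for this illustration.

```python
def add_to_hex_address(address, offset):
    # Parse "E8 00 00 64" into raw bytes (whitespace between pairs is ignored)
    raw = bytes.fromhex(address)
    # Add on the integer value so a carry propagates into the higher bytes;
    # wrap around at the address width instead of overflowing
    value = (int.from_bytes(raw, "big") + offset) % (1 << (8 * len(raw)))
    # Format back as upper-case, zero-padded hex pairs
    return " ".join(f"{b:02X}" for b in value.to_bytes(len(raw), "big"))

print(add_to_hex_address("E8 00 00 64", 5))  # E8 00 00 69
print(add_to_hex_address("E8 00 00 64", 6))  # E8 00 00 6A
```

Working on the whole address rather than only the last byte means an offset that pushes 64 past FF carries into the third byte instead of producing a three-digit field.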
