I am using charset Cp037 and can display all the French characters, for example: été à
But I cannot display ’ (the tilted apostrophe). Please let me know which charset to use.
I have tried ISO-8859-1 and it didn't work.
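One way to narrow this down is to check which charsets can represent ’ (U+2019 RIGHT SINGLE QUOTATION MARK) at all. The sketch below uses Python's codec names (cp037, iso-8859-1, cp1252, utf-8) as stand-ins for the equivalents on your platform, which is an assumption. It only tests encodability and says nothing about what your terminal or host will actually render.

```python
APOSTROPHE = "\u2019"  # the tilted apostrophe from the question


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Codec names are Python's; other platforms may spell them differently.
for codec in ("cp037", "iso-8859-1", "cp1252", "utf-8"):
    print(codec, can_encode("été à", codec), can_encode(APOSTROPHE, codec))
```

With a stock Python 3, cp037 and iso-8859-1 accept été à but reject ’, while cp1252 and utf-8 accept both.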
The Problem
On Google Flights, search information is encoded in a URL parameter, presumably so users can share flight searches with each other easily.
The URL format looks like this:
https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8SCjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj___________8BQAFIAZgBAQ
I am trying to write a program that can generate flight search URLs given flight information (origin, destination, flight dates, passengers, etc.). To do this I need to know how the information is encoded in the URL so I can recreate it.
What I've tried
I know that the flight info is encoded in base64 or some variant of it (I've been using base64decode.org for testing). For a round-trip flight from HNL-SFO on 2021-09-13 - 2021-09-17, Google Flights has this URL:
https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8SCjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj___________8BQAFIAZgBAQ
The part of the tfs query parameter before the underscores decodes to
jHNL
2021-09-13rSFOjSFO
2021-09-17rHNLp
which contains some (but not all) recognizable flight info. What I don't understand is the whitespace between the recognizable information. Using this site, I learned that the whitespace is a mix of characters:
U+0008 : <control> BACKSPACE [BS]
U+001C : <control> INFORMATION SEPARATOR FOUR {file separator (FS)}
U+0010 : <control> DATA LINK ESCAPE [DLE]
U+0002 : <control> START OF TEXT [STX]
U+001A : <control> SUBSTITUTE [SUB]
U+001E : <control> INFORMATION SEPARATOR TWO {record separator (RS)}
U+006A : LATIN SMALL LETTER J
U+0007 : <control> BELL [BEL]
U+0008 : <control> BACKSPACE [BS]
U+0001 : <control> START OF HEADING [SOH]
U+0012 : <control> DEVICE CONTROL TWO [DC2]
U+0003 : <control> END OF TEXT [ETX]
U+0048 : LATIN CAPITAL LETTER H
U+004E : LATIN CAPITAL LETTER N
U+004C : LATIN CAPITAL LETTER L
...
This suggests that I'm not decoding the data properly. I've tried some other variants of base64, but haven't had any luck.
Does anyone know how this info is encoded? Another thing I haven't been able to figure out is how the information after the underscores (8BQAFIAZgBAQ) is encoded. Based on the behavior of the Google Flights site, I think it encodes passenger information, but it base64 decodes to only whitespace characters.
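One reading that fits the control characters listed above is that the decoded bytes are not text at all but a tag/length/value binary layout, such as the Protocol Buffers wire format. That is an assumption and is not confirmed anywhere in this thread. Note also that _ is itself a character of the URL-safe base64 alphabet, so the underscores may be data rather than a separator. The sketch below restores the dropped = padding, decodes the whole parameter, and splits it into (field number, wire type, value) triples. The helper names are made up for illustration, and only wire types 0 and 2 are handled.

```python
import base64


def b64url_decode(text: str) -> bytes:
    """Decode URL-safe base64, restoring the '=' padding that URLs often drop."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def read_varint(buf: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_fields(buf: bytes):
    """Split buf into (field_number, wire_type, value) triples.

    Assumes protobuf-style framing; only varint (0) and
    length-delimited (2) wire types are handled.
    """
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


# The tfs value from the URL above; the run of eleven underscores is
# written as "_" * 11 so the count is explicit.
TFS = (
    "CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8S"
    "CjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj" + "_" * 11 + "8BQAFIAZgBAQ"
)

raw = b64url_decode(TFS)
top = parse_fields(raw)
for number, wire_type, value in top:
    print(number, wire_type, value)

# The third top-level entry is a nested block; parse it the same way.
first_leg = parse_fields(top[2][2])
print(first_leg)
```

With the URL above, the first nested block splits into three length-delimited pieces containing HNL, 2021-09-13 and SFO. What each field number means is still guesswork.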
Additional Context
Two years ago I made a working version of the program which produced URLs like
https://www.google.com/flights?hl=en#flt=ORD.MCO.2021-07-16*MCO.ORD.2021-07-19;c:USD;e:1;px:2,2,0,0;sd:1;t:f
Several months ago Google changed the format they use from the above to the encoded version. I want to figure out how to recreate the encoded URLs so I can update my program instead of retiring it.
You can have your program output flight URLs in query format using the q URL param. No need to encode/decode the URL.
For example:
https://www.google.com/travel/flights?q=Flights%20to%20SFO%20from%20HNL%20on%202022-09-13%20through%202022-09-17
Which leads to the results page: HNL <> SFO Flight Results
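A minimal sketch of building such a q URL programmatically follows. The function name is hypothetical, the sentence template is copied from the example above, and whether Google accepts other phrasings is not something this thread establishes.

```python
from urllib.parse import quote

BASE = "https://www.google.com/travel/flights"


def flight_search_url(origin: str, destination: str,
                      depart: str, return_date: str) -> str:
    """Build a free-text q= search URL using the phrasing from the example."""
    query = (f"Flights to {destination} from {origin} "
             f"on {depart} through {return_date}")
    # quote() percent-encodes spaces as %20 and leaves '-' untouched
    return f"{BASE}?q={quote(query)}"


url = flight_search_url("HNL", "SFO", "2022-09-13", "2022-09-17")
print(url)
```

For these arguments the result is the same URL as the one shown above.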
I miss having the ability to encode a query and have the same question. Nice work with finding out it's in base64.
I think reverse engineering is the only way to find out how things are encoded. For example, the stuff after the underscores is most likely binary-encoded.
See the below for economy:
11101111 10111111 10111101 00010100 00000000 00010100 11101111 10111111 10111101 00011001 11101111 10111111 10111101 00010000 00010000
And the same query, but for business class:
11101111 10111111 10111101 00010100 00000000 00010100 11101111 10111111 10111101 00101001 11101111 10111111 10111101 00010000 00010000
As you can see, the 10th byte goes from 00011001 to 00101001.
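The comparison can be automated. The sketch below parses the two bit strings exactly as listed and reports which byte positions differ. It also counts the sequence EF BF BD, which is the UTF-8 encoding of U+FFFD (the replacement character); its presence suggests the online decoder treated the binary data as text and some original bytes were lost, so working from raw bytes is probably safer. The variable names are made up for illustration.

```python
ECONOMY = ("11101111 10111111 10111101 00010100 00000000 00010100 "
           "11101111 10111111 10111101 00011001 11101111 10111111 "
           "10111101 00010000 00010000")
BUSINESS = ("11101111 10111111 10111101 00010100 00000000 00010100 "
            "11101111 10111111 10111101 00101001 11101111 10111111 "
            "10111101 00010000 00010000")


def to_bytes(bits: str) -> bytes:
    """Turn a space-separated list of 8-bit groups into a bytes object."""
    return bytes(int(group, 2) for group in bits.split())


economy, business = to_bytes(ECONOMY), to_bytes(BUSINESS)

# 1-based positions where the two dumps differ
diffs = [(i + 1, f"{a:08b}", f"{b:08b}")
         for i, (a, b) in enumerate(zip(economy, business)) if a != b]
print(diffs)

# EF BF BD is U+FFFD encoded as UTF-8
replacement = "\ufffd".encode("utf-8")
print(economy.count(replacement))
```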
I'm trying to display words with special characters in Groovy.
They are being replaced by the "?" character in version 2.5.3, but not in an older version such as 1.5.7.
Is this a version bug?
Executing the same code with different Groovy versions gives different results (the correct characters with the older version and "?" with 2.5.3).
Running on RHL with JVM 1.8.0_161
def frase = "árbol è í ï Església Ramón"
println(frase);
byte[] testBytes = frase.getBytes("ISO-8859-1");
def frase1 = new String(testBytes, "ISO-8859-1")
println(frase1);
Expected output:
árbol è í ï Església Ramón
Real output:
?rbol ? ? ? Esgl?sia Ram?n
There were two problems:
The PuTTY connection to the console needed to be configured for UTF-8.
The files must be in UTF-8 format (they were in ISO).
Thanks a lot.
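The "?" output is what you typically get when text passes through a charset that cannot represent the accented characters. A minimal Python sketch of both effects follows. ASCII is only a stand-in for whatever the misconfigured console was actually using, and ISO-8859-1 is an assumed reading of "ISO"; the thread states neither.

```python
frase = "árbol è í ï Església Ramón"

# Console side: an encoder that cannot map a character substitutes "?"
as_seen = frase.encode("ascii", errors="replace").decode("ascii")
print(as_seen)

# File side: bytes saved as ISO-8859-1 but read back as UTF-8.
# The lone 0xE1 byte is not valid UTF-8, so it becomes U+FFFD.
misread = "árbol".encode("iso-8859-1").decode("utf-8", errors="replace")
print(misread)
```

The first print reproduces the "Real output" line from the question.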
I have this problem:
I have a message flow developed in WMB7 fix 6 that integrates with CICS. My CICS CCSID is 037. The broker is running on z/Linux with locale = en_US.UTF-8 and locale charmap = UTF-8. MQSeries is in 1208. I have problems with special characters such as ñ, Ñ, á, etc.
In my message flow I have this code:
DECLARE CICSRespMsg BLOB;
DECLARE CICSRespChar CHARACTER;
DECLARE MsgOut BLOB;
DECLARE MsgOutChar CHARACTER;
--EBCDIC TO ASCII
SET CICSRespMsg = InputRoot.BLOB.BLOB;
SET CICSRespChar = CAST(CICSRespMsg AS CHARACTER CCSID 037);
SET MsgOut = CAST(CICSRespChar AS BLOB CCSID 850);
SET MsgOutChar = CAST(MsgOut AS CHARACTER CCSID 850);
I tried changing from 850 to 819 and got the same issue. I hope you can help me. Thanks so much!
So I'm not allowed to ask for clarification in my "answer", so I'll show you how to debug your problem as I can't provide you with an exact solution with the information provided.
You've shown a snippet of ESQL which is converting from ibm-037 to ibm-850 via Unicode. ibm-850 does contain ñ, but at byte 0xa4, whereas ibm-819, a.k.a. latin-1, a.k.a. iso-8859-1, has it at 0xf1. Either conversion of ñ should succeed, so the damage is more likely caused by a mismatch between the bytes you produce and the CCSID they are labelled with further down the flow.
I don't know what you're doing after the compute node, so look at your input and output nodes, and look at the CCSID in the Properties folder. You say the MQSeries is in 1208, by which I assume you mean the queue manager's default CCSID is set to 1208. If this is being used on the output node then you'll have a problem, as utf-8 (ibm-1208) is incompatible with latin-1 for these characters.
Place a trace node after your input node and trace to a file with ${Root} as the trace expression, place another trace node before your output node tracing the same to a different file. Look at the bytes:
ñ in 037 is 0x49
ñ in 819 is 0xf1
ñ in 1208 is 0xc3b1
If you see 0x1a, it's been replaced with a substitution character.
If you want the output to be UTF-8 ensure that you use 1208 instead of 850/819 above and make sure that OutputRoot.Properties.CodedCharSetId is set to 1208.
If you want the output to be in latin-1, use 819 above and ensure that OutputRoot.Properties.CodedCharSetId is set to 819.
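To check trace output against the byte values above, a small Python sketch can reproduce them. The mapping of CCSIDs to Python codec names (037 to cp037, 819 to iso-8859-1, 1208 to utf-8) is an assumption about equivalence, not something the broker defines.

```python
# Assumed CCSID -> Python codec equivalents
CODECS = {"037": "cp037", "819": "iso-8859-1", "1208": "utf-8"}

for ccsid, codec in CODECS.items():
    print(ccsid, "ñ".encode(codec).hex())

# What the compute node conceptually does: EBCDIC bytes -> characters -> target bytes
cics_bytes = b"\x49"                      # ñ as received from CICS (CCSID 037)
text = cics_bytes.decode("cp037")
utf8_out = text.encode("utf-8")           # matches a 1208 label
latin1_out = text.encode("iso-8859-1")    # matches an 819 label

# Mismatch: latin-1 bytes read by a consumer that believes the label says 1208
mislabelled = latin1_out.decode("utf-8", errors="replace")
print(repr(mislabelled))
```

The last line shows a replacement character, which is the kind of damage to expect when the body bytes and the declared CCSID disagree.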
Hope this helps,
Andreas
On my page I have set a meta tag for the description, which Google then uses. That all works normally. But for some reason Google adds a space after one character. Here is my meta description:
Igraj brezplačne online igre! Izbiraj med več kot 6.000 igrami! Vsak dan dodajamo nove igre! Igraj zdaj!
Yes, it's not in English, but that's normal :D
And here is how Google shows it:
Igraj brezplač ne online igre! Izbiraj med več kot 6.000 igrami! Vsak dan dodajamo nove igre! Igraj zdaj!
The problem is that Google adds a space after the first 'č' character. Check Google's description of it:
https://www.google.si/search?q=bringler&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
I have no idea why it does this. The funny part is that after the second one everything is OK, so no space is added.
Any ideas?
Your content (the actual one in the meta-description, not the one in your question) contains a hidden control character: U+008D REVERSE LINE FEED
You can see it if you analyze the characters in the string, e.g. with Rishida’s String analyser: analyze "brezplačne"
If you copy the string directly from your meta-description and search for it in Google, it converts it to brezplaÄ Â ne.
So, replace the string "brezplačne" (note that Stack Overflow removes this hidden character, so these strings are actually the same here) in your content with "brezplačne" and you should be fine (when Google visits your page again, in some time).
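If you would rather check this locally than paste into an online analyser, a short Python sketch can list control and format characters in a string. The sample string is a hypothetical reconstruction of the affected word with U+008D inserted after the č, and the function name is made up for illustration.

```python
import unicodedata

HIDDEN_CATEGORIES = ("Cc", "Cf")  # control and format characters


def find_hidden(text: str):
    """Return (index, code point, category) for each control/format character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.category(ch))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) in HIDDEN_CATEGORIES]


# Hypothetical reconstruction: "brezplačne" with U+008D after the č
sample = "brezpla\u010d\u008dne"
print(find_hidden(sample))

# Drop the hidden characters (this would also drop tabs and newlines)
clean = "".join(ch for ch in sample
                if unicodedata.category(ch) not in HIDDEN_CATEGORIES)
print(clean)
```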
Inside a web application I am processing requests to a URL like
http://example.com/<website-base-url>
I am logging the raw GET parameter of the request in a UTF-8 database column and in the filesystem. For a few Chinese domains I get requests with a website-base-url parameter like
%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn
Decoding with urldecode returns
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
This does not seem to be the domain name the user wants to request.
I have tried urlencoding, base64, utf8 and combinations without success.
Any suggestions on how to decode the given parameter to UTF-8?
URL percent-encoding simply encodes the raw bytes. It does not give you any hint regarding the actual encoding of the text. If you do not know what encoding these bytes represent, all you can do is guess.
php > $d = urldecode('%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn');
php > echo $d;
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
php > echo iconv('BIG5', 'UTF-8', $d);
php > echo iconv('Shift-JIS', 'UTF-8', $d);
テδ」テつ・テδ「テつ、テδ「テつァテδ」テつ・テδ「テつ、テδ「テつエテδ」テつィテδ「テつエテδ「テつエ.cn
php > echo iconv('GB18030', 'UTF-8', $d);
脙拢脗楼脙垄脗陇脙垄脗搂脙拢脗楼脙垄脗陇脙垄脗麓脙拢脗篓脙垄脗麓脙垄脗麓.cn
GB18030 would seem to be the best candidate, but even that decoded string looks a bit too repetitive to be really useful Chinese.
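Another guess worth trying is repeated mojibake: the repetitive C3 83 C2 … pattern is what UTF-8 bytes look like after being misread as Latin-1 and re-encoded as UTF-8 more than once. The Python sketch below undoes such a layer, and after one round the bytes decode as UTF-8 to the same ã¥â¤â§… string shown above. It then assumes the host name was lowercased somewhere along the way (turning Ã into ã and Â into â), reverses that, and undoes one more layer. Both the number of layers and the lowercasing step are assumptions, and the function name is made up for illustration.

```python
from urllib.parse import unquote_to_bytes

PARAM = (
    "%C3%83%C2%A3%C3%82%C2%A5" "%C3%83%C2%A2%C3%82%C2%A4" "%C3%83%C2%A2%C3%82%C2%A7"
    "%C3%83%C2%A3%C3%82%C2%A5" "%C3%83%C2%A2%C3%82%C2%A4" "%C3%83%C2%A2%C3%82%C2%B4"
    "%C3%83%C2%A3%C3%82%C2%A8" "%C3%83%C2%A2%C3%82%C2%B4" "%C3%83%C2%A2%C3%82%C2%B4"
    ".cn"
)


def undo_layer(data: bytes) -> bytes:
    """Reverse one 'UTF-8 bytes misread as Latin-1, then re-encoded as UTF-8' step."""
    return data.decode("utf-8").encode("iso-8859-1")


raw = unquote_to_bytes(PARAM)

# One layer off, then read as UTF-8: the string shown in the question
lowered = undo_layer(raw).decode("utf-8")

# Assumed lowercasing of the UTF-8 lead characters: ã -> Ã, â -> Â
restored = lowered.replace("\u00e3", "\u00c3").replace("\u00e2", "\u00c2")

# Back to bytes, one more layer off, and decode what remains as UTF-8
candidate = undo_layer(restored.encode("iso-8859-1")).decode("utf-8")
print(candidate)
```

For this input the chain ends in 大头贴.cn, which is at least well-formed UTF-8; whether that is the domain the user intended cannot be confirmed from the logs alone.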