Ghost blog RSS feed broken: Input is not proper UTF-8 - node.js

When accessing /rss I see an error:
This page contains the following errors:
error on line 6 at column 575: Input is not proper UTF-8, indicate encoding !
Bytes: 0x08 0x75 0x74 0x20
I checked the settings in index.js and they seem to be right. Is this a template issue or something that is related to Ghost's core?
Thanks
Issue resolved! There was an invisible character hidden in one of the blog posts that broke the feed. Go find it!
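If anyone else hits this, one way to hunt for such a character is to scan the post content for control bytes that XML 1.0 forbids. A minimal Node.js sketch, assuming you can dump the post HTML to a posts.json export (the file name and the html/title fields are placeholders, not Ghost's actual export format):
const fs = require('fs');

// Load an export of the posts; adjust the source to however you can get at the post HTML.
const posts = JSON.parse(fs.readFileSync('posts.json', 'utf8'));

for (const post of posts) {
  for (let i = 0; i < post.html.length; i++) {
    const code = post.html.charCodeAt(i);
    // Control characters other than tab, LF and CR are illegal in XML 1.0
    // (the 0x08 in the error above is one of them).
    if (code < 0x20 && code !== 0x09 && code !== 0x0a && code !== 0x0d) {
      console.log(`"${post.title}": control char U+${code.toString(16).padStart(4, '0')} at offset ${i}`);
    }
  }
}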

Related

Linux command file -i returns the wrong value charset=unknown-8bit for a windows-1252 encoded file

Using nodejs and iconv-lite to create an HTTP response file in XML with charset windows-1252, the file -i command cannot identify it as windows-1252.
Server side:
r.header('Content-Disposition', 'attachment; filename=teste.xml');
r.header('Content-Type', 'text/xml; charset=iso8859-1');
r.write(ICONVLITE.encode(`<?xml version="1.0" encoding="windows-1252"?><x>€Àáção</x>`, "win1252")); // euro symbol and Portuguese accented vowels
r.end();
The browser downloads the file and then I check it in Ubuntu 20.04 LTS:
file -i teste.xml
/tmp/teste.xml: text/xml; charset=unknown-8bit
When I use gedit to open it, the accented vowels appear fine but the euro symbol does not (all characters from 128 to 159 get messed up).
I checked in a Windows 10 VM and there everything displays correctly. In both Windows and Linux web browsers it also shows fine.
So, is it a problem with the file command? How do I check the right charset of a file in Linux?
Thank you
EDIT
The resulting file can be found here
2nd EDIT
I found one error! The code line:
r.header('Content-Type', 'text/xml; charset=iso8859-1');
must be:
r.header('Content-Type', 'text/xml; charset=Windows-1252');
It's important to understand what a character encoding is and isn't.
A text file is actually just a stream of bits; or, since we've mostly agreed that there are 8 bits in a byte, a stream of bytes. A character encoding is a lookup table (and sometimes a more complicated algorithm) for deciding what characters to show to a human for that stream of bytes.
For instance, the character "€" encoded in Windows-1252 is the string of bits 10000000. That same string of bits will mean other things in other encodings - most encodings assign some meaning to all 256 possible bytes.
If a piece of software knows that the file is supposed to be read as Windows-1252, it can look up a mapping for that encoding and show you a "€". This is how browsers are displaying the right thing: you've told them in the Content-Type header to use the Windows-1252 lookup table.
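You can see this concretely with the same iconv-lite module used in the question; a small illustration (not part of the original code):
const iconv = require('iconv-lite');

const byte = Buffer.from([0x80]); // the single byte 10000000

console.log(iconv.decode(byte, 'win1252')); // "€": Windows-1252 maps 0x80 to the euro sign
console.log(iconv.decode(byte, 'latin1'));  // U+0080, an invisible control character: ISO 8859-1 has no euro sign
console.log(iconv.encode('€', 'utf8'));     // <Buffer e2 82 ac>: UTF-8 spends three bytes on the same character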
Once you save the file to disk, that "Windows-1252" label from the Content-Type header isn't stored anywhere. So any program looking at that file can see that it contains the string of bits 10000000, but it doesn't know what mapping table to look that up in. Nothing you do in the HTTP headers is going to change that - none of those are going to affect how it's saved on disk.
In this particular case the "file" command could look at the "encoding" marker inside the XML document, and find the "windows-1252" there. My guess is that it simply doesn't have that functionality. So instead it uses its general logic for guessing an encoding: it's probably something ASCII-compatible, because it starts with the bytes that spell <?xml in ASCII; but it's not ASCII itself, because it has bytes outside the range 00000000 to 01111111; anything beyond that is hard to guess, so output "unknown-8bit".
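If you want to stop guessing, dump the raw bytes; that makes the difference between the encodings obvious. A small Node.js check, assuming the downloaded file is named teste.xml as above:
const fs = require('fs');

const buf = fs.readFileSync('teste.xml');
// Print a hex dump; in Windows-1252 output the euro sign shows up as the single byte 0x80.
console.log(buf.toString('hex').match(/../g).join(' '));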

How do I resolve "Illuminate\Queue\InvalidPayloadException: Unable to JSON encode payload. Error code: 5"

Trying out the queue system for a better user upload experience with Laravel-Excel.
.env has been changed from 'sync' to 'database' and the migrations run. All the necessary use statements are in place, yet the error above persists.
The exact error happens here:
Illuminate\Queue\Queue.php:97
$payload = json_encode($this->createPayloadArray($job, $queue, $data));
if (JSON_ERROR_NONE !== json_last_error()) {
throw new InvalidPayloadException(
If I drop ShouldQueue, the file imports perfectly in-session (it's a large file, so the user faces a long wait).
I've read many Stack Overflow, GitHub etc. comments on this, but I don't have the technical skills to deep-dive and fix my particular situation (most of them speak of UTF-8, but I don't know if that's an issue here; I changed the Excel save format to UTF-8 but it didn't fix it).
P.S. While running the migration, I got the error:
SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes (SQL: alter table `jobs` add index `jobs_queue_index`(`queue`))
I bypassed it by dropping the 'add index', so my jobs table is not indexed on queue, but I don't feel this is the cause.
One thing you can do when looking into json_encode() errors is use the json_last_error_msg() function, which will give you a bit more of a readable error message.
In your case you're getting a '5' back, which is the JSON_ERROR_UTF8 error code. The error message back for this is a slightly more informative one:
'Malformed UTF-8 characters, possibly incorrectly encoded'
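As a quick, generic illustration of that (plain PHP, not Laravel-specific):
$data = ['name' => "\xC3\x28"]; // an invalid UTF-8 byte sequence

$payload = json_encode($data);

if (json_last_error() !== JSON_ERROR_NONE) {
    // json_last_error() returns 5 (JSON_ERROR_UTF8) here;
    // json_last_error_msg() gives the readable version of the same error.
    echo json_last_error() . ': ' . json_last_error_msg();
    // prints: 5: Malformed UTF-8 characters, possibly incorrectly encoded
}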
So we know it's encountering non-UTF-8 characters, even though you're saving the file specifically with UTF-8 encoding. At first glance you might think you need to convert the encoding yourself in code (like this answer), but in this case, I don't think that'll help. For Laravel-Excel, this seems to be a limitation of trying to queue-read .xls files - from the Laravel-Excel docs:
You currently cannot queue xls imports. PhpSpreadsheet's Xls reader contains some non-utf8 characters, which makes it impossible to queue.
In this case you might be stuck with a slow, non-queueable option, or need to convert your spreadsheet into a queueable format e.g. .csv.
The key length error on running the migration is unrelated. It has been around for a while and is a side-effect of using an older version of MySQL/MariaDB. Check out this answer and the Laravel documentation around index lengths - you need to add this to your AppServiceProvider::boot() method:
Schema::defaultStringLength(191);
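For reference, the usual place for that call is app/Providers/AppServiceProvider.php, roughly like this:
namespace App\Providers;

use Illuminate\Support\Facades\Schema;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Keep default index keys within the 767-byte limit of older MySQL/MariaDB.
        Schema::defaultStringLength(191);
    }
}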

Include binary file in as86/bin86

I have written a bit of code in i8086 assembler that is supposed to put an 80x25 image into the VRAM and show it on screen.
entry start
start:
mov di,#0xb800 ; Point ES:DI at VRAM
mov es,di
mov di,#0x0000
mov si,#image ; And DS:SI at Image
mov cx,#0x03e8 ; Image is 1000 bytes
mov bl,#0x20 ; Print spaces
; How BX is used:
; |XXXX XXXX XXXXXXXX|
; ^^^^^^^^^ BL contains ascii whitespace
; ^^^^ BH higher 4 bits contain background color
; ^^^^ BH lower 4 bits contain unused foreground color
img_loop:
seg ds ; Load color
mov bh,[si]
seg es ; Write a whitespace and color to VRAM
mov [di],bx
add di,#2 ; Advance one 'pixel'
sal bh,#4 ; Shift the unused lower 4-bits so that they become background color for the 2nd pixel
seg es
mov [di],bx
add di,#2
add si,#1
sub cx,#1 ; Repeat until all 1000 bytes are read
jnz img_loop
endless:
jmp endless
image:
GET splash.bin
The problem is that I cannot get the as86 assembler to include the binary data from the image file. I have looked at the man page but I could not find anything that works.
If I try to build the above code it gives me no error, but the output file produced by the linker is only 44 bytes in size, so clearly it did not bother to put in the 1000-byte image.
Can anybody help me with that? What am I doing wrong?
I am not certain that this will help you, as I have never tried it for 8086 code. But you might be able to make it work.
The objcopy program can convert binary objects to various different formats. Like this example from the man objcopy page:
objcopy -I binary -O <output_format> -B <architecture> \
--rename-section .data=.rodata,alloc,load,readonly,data,contents \
<input_binary_file> <output_object_file>
So from that you'd have an object file with your <input_binary_file> in a section named .rodata. But you could name it whatever you wanted. Then use a linker to link your machine code to the image data.
The symbol names are created for you too. Also from the man page:
-B
--binary-architecture=bfdarch
Useful when transforming an architecture-less input file into an object file. In this case the output architecture can be set to bfdarch. This option will be ignored if the input file has a known bfdarch. You can access this binary data inside a program by referencing the special symbols that are created by the conversion process. These symbols are called _binary_objfile_start, _binary_objfile_end and _binary_objfile_size. e.g. you can transform a picture file into an object file and then access it in your code using these symbols.
If your whole code is pure code (no executable headers, no relocation...) you can just manually concatenate the image at the end of the code (and of course remove GET splash.bin). In Linux for example you can do cat code-binary image-binary > final-binary.
Thank you to everybody else who tried to help me. Unfortunately I did not get objcopy to work (maybe I am just too stupid, who knows), and while I actually used cat at first, I soon had to include multiple binary files that still needed to be accessible via labels in my assembler code, so that was not a solution either.
What I ended up doing was the following: you reserve the exact number of bytes in your assembler source code directly after the label where you want your binary file to go, i.e.:
splash_img:
.SPACE 1000
snake_pit:
.SPACE 2000
Then you assemble your source code, creating a symbol table by adding the -s option, i.e. -s snake.symbol, to your call to as86. The linker call does not change. Now you have a binary file that has a bunch of zeroes at the position where you want your binary data, and you have a symbol table that should look similar to this:
0 00000762 ---R- snake_pit
0 0000037A ---R- splash_img
All you gotta do now is get a program to overwrite the binary file created by the linker with your binary include file, starting at the addresses found in the symbol table. It is up to you how you wanna do it, there are a lot of ways; I ended up writing a small C program that does this.
Then I just call ./as86_binin snake snake.symbols splash_img splash.bin and it copies the binary include into my linked assembler program.
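For anyone curious, a rough sketch of what such a patch tool could look like in C. This is not the original program; it assumes the linked output is a raw binary whose file offsets equal the symbol addresses, and that the symbol-table lines look like "0 0000037A ---R- splash_img" as shown above:
/* Usage (hypothetical): ./as86_binin <linked-binary> <symbol-file> <label> <include-file> */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 5) {
        fprintf(stderr, "usage: %s binary symbols label include\n", argv[0]);
        return 1;
    }

    /* Look up the label's address in the symbol table. */
    FILE *sym = fopen(argv[2], "r");
    if (!sym) { perror(argv[2]); return 1; }

    char line[256], name[128];
    unsigned long offset = 0;
    int found = 0;
    while (fgets(line, sizeof line, sym)) {
        unsigned long addr;
        if (sscanf(line, "%*s %lx %*s %127s", &addr, name) == 2 &&
            strcmp(name, argv[3]) == 0) {
            offset = addr;
            found = 1;
            break;
        }
    }
    fclose(sym);
    if (!found) { fprintf(stderr, "label %s not found\n", argv[3]); return 1; }

    /* Overwrite the reserved zero bytes with the contents of the include file. */
    FILE *bin = fopen(argv[1], "r+b");
    FILE *inc = fopen(argv[4], "rb");
    if (!bin || !inc) { perror("open"); return 1; }

    fseek(bin, (long)offset, SEEK_SET);
    int c;
    while ((c = fgetc(inc)) != EOF)
        fputc(c, bin);

    fclose(inc);
    fclose(bin);
    return 0;
}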
I am sorry for answering my own question, but I felt like this is the best way to do it. It is quite unfortunate that bin86 doesn't have a simple binary include macro of its own. If anybody else runs into this problem in the future, I hope this will help you.

problems with CICS Request Node CCSID

I have this problem:
I have a message flow developed in WMB7 fix pack 6, integrated with CICS. My CICS CCSID is 037. The broker is running on z/Linux with locale = en_US.UTF-8 and locale charmap = UTF-8. MQSeries is in 1208. I have problems with special characters like ñ, Ñ, á, etc.
In my message flow I have this code:
DECLARE CICSRespMsg BLOB;
DECLARE CICSRespChar CHARACTER;
DECLARE MsgOut BLOB;
DECLARE MsgOutChar CHARACTER;
--EBCDIC TO ASCII
SET CICSRespMsg = InputRoot.BLOB.BLOB;
SET CICSRespChar = CAST(CICSRespMsg AS CHARACTER CCSID 037);
SET MsgOut = CAST(CICSRespChar AS BLOB CCSID 850);
SET MsgOutChar = CAST(MsgOut AS CHARACTER CCSID 850);
I tried changing from 850 to 819 and I got the same issue. Hope you can help me. Thanks so much!
I'm not allowed to ask for clarification in my "answer", so I'll show you how to debug your problem, as I can't provide an exact solution with the information given.
You've shown a snippet of ESQL which is converting from ibm-037 to ibm-850 via Unicode. As ibm-850 doesn't support ñ I would expect the conversion to fail. However ibm-819, a.k.a latin-1, a.k.a iso-8859-1 does support the character and the conversion of ñ should succeed.
I don't know what you're doing after the compute node, so look at your input and output nodes and at the CCSID in the Properties folder. You say the MQSeries is in 1208, by which I assume you mean the queue manager's default CCSID is set to 1208. If this is being used on the output node then you'll have a problem, as utf-8 (ibm-1208) is incompatible with latin-1 for these characters.
Place a trace node after your input node and trace to a file with ${Root} as the trace expression; place another trace node before your output node, tracing the same to a different file. Look at the bytes:
ñ in 037 is 0x49
ñ in 819 is 0xf1
ñ in 1208 is 0xc3b1
If you see 0x1a, the character has been replaced with a substitution character.
If you want the output to be UTF-8 ensure that you use 1208 instead of 850/819 above and make sure that OutputRoot.Properties.CodedCharSetId is set to 1208.
If you want the output to be in latin-1, use 819 above and ensure that OutputRoot.Properties.CodedCharSetId is set to 819.
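For example, if the output should be UTF-8, the tail end of the compute node would look roughly like this (a sketch reusing the variable names from the question):
-- Decode the CICS reply from EBCDIC (CCSID 037), then re-encode it as UTF-8 (CCSID 1208)
SET CICSRespChar = CAST(InputRoot.BLOB.BLOB AS CHARACTER CCSID 037);
SET OutputRoot.Properties.CodedCharSetId = 1208;
SET OutputRoot.BLOB.BLOB = CAST(CICSRespChar AS BLOB CCSID 1208);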
Hope this helps,
Andreas

The XML parser detected error code 302

I am using the XML-INTO op-code to parse a web service request. Every now and then I get errors in the logs
(RNX0351 - "The XML parser detected error code 302").
The help for a 302 is
302 The parser does not support the requested CCSID value or
the first character of the XML document was not '<'
To the best of my knowledge, the first character is "<" and the request is generated from a previous web service call, so I would be very surprised if the CCSID has changed.
The error is repeatable for the specific query, so it is almost certainly data related; I am just unsure how I would go about identifying the offending item.
Any thoughts on how to determine the issue, or better yet, how to overcome it?
cheers
CCSID is an AS400/iSeries/Power Systems attribute, and it applies to files in the IFS. It's like a declaration of what is inside the file, or in other words what its internal encoding "should be".
The encoding of the data inside the file and the CCSID of the file itself (the envelope) are supposed to match, and the box uses this attribute to display and handle the corresponding characters.
It sounds like you receive data in one encoding, but the file's CCSID doesn't match.
Try changing the CCSID of your file (only the envelope), e.g.: 37 (american), 500 (latin-1), 819 (utf-8), 850 (dos), 1252 (win), and display the file afterwards. You can check first using ls -Sla yourfile in QSH or QP2TERM, or EDTF as well. CHGATTR allows you to change the CCSID, as does setccsid in QSH (again).
This approach helped me to find related issues. Remember that although the data may be visible on the four hundred, it may not be visible through a shared folder in Windows. That means the file's CCSID and the content encoding don't match.
Hope it helps.
Hi, I've seen this error with XML data uploaded to AS400/iSeries/IBM i with FTP and CCSID 819 (ISO 8859-1 ASCII), where the file had some binary garbage in its first few positions. Changing the encoding to CCSID 1208 (UTF-8 with IBM PUA) using FTP "quote type c 1208" cleared the problem and XML-INTO was successful.
So, my suggestion for XML parser error 302 received when using XML-INTO is to look at the file (wrklnk ...) and, if the first character is not "<" but instead some binary garbage, try CCSID 1208 for UTF-8.
Statements in this answer about what 819 is and what CCSID represents UTF-8 do not agree with the previous answer but are correct, according to IBM documentation:
https://www-01.ibm.com/software/globalization/ccsid/ccsid819.html
https://www-01.ibm.com/software/globalization/ccsid/ccsid1208.html
I worked on this problem for a couple of hours;
for me the solution was to use the option ccsid=UCS2 when you use a data structure or variable to store the XML.
Something like this:
XML-INTO customer %XML( xmlSource : 'ccsid=UCS2');
I have the program running with CCSID 870; every conversion of the CCSID on the xmlSource field didn't work.
The strange thing is that when I use the file with CCSID 850, everything works fine.
I mention that because this is the first page you find when looking into this problem.
Maybe this helps someone.
