AvalonEdit TextDocument crash reading large files

AvalonEdit TextDocument crash reading large files - sharpdevelop

I have an editor implemented using AvalonEdit, and when I open a new file I populate the Document with a new TextDocument, and populate that TextDocument with an IEnumerable<char>. That IEnumerable is just reading the file, char-by-char, from a stream.
We recently had it crash when opening a large file. Large meaning 600MB of plaintext. Don't ask, I think it's ridiculous, but it's necessary to be able to view files that big for our application. After a bunch of debugging, I found that the TextDocument constructor is crashing with an OutOfMemoryException, specicifically something where "Array dimensions exceeded supported range." Here's the relavent part of the stack trace:
System.OutOfMemoryException: Array dimensions exceeded supported range.
at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)
at ICSharpCode.AvalonEdit.Utils.Rope`1.ToArray(IEnumerable`1 input)
at ICSharpCode.AvalonEdit.Utils.Rope`1..ctor(IEnumerable`1 input)
at ICSharpCode.AvalonEdit.Document.TextDocument..ctor(IEnumerable`1 initialText)
...
In my debugging I found that it happened consistently on the 134217728th character (2^27). That seems like an arbitrary limit, is there any reason that this would be a restriction? I'd expect the restriction of an array length to be int.MaxValue. It looks like it's something to do with the Rope creation, but I haven't been able to go into the Rope<T> source code yet.

Related

ADF copy activity failed while extracting data from DB2- Issue found for few records having special characters

I am doing a full extract from a table ABC. In copy activity, I have given a query as
select * from ABC
whereas I am facing issue for few rows (It has special characters - Japanese and Korean)
Error code 2200
Failure type User configuration issue
Details Failure happened on 'Source' side. ErrorCode=DB2DriverRunFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error thrown from driver. Sql code: '-343',Source=Microsoft.DataTransfer.ClientLibrary.Db2Connector,''Type=Microsoft.HostIntegration.DrdaClient.DrdaException,Message=HISMPCB0001 In BasePrimitiveConverter an exception has occurred. Exception description: Output buffer is smaller than required size 12 SQLSTATE=HY000 SQLCODE=-343,Source=Microsoft.HostIntegration.Drda.Requester,'
The character which is causing the issue is '轎ᆃ '

In the error description, it states that there is BasePrimitiveConverter exception that has occurred. The exception description indicates that the output buffer is smaller than the required size. So, please try converting the column to an acceptable type like graphic in db2. Refer to the following link to understand more.
https://bytes.com/topic/db2/answers/488983-storing-some-japanese-data
Referring to these links, I understand that this error might be due to the datatype of source column, or the encoding used. Try working with different encoding options available in your source dataset. Here is a similar problem with a different source but a similar problem of not being able to retrieve special characters.
https://learn.microsoft.com/en-us/answers/questions/467456/failure-happened-source-side-in-copy-activity-for.html

How Do I resolve "Illuminate\Queue\InvalidPayloadException: Unable to JSON encode payload. Error code: 5"

Trying out the queue system for a better user upload experience with Laravel-Excel.
.env was been changed from 'sync' to 'database' and migrations run. All the necessary use statements are in place yet the error above persists.
The exact error happens here:
Illuminate\Queue\Queue.php:97
$payload = json_encode($this->createPayloadArray($job, $queue, $data));
if (JSON_ERROR_NONE !== json_last_error()) {
throw new InvalidPayloadException(
If I drop ShouldQueue, the file imports perfectly in-session (large file so long wait period for user.)
I've read many stackoverflow, github etc comments on this but I don't have the technical skills to deep-dive to fix my particular situation (most of them speak of UTF-8 but I don't if that's an issue here; I changed the excel save format to UTF-8 but it didn't fix it.)
Ps. Whilst running the migration, I got the error:
SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes (SQL: alter table `jobs` add index `jobs_queue_index`(`queue`))
I bypassed by dropping the 'add index'; so my jobs table is not indexed on queue but I don't feel this is the cause.

One thing you can do when looking into json_encode() errors is use the json_last_error_msg() function, which will give you a bit more of a readable error message.
In your case you're getting a '5' back, which is the JSON_ERROR_UTF8 error code. The error message back for this is a slightly more informative one:
'Malformed UTF-8 characters, possibly incorrectly encoded'
So we know it's encountering non-UTF-8 characters, even though you're saving the file specifically with UTF-8 encoding. At first glance you might think you need to convert the encoding yourself in code (like this answer), but in this case, I don't think that'll help. For Laravel-Excel, this seems to be a limitation of trying to queue-read .xls files - from the Laravel-Excel docs:
You currently cannot queue xls imports. PhpSpreadsheet's Xls reader contains some non-utf8 characters, which makes it impossible to queue.
In this case you might be stuck with a slow, non-queueable option, or need to convert your spreadsheet into a queueable format e.g. .csv.
The key length error on running the migration is unrelated. It has been around for a while and is a side-effect of using an older version of MySQL/MariaDB. Check out this answer and the Laravel documentation around index lengths - you need to add this to your AppServiceProvider::boot() method:
Schema::defaultStringLength(191);

Excel VBA out of memory when reading file to string on subsequent runs. Works the first time, but not the second

I'm working on a project where I'm harvesting data from large text files, some 200MB up to 1.5GB. At the moment I'm testing this with a 200MB sized file. If I use the File System Object and read the files line-by-line the process takes over an hour. So in an attempt to speed things up I found that reading the files into a long String then splitting on specific headers, then in turn splitting each index of that array by lines and iterating over line-by-line decreased the total read time to about a minute. Huge performance boost! Problem is the process only works the first time it is run. Any subsequent runs cause an 'Run-time error '14': out of string space' error. The fix is to close the workbook then reopen it. Then it will work again, but just once. To circumvent that issue I created an interim method that breaks down the original file into multiple files of roughly 50MB each. Then I iterate over each of the smaller files and in turn split those on headers, and so on... Now my interim process, the one that breaks down the original file, only works the first time. Here is the code:
entire_file = FreeFile ' .................................................. get the next FreeFile number
Open orig_path For Input As #entire_file ' ................................ open file for reading
contents = Input$(LOF(entire_file), entire_file) ' ........................ read entire file into a String
Close #entire_file ' ...................................................... close the input file, no longer needed
blocks = Split(contents, "PAGE 10000") ' .................................. split on the PAGE number
contents = "" ' ........................................................... free up memory
The error happens on the 'blocks = split" line, however, hovering the mouse over the variable 'contents' shows (Out of memory), so I'm not sure the problem is with the 'blocks=split' line.
I close every opened file. I clear all String variables. I set all Class objects to Nothing. Yet this memory issue still persists. I just cannot figure out a way to clear the memory so that the method can be run more than once. Did I miss any other pertinent data that might help diagnose this?
I found a few other posts where someone encountered the same issue, but there was no resolution to the problem, or at least the thread was closed before a solution was posted. Many thanks in advance.

J2ME - PNG created in PhotoShop not displaying in emulator

My midlet is showing some images fine, but not others.
They are all 8-bit PNGs, but the ones that aren't displaying are the ones I have created myself in PhotoShop.
So I am thinking maybe my PhotoShop (CS6) settings are wrong...
PNG-8, Selective, Diffusion, Colors: 256, Dither: 100%, Matte: None, Web
Snap: 0%, Convert to sRGB: ticked, Width: 48, Height: 48, Percent: 100%,
Quality: Bicubic.
I've experimented with a few of these settings, but to no avail.
Any ideas?
There is a similar problem here but this is opposite to mine in that PhotoShop mends things in that case, rather than breaks things...
My code is...
image = Image.createImage("/img/loading1.png");
...and here is my stack trace:
java.io.EOFException
at javax.imageio.stream.ImageInputStreamImpl.readFully(
ImageInputStreamImpl.java:353)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at javax.imageio.stream.ImageInputStreamImpl.readUTF(ImageInputStreamImpl.java:332)
at com.sun.kvem.png.PNGImageReader.parse_iTXt_chunk(PNGImageReader.java:447)
at com.sun.kvem.png.PNGImageReader.readMetadata(PNGImageReader.java:650)
at com.sun.kvem.png.PNGImageReader.readImage(PNGImageReader.java:1312)
at com.sun.kvem.png.PNGImageReader.read(PNGImageReader.java:1582)
at com.sun.kvem.midp.GraphicsBridge.loadImage(GraphicsBridge.java:2602)
at com.sun.kvem.midp.GraphicsBridge.createImageFromData(GraphicsBridge.java:2511)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.sun.kvem.sublime.MethodExecution.process(MethodExecution.java:42)
at com.sun.kvem.sublime.SublimeExecutor.processRequest(SublimeExecutor.java:63)
at javax.microedition.lcdui.Image.createImage(Image.java:315)
The image in question does exist - both in the project and in the jar that is built.
Here is the image in question:

According to the crash log, the PNG decoder in J2ME fails inside the non-critical chunk iTXt:1
> com.sun.kvem.png.PNGImageReader.readMetadata
> com.sun.kvem.png.PNGImageReader.parse_iTXt_chunk
> javax.imageio.stream.ImageInputStreamImpl.readUTF
> java.io.DataInputStream.readUTF
According to libpng documentation, the text part of an iTXt chunk must be valid UTF8:
... The remaining chunk data is the main UTF-8 text, either zlib-compressed or not, according to the compression flag. Since its length can be determined from the chunk length, it is not null-terminated. As with the other two text chunks, newlines should be represented by single line-feed characters (decimal 10), and all other control characters (1-9, 11-31, and 127-159) are discouraged.
and so normally this would indicate that the stream read is not valid UTF8 text - it contains 'raw' bytes higher than the plain ASCII range 0..127 that do not conform to UTF8 rules.
I found that not to be the case in the sample image. There is only one set of consecutive bytes that form a UTF8 code sequence, and it is a valid one:
<?xpacket begin="EFBBBF" id=" ..
(the bolded section represents 3 data bytes in hexadecimal notation). I first suspected this was the error:
If the BOM character appears in the middle of a data stream, Unicode says it should be interpreted as a "zero-width non-breaking space" (inhibits line-breaking between word-glyphs). In Unicode 3.2, this usage is deprecated in favour of the "Word Joiner" character, U+2060.[1] This allows U+FEFF to be only used as a BOM.
(http://en.wikipedia.org/wiki/Byte_order_mark)
.. and so a fully conforming UTF8 reader should inspect its bytes and throw an UTFDataFormatException when it encounters a BOM anywhere else than as the very first value. Surprisingly, this does not seem to be the problem! First of all, there is no indication any of the readUTF sources do anything else than only verify if the UTF8 code is valid on its own, irrespective of its value. There are lots of 'invalid' Unicode code points (values that do not represent a valid Unicode character or instruction), but it appears to me they are all silently ignored. But I noticed the common readUTF functions only implement a small subset of UTF8/Unicode (see, e.g., Modified UTF-8 in Oracle's documentation).
So the problem lies elsewhere. Another clue to this is that the error thrown is not UTFDataFormatException but rather EOFException, indicating the read buffer ran out of the number of bytes it was promised to contain.
(warning: pure conjecture follows)
Looking at a source of DataInputStream, I find this snippet of code:
588 public final static String readUTF(DataInput in) throws IOException {
589 int utflen = in.readUnsignedShort();
followed by a loop to read utflen bytes (not "Unicode characters"). This is wrong for an iTXt chunk, as it does not have a 'first word' to indicate its length. The number of bytes in the plain text can be derived from the chunk length (which is, per PNG convention, the total data length excluding the length long word, the iTXt signature itself, and the final CRC32 code) minus the length of the zero-terminated keyword name, language, and "translated keyword" strings, and the two bytes which indicate compression of the full plain text.
As a work-around, remove the iTXt chunks from your PNG images. The data itself -- XMP Metadata -- is most likely not interesting at all for your purposes (but feel free to read what benefits Adobe thinks it has). And if your workflow does not use it, it's just a useless hunk of uncompressed text, taking up 814 bytes of the total of 981 bytes in your sample image -- a whopping 83%!
You can use an external utility to remove extraneous data chunks; the command line for the popular pngcrush, for example, is
pngcrush -rem alla -rem text InputFile.png OutputFile.png
(from en.wikipedia.org/wiki/Pngcrush).
Or directly from Photoshop: if you save a PNG 'the usual way' with the "Save As" menu option, the metadata goes in and there is no checkbox to get rid of it. If you use "Save for Web & Devices" instead, you get a large dialog with a lot of handy options, such as a drop down list labelled "Metadata".
Choosing "All" I got an even larger file; my version of Photoshop creates a massive 3K chunk of XMP Metadata, including a 2K totally empty 'filler' block...
Selecting "Copyright" or "None" finally got rid of all the crud (presumably because I did not fill in any copyright information), and then you get a nice 169 bytes long PNG, in which the only metadata is that software used is called "Adobe ImageReady".
1 Which is kind of ironic. Per PNG specifications,
.. A decoder encountering an unknown chunk in which the ancillary bit is 1 can safely ignore the chunk and proceed to display the image.
(source)
This "ancillary bit" is the 5th bit of the first byte of the chunk ID: 0 (uppercase) = critical, 1 (lowercase) = ancillary, i.e., if the first character of the chunk ID is a capital, a PNG reader must read and interpret its data correctly, and if it's not, it can be skipped silently.
So technically, the writers of J2ME could safely have ignored this entire chunk. But they messed it up, attempt to read it, and now the code crashes on all programs that merely try to read the image data in PNGs which happen to contain iTXt chunks.

Resetting textArea length JavaFX

I wrote server in C and client in Java. I used JavaFX for GUI. Everything works except that sometimes I get exceptions when textArea gets filled and receives more data before it gets reseted (probably cause of parallel threading). Actually there are 3 cases which occur "randomly":
1) Stucks/hangs and no exceptions are thrown.
2) NullPointerException (about Line Padding and Content Bounds [there's nowhere my code mentioned]).
3) IllegalArgumentException: Both width and height must be >= 0.
4) Exception about String text bounds.
Here's the code if it helps:
if(textArea.getLength() > 500) // I tried with > 2000, similar situations occur
textArea.setText("");
command = textField.getText();
out.println(command); // out to socket
textField.setText("");
Btw, this GUI should represent basic Linux shell, so textArea should sometimes be able to receive large amount of data (such as netstat command).
Thanks!

It is an exact dublicate of your previous question but with more info, so not going to vote for closing. I asked you to post exception stacktrace in that previous question, but you mentioned there is no your code related lines in stacktrace, hence I also presume like you that it is a bug of textArea. So I suggest to try to use another component, for example a big Label with white background :), if it is solely for displaying purposes.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string