Why Notepad takes 25 seconds to open a 6MB text file
While Large Text File Viewer takes 10 seconds to open 600MB text file?
Most likely, Notepad is loading (or is attempting) to load the entire file into RAM memory, whereas LTFV is simply caching a certain portion of the file, but not the full content. You will also notice that for very large files (say 100MB or more), Notepad will tend to have a herky jerky performance as you scroll through it, while tools such as LTFV or Notepad++ will have smooth performance even for very large files.
Related
I am trying to convert a few large .doc and .rtf files to PDF on linux.
The command is:
unoconv -f pdf largefile.doc
My machine has 8 cores, however when i run htop, it shows that only 1 core is being used at 100%. And the conversion takes several minutes. I am wondering if there's a multithreaded/parallel version of the unoconv utility? Or libreoffice writer? Or perhaps another utility that can convert multiple file types into PDF.
Just to clarify, a single file takes a long time to convert, i'm not looking to convert several files in parallel.
UPDATE: one .doc file is taking 15 hours to convert. The file has about 30 pages and each page contains a screenshot. Perhaps I need another utility?
I opened a PDF file in vim which was about 4MB and it opened it at the blink of eye. I was amazed at the speed. So I tried the same in notepad and it crashed. I tried in notepad++, it took time.
Does anyone knows how vim handles this scenarios ? What makes reading files in vim so fast ?
Vim reads the whole file into an internal buffer on opening, so this all depends on your (virtual) memory and overall computer performance. Some disk editors only read in (viewed) parts of the file dynamically, and thereby obtain even greater performance.
On todays hardware, 4 MB is nothing. You can still drag down Vim with files of 100s of MB, especially if long lines or syntax highlighting is involved.
Other editors should not have a problem with 4 MB, neither. But PDF is not a text format, it's binary, so that might have confused some editors or taken them a long time to figure out what it's about.
The LargeFile plugin is worth mentioning here.
I make a j2me application that almost all of it, are text files.
size: 3mb
The problem is, when I run it on my mobile, it take about 10 sec to run. I do nothing on startup. I have another app with size: 7mb, but it runs without any delay!
Jar files link:
mine: http://s1.picofile.com/file/7252355799/mine.jar.html
correct one: http://s1.picofile.com/file/7252346448/correctone.jar.html
install both of them and run. mine take some seconds to show up, but the other shows immediately.
You need to take into account that JAR is a compressed file format.
To use jar file contents, device has to first de-compress it. How long does decompression take very much depends on jar contents and because of that, jar file size may be not directly related to startup delay.
You better use some zip tool (most if not all such tools can handle jar format) to learn about contents of the jar files you work with - this might give you better indication on what to expect at startup.
For example, I can easily imagine your "7 mb" jar file containing just a handful of jpeg images of total size, well, about same 7 mb - and decompressing very quickly.
As for "3 mb of text files" - if these decompress to something like few hundreds files of total size 50 mb then I would not wonder if it takes long to unpack at device startup.
I'm about to start on a project wherein I can foresee there being large files (mostly flat text files, but could be CSV, fixed-width, XML, ...so far) that need to be edited. I need to develop the pieces to do this editing within the application.
In trying to determine a Good Way to handle editing large amounts of data (possibly into the GB range) without having to load the whole thing, I found that Audacity is able to handle large files quite well. Audacity is open source, so I thought it would make an excellent teaching tool for me in this circumstance. However, I started thinking myself in circles going through the code and now I'm thoroughly confused.
I'm hoping for two results from this question:
A good way to handle this editing without loading the whole file. I thought about loading the data as they edited it, caching it on demand.
An explanation of how Audacity does it.
I'm using C# and .NET, but the answers don't need to be coupled to that environment.
Several tricks can make editing simpler and faster.
INDEX it for faster access. While the user is doing nothing, skim through the file and create an index so you can quickly find a specific spot in the file (see below).
Only store changes the user makes. Don't try to apply them directly to the file until the user saves.
Set a limit on how much to read into memory when the user jumps to a point. Read one or two screens of data initially so you can display it, and then if the user doesn't jump to a new spot immediately, read a bit before and a bit after the current spot.
Indexing:
When the user wants to jump to line X or timestamp T, you don't want to skim the whole file counting line breaks and characters. Skim the data, and create a record. Say, every 50 lines, record the byte offset, character count, and line number. This data can be stored in a hashtable, tree, or just an ordered list. Then when user jumps within the file, you can find the nearest index spot and read from there until you find the requested point. This technique is especially useful when working with Unicode, where the number of bytes per character may vary. If files are so large a full index won't fit in memory, you may want to limit the index points and space them more widely, or store the index in a temporary file.
Editing and altering big files:
As suggested by Harvey -- only store changes in memory (as diffs), and then apply them to the file when saved by streaming from input to output. A tree or ordered list may be helpful, so you can quickly find the next place where you need to make a change when writing from input to output.
If changes are too large to fit in memory, you may want to track them in a separate temporary file (perhaps in the same folder as the original). You can just keep writing a continuous list of changes, with new ones appended to this change file. When you save, you'll read through the change list and create a final list of alterations to apply, before deleting the temp file. For performance reasons, it may be helpful to avoid rewriting the change log file; instead, just append to the end of it, and remove redundant or cancelling edits when performing a save.
Interesting fact: the same structures you use for the change log can be used to provide Undo/Redo information.
Sound files are basically a data stream, right? So you don't actually need to deal with the whole file at once. Audacity users may only work with a small snippet of that large file at any given moment.
Hypothetically, if you are adding a 1 second snippet of sound to a large sound file, you only actually have to deal with the entire file when you have to save, at which point you splice together 3 parts: Before, 1 second snippet, and after. So the only thing that needs to actually be in memory is the 1 second snippet, and maybe a small portion of the sound before and after the snippet.
So when you save, you read, say 64 megabytes of file at a time (if you are really aggressive), and stream that out to a temporary file, until you get to your insertion point. Then you stream out the 1 second snippet, stream the remainder of the original file, close the temporary write file, delete the original file, and rename the new file to the original file name.
Of course, it's a little more complicated than this. There might be multiple edits before save, for example, and an undo buffer. But I can pretty much guarantee you that Audacity is limited in unsaved edit complexity by the amount of available RAM.
There are some standard tools to do this, but I need a simple GUI to assist some users (on windows). They will get an open file dialog and pick the file to process.
The file will be an XML file. The file will contain (within the first few lines) a text string that needs to be deleted or replaced with whitespace (doesn't matter which).
The problem is that the XML file is several gigabytes big but the fixed search and replace string will occur within the first 4k or so.
What's the best way to overwrite the search string and save in-place without requiring reading of whole amount into memory and or writing excessively to disk?
Obviously replacing with whitespace so the size of the file as a whole doesn't change is the best choice here, otherwise you must stream through the entire file to update in on disk.
If this was for a Unix environment, I would look into using mmap() to map a suitable part of the start of the file into RAM, then edit it in-place and be done.
This snippet shows how to use the Win32 equivalent, the CreateFileMapping() function.
You can easily write your own tool. If it is in the very beginning, then any brute-force approch will work. Just keep on scanning until you find it.
However avoiding a lot of disk writes is only possible if you do not change file size. If you wish to delete or insert bytes somewhere in the middle, you will have to overwrite all that follows them. Which in your case would be practically all of the file. So you'll have to replace it with whitespace. As long as you just replace one byte with another, there will be no overhead.