Should AspBufferLimit ever need to be increased from the default of 4 MB? - iis

A fellow developer recently requested that the AspBufferLimit in IIS 6 be increased from the default value of 4 MB to around 200 MB for streaming larger ZIP files.
Having left the Classic ASP world some time ago, I was scratching my head as to why you'd want to buffer a BinaryWrite and simply suggested setting Response.Buffer = false. But is there any case where you'd really need to make it 50x the default size?
Obviously, memory consumption would be the biggest worry. Are there other concerns with changing this default setting?

Increasing the buffer like that is a supremely bad idea. You would allow every visitor to your site to use up to that amount of RAM. If your BinaryWrite/Response.Buffer=false solution doesn't appease him, you could also suggest that he call Response.Flush() now and then. Either would be preferable to increasing the buffer size.
In fact, unless you have a very good reason, you shouldn't even pass this through the ASP processor. Write it to a special place on disk set aside for such things and redirect there instead.

One of the downsides of turning off the buffer (you could use Flush, but I really don't get why you'd do that in this scenario) is that the client doesn't learn the content length at the start of the download. Hence the browser's download dialog at the other end is less meaningful: it can't tell how much progress has been made.
A better (IMO) alternative is to write the desired content to a temporary file (perhaps using a GUID for the file name) and then send a Redirect to the client pointing at this temporary file.
There are a number of reasons why this approach is better:
The client gets good progress info in the save dialog or application receiving the data.
Some applications can make good use of byte-range fetches, which only work well when the server is delivering "static" content.
The temporary file can be re-used to satisfy requests from other clients.
There are a number of downsides, though:
If it takes some time to create the file content, writing to a temporary file first adds latency before any data is received, increasing the overall download time.
If strong security is needed on the content, having a static file lying around may be a concern, although the use of a random GUID filename mitigates that somewhat.
Some housekeeping is needed for old temporary files.

Related

Can file size be used to detect a partial append?

I'm thinking about ways for my application to detect a partially-written record after a program or OS crash. Since records are only ever appended to a file (never overwritten), is a crash while writing guaranteed to yield a file size that is shorter than it should be? Is this guaranteed even if the file was opened in read-write mode instead of append mode, so long as writes are always at the end of the file? This would greatly simplify crash recovery, since comparing the last record's expected size and position with the actual file size would be enough to detect a partial write.
I understand that random-access writes can be reordered by the filesystem, but I'm having trouble finding information on whether this can happen when appending. I imagine an out-of-order append would require the filesystem to create a "hole" at the tail of the (sparse) file, write blocks beyond the hole, and then fill in the blocks in between, but I'm hoping that such an approach would be so inefficient that nobody would ever implement their filesystem that way.
I suppose another problem might be a filesystem updating the directory entry's file size field before appending the new blocks to the file, and the OS crashing in between. Does this ever happen in practice? (ext4, perhaps?) Is there a quick way to detect it? (And what happens when trying to read the unwritten blocks that should exist according to the file's size?)
Is there anything else, such as write reordering performed by a disk/flash drive, that would get in the way of using file size as a way to detect a partial append? I don't expect to be able to compensate for this sort of drive trickery in my application, but it would be good to know about.
If you want to be SURE that you're never going to lose records, you need a consistent journaling or transactional system for your files.
There is absolutely no guarantee that a write will have been fulfilled unless you either set O_DIRECT [which you probably do not want to do], or you use markers to indicate that "this has been fully committed", which are only written when the file is closed. You can either do that in the main file or, for example, have a separate file that records "last written record". If you open & close that file, it should be safe as long as the APP is what is crashing; if the OS crashes [or is otherwise abruptly stopped - e.g. power cut, disk unplugged, etc.], all bets are off.
Write reordering and write caching can happen at all levels: the C library, the OS, the filesystem module, and the hard disk/controller itself are all able to reorder writes.
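For illustration, here is a minimal sketch of the marker idea under some assumptions of my own (the record layout, the names, and the use of zlib's CRC32 are illustrative choices, not part of the answer above): each record is written as a length prefix, the payload, and a checksum, followed by fsync(). On recovery, a record that is truncated or whose checksum doesn't match marks the end of the valid data.

    #include <fcntl.h>
    #include <unistd.h>
    #include <zlib.h>
    #include <cstdint>
    #include <string>

    // Append one record as: [4-byte length][payload][4-byte CRC32 of payload].
    // After fsync() returns, the record is either fully present or detectably
    // incomplete (short length/payload, or CRC mismatch) after a crash.
    bool appendRecord(int fd, const std::string& payload) {
        uint32_t len = static_cast<uint32_t>(payload.size());
        uint32_t crc = static_cast<uint32_t>(
            crc32(crc32(0L, Z_NULL, 0),
                  reinterpret_cast<const Bytef*>(payload.data()), len));

        std::string buf;
        buf.append(reinterpret_cast<const char*>(&len), sizeof len);
        buf.append(payload);
        buf.append(reinterpret_cast<const char*>(&crc), sizeof crc);

        if (write(fd, buf.data(), buf.size()) != static_cast<ssize_t>(buf.size()))
            return false;
        return fsync(fd) == 0;  // force the appended data to stable storage
    }

    int main() {
        // O_APPEND keeps every write at the current end of the file.
        int fd = open("records.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;
        appendRecord(fd, "example record");
        close(fd);
    }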

How to open and read 1000s of files very quickly

My problem is that my application takes too long to load thousands of files. Yes, I know it's going to take a long time, but I would like to make it faster by any amount of time. What I mean by "load" is open the file to get its descriptor and then read the first 100 bytes or so of it.
So, my main strategy has been to create a second thread that will open and close (without reading any contents) all the files. This seems to help because the thread runs ahead of the main thread and I'm guessing the OS is caching these file descriptors ahead of time so that when my main thread opens them it's a quick open. This has actually helped because the thread can start caching these file descriptors while my main thread is parsing the data read in from these files.
So my real question is...what else can I do to make this faster? What approaches are there? Has anyone had success doing this?
I've heard of OS prefetching calls, but they were for virtual memory pages. Is there a way to tell the OS, "hey, I'm going to be needing all these files pretty soon - I suggest that you start gathering them for me ahead of time"? My lookahead thread is pretty crude.
Are there low-level disk techniques I could use? Is there possibly a pattern of file access that would help? Right now, the files that are loaded all come from the same folder. I suppose there is no way to determine where exactly on disk they lie and which ordering of file opens would be fastest for the disk. I'm also guessing that the disk has some hardware to make this as efficient as possible too.
My application is mainly for windows, but unix suggestions would help as well.
I am programming in C++ if that makes a difference.
Thanks,
-julian
My first thought is that this is going to be hard to work around from a programmatic level.
You'll find Linux and OSX can access thousands of files like this in a fraction of the time it takes Windows. I don't know how much control you have over the machine. If you can keep the thousands of files on a FAT partition, you should see better results than with NTFS.
How often are you scanning these files, and how often are they changing? If the ratio is heavily on the reading side, it would make sense to copy the start of each file into a cache. The cache could store the filename, modification time, and the first 100 bytes of each of the thousands of files.
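Since the question explicitly asks for a way to tell the OS "I'm going to need these files soon" and says Unix suggestions help: on Linux and other POSIX systems, posix_fadvise(POSIX_FADV_WILLNEED) is exactly that kind of hint, asking the kernel to start reading a range into the page cache. A minimal C++ sketch of a lookahead thread built on it (the file list and read sizes are placeholders, not from the question):

    #include <fcntl.h>
    #include <unistd.h>
    #include <string>
    #include <thread>
    #include <vector>

    // Hint the kernel to start reading the first part of each file into the
    // page cache, so the main thread's open+read hits already-cached data.
    void prefetch(const std::vector<std::string>& paths) {
        for (const auto& p : paths) {
            int fd = open(p.c_str(), O_RDONLY);
            if (fd < 0) continue;
            // Ask for the first 4 KB; the page cache keeps it after close().
            posix_fadvise(fd, 0, 4096, POSIX_FADV_WILLNEED);
            close(fd);
        }
    }

    int main() {
        std::vector<std::string> paths = {/* thousands of file names */};

        std::thread lookahead(prefetch, std::cref(paths));  // runs ahead of the parser

        char header[100];
        for (const auto& p : paths) {
            int fd = open(p.c_str(), O_RDONLY);
            if (fd < 0) continue;
            ssize_t n = read(fd, header, sizeof header);  // first ~100 bytes
            // ... parse header[0..n) ...
            (void)n;
            close(fd);
        }
        lookahead.join();
    }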

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's data in memory (copy-on-write pages in the page cache), and "loading" the file into memory should be very fast (a page table update at worst).
Edit: Though, as cdhowie points out, there is no 'Linux filesystem'. However, I believe the relevant code is in Linux's memory management and is therefore independent of the filesystem in question. If you're curious, you can read in the Linux source about the handling of vm_area_struct objects, mainly in linux/mm/mmap.c.
As people have mentioned, mmap is a good solution here.
But one 250K file is very small. You might want to read it in at startup and put it into some sort of in-memory structure that matches what you want to send back to the user. I.e., if it is a text file, an array of lines might be a good choice, etc.
The file should be cached, but make sure the noatime option is set on the mount; otherwise every read will also try to write an updated access time back to the file.
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap(), which can avoid the copy by mapping the cached pages directly into the process.
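As an illustration of the mmap() suggestion above, here is a minimal sketch (error handling abbreviated; the path is a placeholder):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        const char* path = "/var/data/info.dat";  // placeholder path
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        // Map the page-cache pages read-only into this process: no copy into a
        // user-space buffer, and repeated requests reuse the same cached pages.
        void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        // ... parse the 250K file and build the response from `data` ...

        munmap(data, st.st_size);
        close(fd);
    }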
I guess putting that file on a ramdisk (tmpfs) may give you enough of an advantage without big modifications, unless you are really serious about response times measured in microseconds.

For a peer-to-peer app that can resume file transfers, is it sufficient to check filesize/modified date for changes before resuming a file?

I'm working on a networked application that has a peer-to-peer file transfer component (think instant messenger), and I'd like to make it able to resume file transfers gracefully.
If there is an ongoing file transfer and one user drops out, the recipient still knows how much of the file he's successfully received and therefore where to resume the transfer from. However, if the file has changed in the meantime, how can this be detected? To be clear, I'm not focused here on corruption by the network so much as on the source file being altered.
The way I was starting out on this was by having the sender hash the file before sending it, so the recipient has a hash to check the finished file against. However, this only detects corruption at the very end, unless each resume also hashes. That problem could be alleviated by viewing the file in chunks and hashing each of those. However, the bigger problem with hashing is that it can take a really, really long time, which is just a bad user experience when a user wants to send something immediately (e.g. the file to be sent is a Linux ISO on a slow network share).
I was thinking about changing to simply checking the file size and modified date each time a transfer begins or is resumed. While this is clearly not foolproof, unless I'm missing something (and please correct me if I am), almost every means an end-user would be using to alter files will be well-behaved and at the very least mark the modified date, and even if not, the change in size should catch 99% of cases. Does this seem like an acceptable compromise? Bad idea?
How do the established protocols handle this?
The quick answer to your question is that it will work in most cases, unless files are modified often.
Instead of hashes, use checksums (CRC32, for example). These are much faster for checking whether a file has been modified.
If a connection breaks, you only need to send the computed chunk checksums back to the source, which can work out whether the corresponding chunks have been modified in the meantime. It can then decide which chunks to resend, along with the missing ones.
Chunks plus checksums are the best trade-off over whole files and hashes as far as user experience goes.
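For illustration, a minimal sketch of the per-chunk checksum idea using zlib's CRC32 (the chunk size and function name are my own choices, not from the answer):

    #include <zlib.h>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Compute a CRC32 for each fixed-size chunk of a file. On resume, the
    // recipient sends these back so the sender can spot chunks that changed.
    std::vector<uint32_t> chunkChecksums(const char* path, size_t chunkSize = 1 << 20) {
        std::vector<uint32_t> crcs;
        FILE* f = std::fopen(path, "rb");
        if (!f) return crcs;

        std::vector<unsigned char> buf(chunkSize);
        size_t n;
        while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
            uLong crc = crc32(0L, Z_NULL, 0);  // zlib's CRC32 seed
            crc = crc32(crc, buf.data(), static_cast<uInt>(n));
            crcs.push_back(static_cast<uint32_t>(crc));
        }
        std::fclose(f);
        return crcs;
    }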

JavaME - LWUIT images eat up all the memory

I'm writing a MIDlet using LWUIT and images seem to eat up incredible amounts of memory. All the images I use are PNGs and are packed inside the JAR file. I load them using the standard Image.createImage(URL) method. The application has a number of forms and each has a couple of labels and buttons, however I am fairly certain that only the active form is kept in memory (I know it isn't very trustworthy, but Runtime.freeMemory() seems to confirm this).
The application has worked well in 240x320 resolution, but moving it to 480x640 and using appropriately larger images for the UI started causing out-of-memory errors to show up. What the application does, among other things, is download remote images. The application seems to work fine until it gets to this point. After downloading a couple of PNGs and returning to the main menu, the out-of-memory error is encountered. Naturally, I looked into the amount of memory the main menu uses and it was pretty shocking. It's just two labels with images and four buttons. Each button has three images used for style.setIcon, setPressedIcon and setRolloverIcon. Images range in size from 15 to 25 KB, but after removing two of the three images used for every button (so 8 images in total), Runtime.freeMemory() showed a stunning 1 MB decrease in memory usage.
The way I see it, I either have a whole lot of memory leaks (which I don't think I do, but memory leaks aren't exactly known to be easily tracked down), I am doing something terribly wrong with image handling or there's really no problem involved and I just need to scale down.
If anyone has any insight to offer, I would greatly appreciate it.
Mobile devices are usually very low on memory. So you have to use some tricks to conserve and use memory.
We had the same problem at a project of ours and we solved it like this.
For downloaded images:
Make a cache where you put your images. If you need an image, check whether it is in the cache map: if it isn't, download it and put it there; if it is, use it. If memory is full, remove the oldest image from the cache map and try again.
For other resource images:
Keep them in memory only for as long as you can see them; once you can't see them, break the reference and the GC will do the cleanup for you.
Hope this helps.
There are a few things that might be happening here:
You might have seen the memory used before garbage collection, which doesn't correspond to the actual memory used by your app.
Some third-party code you are running might be pooling some internal data structures to minimize allocation. While pooling is a viable strategy, sometimes it does look like a leak. In that case, look for an API to 'close' or 'dispose' of the objects you don't need.
Finally, you might really have a leak. In this case you need to get more details on what's going on in the emulator VM (though keep in mind that it is not necessarily the same as the phone VM).
Make sure that your emulator uses JRE 1.6 as the backing JVM. If you need it to use the runtime libraries from an earlier JDK, use -Xbootclasspath:<path-to-rt.jar>.
Then, after your application gets in the state you want to see, do %JAVA_HOME%\bin\jmap -dump:format=b,file=heap.bin <pid> (if you don't know the id of your process, use jps)
Now you've got a dump of the JVM heap. You can analyze it with jhat (it comes with the JDK, but is a bit difficult to use) or a third-party profiler (my preference is YourKit - it's commercial, but they have time-limited eval licenses).
I had a similar problem with LWUIT at Java DTV. Did you try flushing the images when you don't need them anymore (getAWTImage().flush())?
Use EncodedImage and resource files when possible (resource files use EncodedImage by default; read the javadoc for it). The other comments are also correct that you need to actually observe the amount of memory used: even high-RAM Android/iOS devices run out of memory pretty fast with multiple images.
Avoid scaling, which effectively eliminates the EncodedImage.
Have you considered that loading the same image from the JAR many times may be creating many separate image objects (with identical contents) instead of reusing one instance per individual image? That's my first guess.
