Asking about a statement in the "BASIC" language - basic

I don't know if anyone here has experience programming in the BASIC language. I am reading a manual for a device that uses an enhanced BASIC language, and I have no experience at all. The statement looks like:
OUTPUT 621 USING "#, K, 1024(W)";
I wonder what the USING clause is for, and what '#', 'K' and 1024(W) really mean? Sorry, the manual is so old and some of the pages are lost, so I can't tell any more from the context.

In BASIC, the USING clause was typically used for output formatting, so you can read that as "output number 621 using the format "#, K, 1024(W)"".
What that formatting means is, I think, totally dependent on the BASIC dialect; you have to consult its reference manual. # means "number", for sure, and the rest probably specifies how that number should be formatted.
Here is an example of PRINT USING in the True BASIC manual (PDF, found via Google).
As a totally wild guess, it could mean: use the suffix K after dividing by 1024, rounding as specified by (W). If so, then 621 is probably a number of bytes, and the output is wanted in kilobytes.

It looks to me like this is a statement to write output to some type of external storage, any of the myriad types of tapes and disks that existed 35 or 40 years ago. Before things became more standardized with the advent of operating systems like CP/M and MS-DOS, there were hundreds (I'm guessing at the number) of companies building and marketing computers with their own proprietary operating systems. Each one would have its own commands and syntax for reading and writing to peripherals (as any storage outside the RAM was called in those days).
621 probably is the code for the particular tape drive, disk pack or floppy disk that they wanted to write output to. K is probably just a parameter for an option of some sort. I'm pretty sure that 1024(W) refers to the length in bytes to be allocated on the disk or tape for each instance that is written, and I'm even more certain that (W) means to access the device in write-only mode.


A new idea on how to beat the 32,767 text limit in Excel

So, as many others have asked in the past: is there a way to beat the 32k limit per cell in Excel?
I have found ways to do it by splitting the workload into two different .txt files and then merging them, but it is a giant PITA, and more often than not I end up just using Excel to its limits, because I no longer have time to validate the data after the .txt file merges; it is a long and tedious process, IMO.
However, I think that if the limitation is there, it is there because it was coded in when Microsoft developed Excel, and they have yet to raise it (in the 2013 version the limit is still the same, so it would do no good to upgrade).
I also know that many will say that if you need information of that length in a single cell then you should use Access. Well, I have no idea how to use Access, or how to import a tab-delimited file into Access the way you would into Excel. And even if I could figure that out, I would still have to learn all the new commands and the Excel equivalents, if there even is such a thing.
So I was browsing some blog posts the other day on how to beat limitations by software and I read something about reverse engineering.
Would it be possible to load excel into a hex editor, go in and change every instance of 32767 to something greater?
While 32767 may seem like an arbitrary number, it's actually the upper limit of a 16-bit signed integer (called a short in C). The range of a short goes from -32768 to 32767.
A 16-bit integer can also be unsigned, in which case its range is 0 to 65535.
Since it's impossible for a cell to have a negative number of characters, it seems odd that Microsoft would limit a cell's length based on a signed rather than unsigned 16-bit integer. When they wrote the original program, they probably couldn't imagine anyone storing so much information in a single cell. Using shorts may have simplified the code. (My first computer had only 4K of memory, so it's still amazing to me that Excel can store 8 times that much information in a single cell.)
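To make the ranges concrete, here is a tiny C illustration (assuming a platform where short is 16 bits, which is typical; this is obviously not Excel's actual code):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* On typical platforms, short is a 16-bit signed integer. */
        printf("signed short:   %d to %d\n", SHRT_MIN, SHRT_MAX);      /* -32768 to 32767 */
        printf("unsigned short: 0 to %u\n", (unsigned)USHRT_MAX);      /* 0 to 65535 */

        /* A cell-length counter stored in a signed 16-bit field
           simply cannot represent more than 32767 characters. */
        return 0;
    }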
Microsoft may have kept the 32767 limit to maintain backward compatibility with previous versions of Excel. However, that doesn't really make sense, because the row and column counts greatly increased in recent versions of Excel, making large spreadsheets incompatible with previous versions.
Now to your question of reverse-engineering Excel. It would be a gargantuan task, but not impossible. In the early '90s, I reverse-engineered and wrote vaccines for a few small computer viruses (several hundred bytes). In the '80s, I reverse-engineered an 8KB computer chess program.
When reverse-engineering an executable, you'll need a good disassembler or decompiler. Depending on what you use, you may get assembly-language or C code as the output. But note that this will not be commented code, and you will not see meaningful variable or function names. You'll have to read every line of code to determine what it does. And you'll quickly discover that the executable is the least of your worries. Excel's executable links in a number of DLL files, which would also need reverse-engineering.
To be successful, you will need an extensive knowledge of Windows programming in addition to C or Intel assembly code – not to mention a large amount of patience. Learning Access would be a much simpler task.
I'd be interested in why 32767 is insufficient for your needs. A database may make more sense, and it wouldn't necessarily need to duplicate the functionality of Excel. I store information in a database for output to Web pages, in which case I use HTML+JavaScript for anything that needs to be interactive.
In case anyone is still having this issue:
I had the same problem with generating a pipe-separated file of longitudinal research data. The header row exceeded the 32767 limit. It's not an issue unless the end user opens the file in Excel. The workaround is to have the end user open the file in Google Sheets, perform the text-to-columns transformation, then download and open the file in Excel.
https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Length-limit-of-cell-contents-in-Excel-when-opening-exported-bibliographic-data?language=en_US
Jack Straw from Wichita (https://stackoverflow.com/users/10327211/jack-straw-from-wichita) surely you can do an import of a pipe separated file directly into Excel, using Data>Get Data? For me it finds the pipe and treats the piped file in the same way as a CSV. Even if for you it did not, you have an option on the import to specify the separator that you are using in your text file.
Kind regards
Sefton Hall

Programming graphics and sound on PC - Total newbie questions, and lots of them!

This isn't exactly a programming question (or is it?), but I was wondering:
How are graphics and sound processed from code and output by the PC?
My guess for graphics:
There is some reserved memory space somewhere that holds exactly enough room for a frame of graphics output for your monitor.
e.g. 800 x 600 in 24-bit color mode: 800 x 600 x 3 bytes ≈ 1.4 MB of memory space
Between each refresh, the program writes video data to this space. This action is completed before the monitor refresh.
Assume a simple 2D game: the graphics data is stored in machine code as many bytes representing color values. Depending on what the program(s) being run instruct the PC, the processor reads the appropriate data and writes it to the memory space.
When it is time for the monitor to refresh, it reads from each memory space byte-for-byte and activates hardware depending on those values for each color element of each pixel.
All of this of course happens crazy-fast, and repeats x times a second, x being the monitor's refresh rate. I've simplified my own likely-incorrect explanation by avoiding talk of double buffering, etc
Here are my questions:
a) How close is the above guess (the three steps)?
b) How could one incorporate graphics in pure C++ code? I assume the practical thing that everyone does is use a graphics library (SDL, OpenGL, etc.), but, for example, how do these libraries accomplish what they do? Would manual inclusion of graphics in pure C++ code (say, a 2D sprite) involve creating a two-dimensional array of bit values (or three-dimensional, to include multiple RGB values per pixel)? Is this how it would have been done waaay back in the day?
c) Also, continuing from above, do libraries such as SDL that use bitmaps actually just build the bitmap files into the machine code of the executable and use them as though they were built in the same manner mentioned in question b above?
d) In my hypothetical step 3 above, are there any registers involved? Like, could you write some byte value to some register to output a single color of one byte on the screen? Or is it purely dedicated memory space (= RAM) + hardware interaction?
e) Finally, how is all of this done for sound? (I have no idea :) )
a.
A long time ago, that was the case, but it hasn't been for quite a while. Most hardware will still support that type of configuration, but mostly as a fall-back -- it's not how they're really designed to work. Now most have a block of memory on the graphics card that's also mapped to be addressable by the CPU over the PCI/AGP/PCI-E bus. The size of that block is more or less independent of what's displayed on the screen though.
Again, at one time that's how it mostly worked, but it's mostly not the case anymore.
Mostly right.
b. OpenGL normally comes in a few parts -- a core library that's part of the OS, and a driver that's supplied by the graphics chipset (or possibly card) vendor. The exact distribution of labor between the CPU and GPU varies somewhat though (between vendors, over time within products from a single vendor, etc.) SDL is built around the general idea of a simple frame-buffer like you've described (see the sketch after point e below).
c. You usually build bitmaps, textures, etc., into separate files in formats specifically for the purpose.
d. There are quite a few registers involved, though the main graphics chipset vendors (ATI/AMD and nVidia) tend to keep their register-level documentation more or less secret (though this could have changed -- there's constant pressure from open source developers for documentation, not just closed-source drivers). Most hardware has capabilities like dedicated line drawing, where you can put (for example) line parameters into specified registers, and it'll draw the line you've specified. Exact details vary widely though...
e. Sorry, but this is getting long already, and sound covers a pretty large area...
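Picking up point (b): as a purely illustrative sketch (the sizes, names and sprite data below are invented), a software "framebuffer" of the kind described in the question is really just an array of pixel bytes that you copy sprite data into once per frame. In C, that could look roughly like this:

    #include <stdint.h>
    #include <string.h>

    #define SCREEN_W 800
    #define SCREEN_H 600

    /* One byte each for R, G, B -> 800*600*3 bytes, roughly 1.4 MB. */
    static uint8_t framebuffer[SCREEN_H][SCREEN_W][3];

    /* A tiny 2x2 "sprite": rows of RGB triples, exactly the kind of
       hard-coded pixel table question (b) asks about. */
    static const uint8_t sprite[2][2][3] = {
        { {255, 0, 0}, {0, 255, 0} },
        { {0, 0, 255}, {255, 255, 255} },
    };

    /* Copy the sprite into the framebuffer at (x, y), clipping at the edges. */
    static void blit_sprite(int x, int y)
    {
        for (int row = 0; row < 2; row++) {
            for (int col = 0; col < 2; col++) {
                int px = x + col, py = y + row;
                if (px >= 0 && px < SCREEN_W && py >= 0 && py < SCREEN_H)
                    memcpy(framebuffer[py][px], sprite[row][col], 3);
            }
        }
    }

    int main(void)
    {
        memset(framebuffer, 0, sizeof framebuffer);  /* clear to black */
        blit_sprite(100, 100);
        /* In a real program the framebuffer would now be handed to the
           display hardware (or to a library such as SDL) once per refresh. */
        return 0;
    }

Real libraries do far more than this, of course; the point is only that the "pixels in a block of memory" mental model from the question is a reasonable starting place.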
For graphics, Jerry Coffin's got a pretty good answer.
Sound is actually handled similarly to your (the OP's) description of how graphics is handled. At a very basic level, you have a "buffer" (some memory, somewhere).
Your software writes the sound you want to play into that buffer. It is basically an encoding of the position of the speaker cone at a given instant in time.
For "CD quality" audio, you have 44100 values per second (a "sample rate" of 44.1 kHz).
A little bit behind the write position, you have the audio subsystem reading from a read position in the buffer.
This read position will be a little bit behind the write position. The distance behind is known as the latency. A larger distance gives more of a delay, but also helps to avoid the case where the read position catches up to the write position, leaving the sound device with nothing to actually play!
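Here is a rough, hypothetical C sketch of that arrangement (the buffer size, names and the 100 ms figure are just illustrative; a real audio API would drive the read side from a hardware callback or interrupt):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define SAMPLE_RATE    44100   /* "CD quality": 44100 samples per second */
    #define BUFFER_SAMPLES 8192    /* size of the circular buffer */

    static int16_t buffer[BUFFER_SAMPLES];
    static size_t write_pos = 0;   /* where the application writes */
    static size_t read_pos  = 0;   /* where the audio hardware reads */

    /* Application side: queue samples to be played. */
    static void write_samples(const int16_t *samples, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            buffer[write_pos] = samples[i];
            write_pos = (write_pos + 1) % BUFFER_SAMPLES;
        }
    }

    /* "Hardware" side: consume one sample per tick of the sample clock. */
    static int16_t read_sample(void)
    {
        int16_t s = buffer[read_pos];
        read_pos = (read_pos + 1) % BUFFER_SAMPLES;
        return s;
    }

    /* Samples queued but not yet played, i.e. the latency.
       At 44100 Hz, a gap of 4410 samples is roughly 100 ms of delay. */
    static size_t latency_in_samples(void)
    {
        return (write_pos + BUFFER_SAMPLES - read_pos) % BUFFER_SAMPLES;
    }

    int main(void)
    {
        int16_t silence[4410] = {0};            /* 100 ms of silence */
        write_samples(silence, 4410);
        read_sample();                          /* the "hardware" plays one sample */
        printf("latency: %zu samples\n", latency_in_samples());
        return 0;
    }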

Framebuffer Documentation

Is there any documentation on how to write software that uses the framebuffer device in Linux? I've seen a couple of simple examples that basically say: "open it, mmap it, write pixels to mapped area." But there is no comprehensive documentation on how to use the different ioctls for it, or anything else. I've seen references to "panning" and other capabilities, but "googling it" gives way too many hits of useless information.
Edit:
Is the code the only documentation from a programming standpoint, as opposed to "user's how-to: configure your system to use the fb" documentation?
You could have a look at the source code of fbi, an image viewer which uses the Linux framebuffer. You can get it here: http://linux.bytesex.org/fbida/
-- It appears there might not be many options for programming the fb from user space on a desktop beyond what you mentioned. This might be one reason why some of the docs are so old. Look at this howto for device driver writers, which is referenced from some official Linux docs: www.linux-fbdev.org [slash] HOWTO [slash] index.html . It does not reference too many interfaces, although looking at the Linux source tree does offer larger code examples.
-- opentom.org [slash] Hardware_Framebuffer is not for a desktop environment. It reinforces the main methodology, but it does seem to avoid explaining all the ingredients necessary to do the "fast" double buffer switching it mentions. Another one, for a different device, which leaves some key buffering details out, is wiki.gp2x.org [slash] wiki [slash] Writing_to_the_framebuffer_device. It does at least suggest that you might be able to use fb1 and fb0 to engage double buffering (on that device... though for a desktop, fb1 may not be possible or it may access different hardware), that using the volatile keyword might be appropriate, and that we should pay attention to the vsync.
-- asm.sourceforge.net [slash] articles [slash] fb.html has assembly language routines that also appear (?) to just do the basics of querying, opening, setting a few parameters, mmap, drawing pixel values to storage, and copying over to the fb memory (making sure to use a short stosb loop, I suppose, rather than some longer approach).
-- Beware of 16 bpp comments when googling Linux frame buffer: I used fbgrab and fb2png during an X session to no avail. These each rendered an image that suggested a snapshot of my desktop screen as if the picture of the desktop had been taken using a very bad camera, underwater, and then overexposed in a dark room. The image was completely broken in color and size, and missing much detail (dotted all over with pixel colors that didn't belong). It seems that /proc /sys on the computer I used (a new kernel with at most minor modifications, from a PCLOS derivative) claims that fb0 uses 16 bpp, and most things I googled stated something along those lines, but experiments led me to a very different conclusion. Besides the results of these two failures from standard frame buffer grab utilities (for the versions held by this distro) that may have assumed 16 bits, I had a different, successful test result treating frame buffer pixel data as 32 bits. I created a file from data pulled in via cat /dev/fb0. The file's size ended up being 1920000. I then wrote a small C program to try and manipulate that data (under the assumption that it was pixel data in some encoding or other). I nailed it eventually, and the pixel format matched exactly what I had gotten from X when queried (TrueColor RGB 8 bits, no alpha but padded to 32 bits). Notice another clue: my screen resolution of 800x600 times 4 bytes gives 1920000 exactly. The 16-bit approaches I tried initially all produced a similarly broken image to fbgrab, so it's not as if I was simply looking at the wrong data. [Let me know if you want the code I used to test the data. Basically I just read in the entire fb0 dump and then spit it back out to a file, after adding a header "P6\n800 600\n255\n" to create a suitable ppm file, and while looping over all the pixels, manipulating their order or expanding them... with the successful end result for me being to drop every 4th byte and switch the first and third in every 4-byte unit. In short, I turned the apparent BGRA fb0 dump into a ppm RGB file; a sketch of that conversion appears after this answer. ppm can be viewed with many pic viewers on Linux.]
-- You may want to reconsider your reasons for wanting to program using fb0. You may not achieve any worthwhile performance gains over X (this was my, if limited, experience) while giving up the benefits of using X. This might also account for why few code examples exist.
-- Note that DirectFB is not fb. DirectFB has of late gotten more love than the older fb, as it is more focused on the sexier 3d hw accel. If you want to render to a desktop screen as fast as possible without leveraging 3d hardware accel (or even 2d hw accel), then fb might be fine but won't give you anything much that X doesn't give you. X apparently uses fb, and the overhead is likely negligible compared to other costs your program will likely have (don't call X in any tight loop, but instead at the end once you have set up all the pixels for the frame). On the other hand, it can be neat to play around with fb as covered in this comment: Paint Pixels to Screen via Linux FrameBuffer
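A minimal sketch of the BGRA-dump-to-PPM conversion described above, assuming the same 800x600, 32 bpp layout (your resolution, pixel order and file names may well differ):

    /* Convert a raw BGRA framebuffer dump (e.g. "cat /dev/fb0 > fb.raw")
       into a viewable P6 PPM file. Assumes 800x600 at 32 bits per pixel
       with BGRA byte order, as described above -- adjust for your setup. */
    #include <stdio.h>
    #include <stdlib.h>

    #define WIDTH  800
    #define HEIGHT 600

    int main(void)
    {
        FILE *in  = fopen("fb.raw", "rb");
        FILE *out = fopen("fb.ppm", "wb");
        if (!in || !out) { perror("fopen"); return 1; }

        fprintf(out, "P6\n%d %d\n255\n", WIDTH, HEIGHT);

        unsigned char px[4];
        for (long i = 0; i < (long)WIDTH * HEIGHT; i++) {
            if (fread(px, 1, 4, in) != 4) { perror("fread"); return 1; }
            /* Input order is B, G, R, A: drop the 4th byte and swap
               the first and third to get R, G, B for the PPM. */
            unsigned char rgb[3] = { px[2], px[1], px[0] };
            fwrite(rgb, 1, 3, out);
        }

        fclose(in);
        fclose(out);
        return 0;
    }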
Check the MPlayer sources.
Under the /libvo directory there are a lot of video output plugins used by MPlayer to display multimedia. There you can find the fbdev (vo_fbdev* sources) plugin, which uses the Linux frame buffer.
There are a lot of ioctl calls, with the following codes:
FBIOGET_VSCREENINFO
FBIOPUT_VSCREENINFO
FBIOGET_FSCREENINFO
FBIOGETCMAP
FBIOPUTCMAP
FBIOPAN_DISPLAY
It's not a substitute for good documentation, but it is surely a good reference implementation.
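To connect those ioctls to the "open it, mmap it, write pixels" recipe from the question, a minimal sketch (light on error handling, and assuming a 32 bpp mode and permission to open /dev/fb0, neither of which is guaranteed on every system) might look like this:

    /* Minimal fbdev sketch: query the screen layout with the ioctls listed
       above, mmap the framebuffer, and fill it with a solid colour. */
    #include <fcntl.h>
    #include <linux/fb.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/fb0", O_RDWR);
        if (fd < 0) { perror("open /dev/fb0"); return 1; }

        struct fb_var_screeninfo var;   /* resolution, bpp, offsets */
        struct fb_fix_screeninfo fix;   /* line length, buffer size */
        if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
            ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0) {
            perror("ioctl"); return 1;
        }
        printf("%ux%u, %u bpp, line length %u\n",
               var.xres, var.yres, var.bits_per_pixel, fix.line_length);

        uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
        if (fb == MAP_FAILED) { perror("mmap"); return 1; }

        /* Paint every visible pixel grey (assuming 32 bpp, XRGB layout). */
        for (uint32_t y = 0; y < var.yres; y++) {
            uint32_t *row = (uint32_t *)(fb + (size_t)y * fix.line_length);
            for (uint32_t x = 0; x < var.xres; x++)
                row[x] = 0x00808080;
        }

        munmap(fb, fix.smem_len);
        close(fd);
        return 0;
    }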
Look at the source code of any of: fbxat, fbida, fbterm, fbtv, the DirectFB library, libxineliboutput-fbe, ppmtofb, xserver-fbdev; all are Debian packages. Just apt-get source them from the Debian repositories. There are many others...
Hint: search for "framebuffer" in the package descriptions using your favorite package manager.
OK, even if reading the code is sometimes called "guru documentation", it can be a bit too much to actually do.
The source to any splash screen (i.e. during booting) should give you a good start.

How does a 7- or 35-pass erase work? Why would one use these methods?

How and why do 7- and 35-pass erases work?
Shouldn't a simple rewrite with all zeroes be enough?
A single pass with zeros doesn't completely erase magnetic artifacts from a disk. It's still possible to recover the data from the drive. A 7-pass erasure using random data will do a pretty complete job to prevent reconstruction of the data on the drive.
Wikipedia has a number of different articles relating to this topic.
http://en.wikipedia.org/wiki/Data_remanence
http://en.wikipedia.org/wiki/Computer_forensics
http://en.wikipedia.org/wiki/Data_erasure
I'd never heard of the 35-pass erase: http://en.wikipedia.org/wiki/Gutmann_method
The Gutmann method is an algorithm for securely erasing the contents of computer hard drives, such as files. Devised by Peter Gutmann and Colin Plumb, it does so by writing a series of 35 patterns over the region to be erased. The selection of patterns assumes that the user doesn't know the encoding mechanism used by the drive, and so includes patterns designed specifically for three different types of drives. A user who knows which type of encoding the drive uses can choose only those patterns intended for their drive. A drive with a different encoding mechanism would need different patterns. Most of the patterns in the Gutmann method were designed for older MFM/RLL encoded disks. Relatively modern drives no longer use the older encoding techniques, making many of the patterns specified by Gutmann superfluous.[1]
Also interesting:
One standard way to recover data that has been overwritten on a hard drive is to capture the analog signal which is read by the drive head prior to being decoded. This analog signal will be close to an ideal digital signal, but the differences are what is important. By calculating the ideal digital signal and then subtracting it from the actual analog signal it is possible to ignore that last information written, amplify the remaining signal and see what was written before.
As mentioned before, magnetic artifacts are present from the previous data on the platter.
In a recent issue of MaximumPC they put this to the test. They took a drive, ran it through a pass of all zeros, and hired a data recovery firm to try and recover what they could. Answer: Not one bit was recovered. Their analysis was that unless you expect the NSA to try, a zero pass is probably enough.
Personally, I'd run an alternating pattern or two across it.
One random pass is enough for plausible deniability, as the lost data will have to be mostly "reconstructed" with a margin of error that grows with the length of the data being recovered, as well as with whether or not the data is contiguous (in most cases, it's not).
For the insanely paranoid, three passes is good: 0xAA (10101010), 0x55 (01010101), and then random. The first two will grey out residual bits, and the last random pass will obliterate any "residual residual" bits (a rough sketch of such a three-pass overwrite follows this answer).
Never do passes with zeros. Under magnetic microscopy the data is still there, it's just "faded".
Never trust "single file shredding", especially on solid-state media like flash drives. If you need to "shred" a file, "delete" it and fill your drive with random data files until it runs out of space. Then, next time, think twice about housing shred-worthy data on the same medium as "low-clearance" stuff.
The Gutmann method is based on tin-foil-hat speculation; it does various things to get drives to degauss themselves, which is admirable in an artistic sense, but pragmatically it's overkill. No private organisation to date has successfully recovered data from even a single random pass. And as for big brother, if the DoD considers it gone then you know it's gone; the military-industrial complex gets all the big bucks to try to do exactly what Gutmann claims they can do, and believe you me, if they had the tech to do so it would already have been leaked to the private sector, since they're all in bed with each other. However, if you want to use Gutmann in spite of this, check out the secure-delete package for Linux.
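For the curious, the 0xAA / 0x55 / random scheme mentioned above could be sketched roughly as below. The file name is a placeholder, rand() is only pseudo-random, and, for the SSD reasons given above, a proper tool (secure-delete, DBAN, etc.) is preferable to home-grown code:

    /* Rough sketch of a three-pass overwrite (0xAA, 0x55, then pseudo-random)
       of an existing file. "target.bin" is a placeholder; on SSDs and other
       wear-levelled media this does NOT guarantee the old blocks are gone. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static int overwrite_pass(const char *path, long size, int pattern)
    {
        FILE *f = fopen(path, "r+b");
        if (!f) { perror("fopen"); return -1; }

        unsigned char buf[4096];
        for (long done = 0; done < size; done += (long)sizeof buf) {
            long chunk = size - done < (long)sizeof buf ? size - done : (long)sizeof buf;
            if (pattern >= 0)
                memset(buf, pattern, (size_t)chunk);      /* fixed pattern pass */
            else
                for (long i = 0; i < chunk; i++)          /* pseudo-random pass */
                    buf[i] = (unsigned char)(rand() & 0xFF);
            if (fwrite(buf, 1, (size_t)chunk, f) != (size_t)chunk) {
                perror("fwrite"); fclose(f); return -1;
            }
        }
        fflush(f);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        const char *path = "target.bin";           /* placeholder file name */
        FILE *f = fopen(path, "rb");
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fclose(f);

        srand((unsigned)time(NULL));
        int patterns[3] = { 0xAA, 0x55, -1 };       /* -1 = pseudo-random pass */
        for (int p = 0; p < 3; p++)
            if (overwrite_pass(path, size, patterns[p]) != 0)
                return 1;
        return 0;
    }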
7-pass and 35-pass erases would take forever to finish. HIPAA only requires a DoD 3-pass overwrite, and I am not certain why the DoD even has a 7-pass overwrite, as it seems they simply shred the disks before disposing of machines anyway. Theoretically, you could recover data off the outer edges of each track (using a scanning electron microscope or a microscopic magnetic probe), but in practice you would need the resources of a disk drive maker or one of the three-letter government organizations to do this.
The reason to perform multipass writes is to take advantage of the slight errors in positioning to overwrite the edges of the track also, making recovery far less likely.
Most drive recovery companies can't recover a drive that has had its data overwritten even once. They are typically taking advantage of the fact that Windows doesn't zero out the data blocks, it just changes the directory entry to mark the space free. They simply 'undelete' the file and make it visible again.
If you don't believe me, call them up and ask them if they can recover a disk that has been dd'ed over... they will typically tell you no, and if they do agree to try, it will be serious $$$ to get it back...
A DoD 3-pass followed by a zero overwrite should be more than sufficient for most (i.e. non-TOP SECRET) folks.
DBAN (and its commercially supported descendant, EBAN) do this all cleanly... I would recommend these.
See: Secure Deletion of Data from Magnetic and Solid-State Memory
Advanced recovery tools can recover single-pass deleted files easily. And they are expensive too (e.g. http://accessdata.com/).
A visual GUI for Gutmann passes from http://sourceforge.net/projects/gutmannmethod/ shows it has 8 semi-random passes. I have never seen proof that files deleted with Gutmann have been recovered.
Overkill, maybe, but still far better than Windows' soft delete.
Regarding the second part of the question, some of the answers here actually contradict real research on that exact topic. According to the "Number of overwrites needed" section of the Data erasure article on Wikipedia, erasing modern drives with more than one pass is redundant:
"ATA disk drives manufactured after 2001 (over 15 GB) clearing by
overwriting the media once is adequate to protect the media from both
keyboard and laboratory attack." (citation)
Also, infosec published a nice article entitled "The Urban Legend of Multipass Hard Disk Overwrite" on the entire subject, covering the old US government erasure standards, among others, and how the multi-pass myth established itself in the industry.
"Fortunately, several security researchers presented a paper [WRIG08]
at the Fourth International Conference on Information Systems Security
(ICISS 2008) that declares the “great wiping controversy” about how
many passes of overwriting with various data values to be settled:
their research demonstrates that a single overwrite using an arbitrary
data value will render the original data irretrievable even if MFM and
STM techniques are employed.
The researchers found that the probability of recovering a single bit
from a previously used HDD was only slightly better than a coin toss,
and that the probability of recovering more bits decreases
exponentially so that it quickly becomes close to zero.
Therefore, a single pass overwrite with any arbitrary value (randomly
chosen or not) is sufficient to render the original HDD data
effectively irretrievable."
There's a lot of misinformation around this, though most of the answers I see on this page are correct. I've worked in the data recovery industry for 25 years and have addressed this exact question an enormous number of times.
The "residual magnetism" hypothesis never worked in real life. And back then, tolerances were millions of times looser.
If you still doubt this, remember that a rotational hard drive uses the same storage principle as an audio tape - moving magnetic substrate storage - and the audio tape that was recorded over a single time in the Watergate case has still not been recovered.
A single zero-pass wipe renders all the data on a HDD unrecoverable unless some malfunction or mistake causes the overwrite to be incomplete. This was true even back in the days when Peter Gutmann released his paper (which was like a tsunami in the erasure industry.) Gutmann's paper was pure hypothesis, it never panned out in reality. Even in the days of MFM/RLL drives, nobody could recover from a single-pass overwrite. It should be noted that Gutmann patented the algorithm that his paper said would be required to ensure complete erasure. Presumably, every time erasure was sold with his algorithm, he got paid. I am not saying there was intentional deception on his part, just pointing out that his algorithm, though there was never any evidence it erased better than a single overwrite, was patented and sold.
Please note that SSDs are different. SSDs can (and often do) use a pool of sectors that are rotated in and out of use, so if data is written to an SSD and then "deleted" and the drive rotates the sectors on which the deleted file is on out of the pool, an erasure might not be able to reach those sectors because the firmware in the SSD has control that software can't override. One way around this is to continuously overwrite until all sectors have been rotated into use.
The reason multiple passes exist is because hardware can malfunction. If the drive somehow malfunctions during one pass, it's possible that not all sectors will be erased - however, most good erasure software offers a full verification, which basically reads every bit on the drive to make sure the erasure didn't malfunction. With that, multi-pass overwrites are overkill.
And sometimes, data is so sensitive, it makes sense to go overboard in making sure it's destroyed. For example, I heard about a drive that was erased by the military with a 7-pass zero-fill, then the drive was run over by a tank, and then the remains were buried in a secret location in a highly secured area. Practically, the recoverability is about the same as a single-pass overwrite, but if lives could be lost as a result of the data falling into the wrong hands, then why not go for the overkill?

Do you use "kibibyte" as a unit of measurement in your programs? [closed]

For decades, in the field of computing (except for disk manufacturers), a KB (kilobyte) was understood to mean 1024 bytes. In the past few years, there has been a movement to use KiB ("kibibyte") to mean 1024 bytes and to change the meaning of kilobyte to 1000 bytes, dooming us to many more years of confusion. On the other hand, the movement seems to be confined to GNOME and some overzealous Wikipedia editing.
Will you be converting your programs to use KiB? If you have ever displayed a filesize in KB, did you divide by 1000 or 1024?
KB is 1024 bytes, damnit.
I did this once before in an app. While internally it used kibbi's and mebbi's (KiB, MiB, etc), it would still display in what users (in this case IT folks) were used to. The underlying field was just a long that was in bytes IIRC.
It was forward compatible, and would at least allow you to enter 4 GB as well as 4 GiB. It also understood shorthand entry like 4.5G and properly rounded back to the real number of bytes, rather than forcing the poor user to enter it that way, and it prevented their mistakes. Updating to use IEC notation is one line of code.
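A hypothetical sketch of that kind of shorthand parsing (the function name and the 1024-based interpretation are my own assumptions, mirroring the app described above, not its actual code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    /* Parse shorthand such as "4.5G", "4 GB" or "4GiB" into a byte count,
       treating K/M/G as binary multiples (1024-based). */
    static long long parse_size(const char *text)
    {
        char *end;
        double value = strtod(text, &end);          /* numeric part, e.g. 4.5 */

        while (*end && isspace((unsigned char)*end))
            end++;

        long long multiplier;
        switch (toupper((unsigned char)*end)) {
        case 'K': multiplier = 1024LL; break;
        case 'M': multiplier = 1024LL * 1024; break;
        case 'G': multiplier = 1024LL * 1024 * 1024; break;
        default:  multiplier = 1; break;            /* plain bytes */
        }
        return (long long)(value * (double)multiplier + 0.5);   /* round to bytes */
    }

    int main(void)
    {
        printf("%lld\n", parse_size("4.5G"));   /* 4831838208 */
        printf("%lld\n", parse_size("4 GB"));   /* 4294967296 */
        return 0;
    }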
Kilos are 1000, and 98% of the world uses metric. We need to get over it already.
I see a lot of anger in many of these responses, which baffles me. SI prefixes are SI prefixes, and programmers have no right to alter them for no better reason than convenience and custom. It's odd that those in computer science, a highly technical field, are the ones clamoring to go back to the days of cubits, furlongs and rods. wtf?
We all know what we mean, but sticking to custom alienates and confuses users. Just because in the early pioneer days some guys, when talking about computer memory, decided to reuse SI notation doesn't mean they were correct to do so.
In my opinion, 1 Kilobyte equals to 1000 bytes is something drivemakers want you to believe, so that your drive looks more spacious than it really is. ;)
Since I spent a few years learning to be a mechanical engineer before switching majors, I have to admit that "kilo" always means 10^3 to me. From that standpoint, KiB makes sense. However, try saying "kibibyte" out loud a few times, and think about how dumb you sound.
Therefore, kilogram is 1000 grams, kilobyte is 1024 bytes.
Addendum: In addition, I agree with those who have been saying that we shouldn't change what is already established if it works. 1024 is simply a nicer number in binary. Also, "kibibyte" still sounds like something a dog eats.
It's not changing the meaning of "kilobyte". Kilo means 1000. Some people were using it incorrectly to refer to units of 1024 bytes.
I never display file sizes in kibibytes, because users don't care about 1000 vs 1024. Instead, I always use "XXX KB/MB/GB", where XXX is the number of bytes divided by 1 thousand / 1 million / etc.
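As an illustrative sketch of that approach (the thresholds, names and formatting here are my own choices, not anyone's actual code), dividing by powers of 1000 and labelling the result KB/MB/GB:

    #include <stdio.h>

    /* Format a byte count using 1000-based units, as described above. */
    static void format_size(long long bytes, char *out, size_t outlen)
    {
        if (bytes >= 1000LL * 1000 * 1000)
            snprintf(out, outlen, "%.1f GB", bytes / 1e9);
        else if (bytes >= 1000LL * 1000)
            snprintf(out, outlen, "%.1f MB", bytes / 1e6);
        else if (bytes >= 1000)
            snprintf(out, outlen, "%lld KB", bytes / 1000);
        else
            snprintf(out, outlen, "%lld bytes", bytes);
    }

    int main(void)
    {
        char text[32];
        format_size(65323, text, sizeof text);
        printf("%s\n", text);   /* prints "65 KB" */
        return 0;
    }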
There are 2 ways to think about this:
Use what the operating system you're running on uses. That way users have a consistent experience.
Use what is correct.
If you always use KiB, though, there will be no confusion. If you use KB, there will be confusion. So if you choose option #2 then you're better off actually using 1024 and the KiB suffix. Working with powers of 2 is more efficient anyway.
It's up to you, but my rule of thumb would be: if you have a technical audience, use KiB and avoid all confusion. If you have a large user base of non-technical users, use what your operating system uses. By the way, Windows uses KB to mean 1024 bytes.
Areas of speciality have always used terms in ways that are understood by that specialisation. For example, a mechanical engineer building a bridge uses the term "stress" to mean something completely different from, say, a lawyer who finds out his star witness has been lying on the first day in court. Should we mandate that the engineer use the same definition for "stress" as the lawyer just because that definition is more widely used? If we do, I'm not driving across that bridge!
Kilobytes = 1024 bytes. It's an industry-accepted specialisation of the term.
I use KiB.
Do you really want to hurt everyone by refusing to use well-established standards just like IE?
I've always displayed file sizes in 1000-byte kilobytes. It hardly ever matters to the people who can't tell the difference, and it often relieves confusion when they see the actual number. 65323 bytes = 65 KB when rounded, and the "normal" people like that.
I probably won't ever display "KiB", since that's never what my customers want.
The arrogance of deciding not to follow the standard created by more than just the computer community (see... it isn't "new" that Kilo actually means 1000) is staggering.
Only if the situation called for it. In almost all cases, 1,000-based units are more appropriate.
The only exceptions I know of are memory, since it naturally comes in multiples of a power of two, and CD size, since it's measured in multiples of 2^20 bytes by the manufacturers. Everything else, including hard drives, DVDs, flash drives, bandwidths, processor speeds, memory buses, etc. is currently measured in 1000s, and file sizes should be, too. (Or, at least, me and Steve Jobs think so. Windows will probably continue measuring file sizes in 1024s for years...)
To avoid confusing the user, use k- = 1,000, and Ki- = 1,024.
The sloppy usage of "k" to mean 1024 is an unholy abomination that should be killed with fire.
Mac OS X doesn't use KiB, MiB, GiB. On the other hand, when it uses the metric ones, it at least does the maths correctly.
Personally I prefer to get this stuff right so that users who are currently in the dark would learn from it. Waiting for users to change first is just foolish. Users didn't suddenly wake up some day and think that a kilobyte is 1024 bytes - it was software which made them think that, so shouldn't it be software's job to correct the mistake?
I've worked in the storage industry for a decade. Arguments over the size of a TB can vary the size of a solution by 10%. In short: programmers and the storage industry use different measurements. Neither are right all the time.
The Storage Networking Industry Association (SNIA) dictionary defines kilobyte as:
Kilobyte (KB)
[General] 1,000 (10^3) bytes.
The SNIA uses the 10^3 convention commonly found in storage and data transfer-related literature rather than the 1,024 (2^10) convention common in computer system random access memory and software literature.
My rule of thumb is:
Measure memory, files, file systems, and data on a network in 1024^n byte blocks.
Measure raw disk space — and only raw disk space — in 1000^n byte blocks.
Tell the customer which unit you're using. Repeat yourself often.
By and large, that keeps me out of trouble.
One program I'm working on uses "KiB" by default, but has a user preference for which unit of measurement to use (1024 B in a KiB, 1024 B in a KB, or 1000 B in a KB).
No. 1024 bytes is a kilobyte, regardless of whether that makes sense.
The usage of the "kilo-" prefix for units of 1024 bytes back in the day was probably a mistake. But it's now the standard. Trying to change it now only adds to the confusion.
We don't deal with the world as it should be; we deal with the world as it is.
Technically KiB is correct, but I have seen it only in a few applications (mainly Linux console apps). Users are either used to working with 1024 for both KB and KiB (IT people), or they don't really care and will think that "KiB" is misspelled (non-IT people).
However, I have been used to working with "kilobyte = 1024 bytes" for over 20 years now, and even though I know that it is scientifically wrong, I will go on using it.
If you need to provide KiB to allow your soul to rest, make it available as an option, but don't confuse poor users with yet another definition, especially if they work with an OS that uses the non-scientific approach and defines KB as 1024.
(BTW: Kibibytes always reminds me of Tinky Winky and his friends... ;) )
I tried to start using these terms when teaching my students, but I've sort of given up now.
I've taught an introductory computer course ("and this is a disk drive") a few times, and it can be confusing for the students that the prefixes mean different things in different contexts. Kilo means 1024 when you have a kilobyte or a kilobit of data, except if you store it on disk when it is 1000, and if you send a kilobit per second over a network then it is 1000, and a kilohertz is of course 1000 too. And one kilometer of fiber cable is 1000 meters! But it turns out that it really isn't that much of a problem. The engineering and computer science students need to know the difference, and they will get used to it anyway. When I meet them again in database courses or in the compiler course, there is never any confusion about the different kinds of kilos, megas and teras. And students from other areas (media design and so on) don't really care.
And after I did an informal poll among the other computer science people in my corridor at the university, and found out that most of them had never heard of these new prefixes, I definitely gave up.
A KB is 1024 bytes
A kB is 1000 bytes
Unfortunately, spelled out, it is ambiguous. I always use 1024.
Knuth refers to MB as KKBytes or kkBytes to differentiate between 1024*1024 and 1000*1000
I have honestly never heard of this & I doubt it's going to gain much traction in mainstream usage. I can't imagine why I would want to start doing this. The current definition of kilobyte is accurate & sufficient. I would much rather see hard drive manufacturers start using accurate terminology rather than further dumb-down technical terminology. Why can't manufacturers either build drives that are exactly xGB in size or simply say what they really are?
Other than rants about how the terminology needs to change, I have never heard those expressions used. It is not going to catch on.
I'm still going by measurements of 2^(10*n) until computers are based on decimal...
Kilo means 10^3 when you're working in the decimal number system.
Kilo means 2^10 when you're working in the binary number system.
I mean, just look at it... they're both quite arbitrary. It seems to me that anything else is equally arbitrary - so we have 40-year entrenched arbitrary versus brand-new arbitrary. Which should win? For now, I vote for the entrenched method, simply because it will cause less total confusion.
At some point our technology is bound to change - think quantum/genetic computers - that point will be a good opportunity to sanitize our measuring system.
Also, some users will always be confused - should we remove confusion for them at the risk of confusing the community that makes it all happen (us and the hardware guys)? I think not.
For me, this is a bit like the 'hacker' arguments we had, back in the day.
Depending on how old and stubborn you are, 'hacker' may mean a different thing to you. For a while in the media (and probably still today, partly) people consider hacking to be the act of breaking into machines illegally. However, in the industry now, the feeling people get is that it is someone who enjoys tinkering with things.
For a while the security community wasn't sure if this would take off, and we actually tried to use 'cracker' to refer to the bad guys. I don't think cracker has really taken off like we'd like, but we have reclaimed 'hacker' as a legitimate term, to quite a reasonable degree of success.
So to me this is the same: just because the media has tried to consider a KB as 1,000, I will never back down, and always stand up for the rights of the remaining 24 bits.
24bFL
Drivemaker/denary Kilobytes can burn in hell. Binary units for binary machines.

Resources