Is it possible to expand a gap buffer without copying data?

I was reading this overview of some possible data structures for storing a sequence of characters for the purposes of a text editor. One popular and efficient way is the gap buffer.
When the gap buffer fills up so that there is no longer a gap, the data needs to be copied into the beginning and end of a larger buffer to recreate a gap for further insertion. However, page 9 of the overview states:
"with some help from the operating system, we can expand the gap without actually moving any data."
I haven't been able to figure out a way to do that, so I'm wondering if it is really possible. If so, how could it be done, and in which cases? Or am I misunderstanding what the author meant?

On Linux you can use, for example, mremap() to move data within the virtual address space.
On Windows, you would use a combination of AllocateUserPhysicalPages(), MapUserPhysicalPages(), VirtualAlloc(), and related functions.
The whole idea is that instead of copying the data, you change how and where the physical memory holding the data appears in the address space. If you aren't familiar with the related concepts, read up on page translation and page tables.
Update: Strictly speaking, you may still end up copying up to a page's worth of data when the gap disappears and the cursor position is near the beginning of a memory page. But that should be hardly noticeable on modern systems. You still don't copy the data in all the other pages.
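To make the page-level idea concrete, here is a toy simulation rather than real mremap() or MapUserPhysicalPages() calls. It assumes the buffer is modelled as a list of fixed-size pages standing in for the page table; `open_gap` and `PAGE_SIZE` are hypothetical names invented for this sketch. Splicing new list entries in corresponds to remapping, and only the tail of the page under the cursor is actually copied.

```python
PAGE_SIZE = 4  # toy value for illustration; real pages are commonly 4096 bytes


def open_gap(pages, cursor, new_page_count):
    """Open a gap at byte offset `cursor` in a completely full buffer.

    `pages` is a list of bytearrays, each PAGE_SIZE long, standing in for
    the page table. Returns (pages, gap_start, gap_end, bytes_copied).
    """
    if new_page_count < 1:
        raise ValueError("need at least one new page")

    index, offset = divmod(cursor, PAGE_SIZE)
    blank = [bytearray(PAGE_SIZE) for _ in range(new_page_count)]
    copied = 0

    if offset == 0:
        # Cursor sits on a page boundary: splice blank pages in, copy nothing.
        pages[index:index] = blank
    else:
        # Cursor is inside a page: move that page's tail to the end of the
        # last new page so that it lands directly after the gap.
        tail = pages[index][offset:]
        copied = len(tail)
        blank[-1][PAGE_SIZE - copied:] = tail
        pages[index][offset:] = bytes(copied)  # this region is now gap
        pages[index + 1:index + 1] = blank

    gap_start = cursor
    gap_end = cursor + new_page_count * PAGE_SIZE
    return pages, gap_start, gap_end, copied
```

With three full pages and the cursor at offset 5, adding two pages copies only the three bytes that follow the cursor inside its page. Every other page object is reused untouched, which is the point of the technique.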

Related

How might I organize vertex data in WebGL for a frame-by-frame (very specific) animated program?

I have been working on an animated graphics project with very specific requirements, and after quite a bit of searching and test coding, I have figured that I could take several approaches, but the Khronos and MDN documentation I have been reading coupled with other posts I have seen here don't answer all of my questions regarding my particular project. In the meantime, I have written short test programs (setting infrastructure for testing).
Firstly, I should describe the project:
The main object drawn to the screen is a simple quad surrounded by a black outline (LINE_LOOP or LINES will do, probably, though I have had issues with z-fighting...that will be left for another question). When the user interacts with the program, exactly one new quad is created and immediately drawn, but for a set amount of time its vertices move around until the quad moves to its final destination. (Note that translations won't do.) Random black lines are also drawn, and sometimes those lines also move around.
Once one of the quads reaches its final spot, it never moves again.
A new quad is always atop old quads (closer to the screen). That means that I need to layer the quads and lines from oldest to newest.
*this also means that it would probably be best to assign z-values to each quad and line, even if the graphics are in pixel coordinates and use an orthographic matrix. Would everyone agree with this?
Given these parameters, I have a few options with varying levels of complexity:
1> Take the object-oriented approach and just assign a buffer to each quad, and the same goes for the random lines. --creation and destruction of buffers every frame for the one shape that is moving. I truthfully think that this is a terrible idea that might only work in a higher level library that does heavy optimization underneath. This approach also doesn't take advantage of the fact that almost every quad will stay the same.
[vertices0] ... , [verticesN]
Draw x N (many draws for many small-size buffers)
2> Assign a z-value to each quad, outline, and line (as mentioned above). Allocate a huge vertex buffer and element buffer to store all permanently-in-their-final-positions quads. Resize only in the very unlikely case someone interacts for long enough. Create a second tiny buffer to store the one temporary moving quad and use bufferSubData every frame. When the quad reaches its destination, bufferSubData it into the large buffer and overwrite the small buffer upon creation of the next quad...all on the same frame. The main questions I have here are: is it possible (safe?) to use bufferSubData and draw it on the same frame? Also, would I use DYNAMIC_DRAW on both buffers even though the larger one would see fewer updates?
[permanent vertices ... | uninitialized (keep a count)]
bufferSubData -> [tempVerticesForOneQuad]
Draw 2x
3> Still create the large and small buffers, but instead of using bufferSubData every frame, create a second shader program and add an attribute for the new/moving quad that explicitly sets the vertex positions for the animation (I would pass vertex index attributes). Only draw with the small buffer when the quad is moving. For the frame when the quad reaches its destination, draw both large and small buffer, but then bufferSubData the final coordinates into the large permanent buffer to be used in the next frame.
switchToShaderProgramA();
[permanent vertices...| uninitialized (keep a count)]
switchToShaderProgramB();
[temp vertices] <- shader B accepts indices for each vertex so we can do all animation in the vertex shader
---last frame of movement arrives : bufferSubData into the permanent vertices buffer for when the next quad is created
I get the sense that the third option might be the best, but I would like to learn whether there are some other factors that I did not consider. For example, I am assuming that a program switch, additional attributes, and vertex shader manipulation would be faster than just substituting the buffer values as in 2>. The advantage of approach 3> (I think) is that I can defer the buffer substitution to a time when nothing needs to be drawn.
Still, I am not sure how to work with the randomly-appearing lines. I can't take the "single quad vertex buffer" approach since the number of lines cannot be predicted. Might I also allocate a large buffer for the moving lines? Those also stay after the quad is finished moving, though I don't think that I could use the vertex shader trick because there would be too many attributes to set (as opposed to the 4 for the one quad). I suppose that I could create a large "permanent line data" buffer first, but what to do during the animation is tricky because the lines move. Maybe bufferSubData() + draw on the same frame is not terrible? Or it could be. This is where I need advice.
I understand that this question might not be too specific code-wise, but I don't believe that I would be allowed to show the core of the program. All I have is the typical WebGL boilerplate ready.
I am looking forward to hearing people's thoughts on how I might proceed and whether there are any trade-offs I might have missed when considering the three options above.
Thank you in advance, and please feel free to ask any additional questions if clarification is necessary.
Honestly, for what you're describing, it doesn't sound to me like it matters which you choose. On modern hardware, drawing a few hundred quads and a few thousand lines each frame would not really tax the hardware much.
Having said that, I agree that approach 1 seems very inefficient. Approach 2 sounds perfectly fine. You can safely draw a buffer on the same frame that you uploaded the data. I don't think it matters much whether you use DYNAMIC_DRAW or STATIC_DRAW for the buffer. I tend to think of dynamic buffers as being something you're updating every frame. If you only update it every few seconds or less often, then static is fine. Approach 3 is also fine. Between 2 and 3, I'd say do whichever is easier for you to understand and program.
Likewise, for the lines, I would use a separate buffer. It sounds like that one changes per frame, so I would use DYNAMIC_DRAW for that. Allocating a single large buffer for it and performing a bufferSubData() per frame is probably a fine strategy. As always, trying it and profiling it will tell you for sure.
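The bookkeeping behind approach 2 can be sketched independently of WebGL. The following is a minimal CPU-side model, assuming three 32-bit floats per vertex and four vertices per quad; `QuadStore` and `commit` are hypothetical names, and in the real program the returned byte offset would be what gets passed to bufferSubData() for the large buffer.

```python
from array import array

FLOATS_PER_VERTEX = 3   # x, y, z (assumed layout)
VERTICES_PER_QUAD = 4
BYTES_PER_FLOAT = 4     # 32-bit floats, as in a Float32Array
QUAD_FLOATS = FLOATS_PER_VERTEX * VERTICES_PER_QUAD


class QuadStore:
    """CPU-side mirror of the large 'permanent' vertex buffer."""

    def __init__(self, capacity_quads):
        # Allocate everything up front; the unused tail stays zeroed.
        self.data = array("f", [0.0] * (capacity_quads * QUAD_FLOATS))
        self.count = 0  # number of finished quads stored so far

    def commit(self, quad_vertices):
        """Store a finished quad and return the byte offset to upload it at."""
        if len(quad_vertices) != QUAD_FLOATS:
            raise ValueError("expected 4 vertices of 3 floats each")
        if (self.count + 1) * QUAD_FLOATS > len(self.data):
            raise OverflowError("buffer full; a real program would reallocate")
        start = self.count * QUAD_FLOATS
        self.data[start:start + QUAD_FLOATS] = array("f", quad_vertices)
        self.count += 1
        return start * BYTES_PER_FLOAT
```

Only `count` quads need to be drawn from the large buffer each frame, so the uninitialized tail is never read. The moving quad lives in its own small buffer until it is committed.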

Repacking voxel data for efficient storage

I've got 3D voxel data, and I want to re-package it for memory efficiency and fast access. The data is generated in a regular octree, one integer value per cell. Unfortunately the data is not sparse, but the cells with the same value should be connected.
Example for one slice:
[11122]
[11223]
[12222]
[44444]
My current idea is to use a kD-Tree, preferably left-balanced, but I'm not sure if there is an efficient algorithm to generate this.
I've got some ideas, but I was hoping that this is one of those problems that already has established algorithms, or at least a name I could google for.
How about OctoMap? As I understand, it's like an Octree, but merges adjacent occupied areas into regions to save memory. But I don't know much about it.
EDIT
You could also try my PH-Tree. It works like an octree, but is quite memory efficient because every node only stores bits that are different from the parent node. You could actually store your integer value as a 4th dimension. Contrary to intuition, a 4D tree may require less space than a 3D tree and it may be faster (explanation is in the PDF that can be found in the link above). If your integer is the 4th dimension, then any sub-tree in the tree will only have entries with 'similar' integers; maybe that is sufficient for your case? Also, any node contains only close neighbours, but close neighbours are not necessarily in the same (or adjacent) nodes.
One further link: http://www.openvdb.org/ . Why did I only find this after asking the question? It's like asking for something in the supermarket only to find out that you're standing next to it.
I ended up doing something simpler, because I needed a solution: I convert the voxel volume into a stack of 2D planes, and each plane stores at which point the value changes to the next higher plane. That way the voxel data is only compacted vertically, but it seems to be "good enough" for now. I'll crunch the numbers (space requirement vs. performance) for other data structures if I have some free time.
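One possible reading of that vertical compaction, sketched under my own assumptions, is a run-length encoding of each vertical column that stores only the heights at which the value changes. The function names are hypothetical, and the example treats the rows of the slice from the question as stacked vertically.

```python
from bisect import bisect_right


def encode_column(values):
    """Run-length encode one vertical column as (start_index, value) pairs."""
    runs = []
    for i, v in enumerate(values):
        if not runs or runs[-1][1] != v:
            runs.append((i, v))
    return runs


def decode_column(runs, height):
    """Rebuild the full column from its runs."""
    out = []
    for n, (start, v) in enumerate(runs):
        end = runs[n + 1][0] if n + 1 < len(runs) else height
        out.extend([v] * (end - start))
    return out


def lookup(runs, z):
    """Return the value at height z via binary search over the run starts."""
    starts = [start for start, _ in runs]
    return runs[bisect_right(starts, z) - 1][1]


# The example slice from the question, one string per row.
slice_rows = ["11122", "11223", "12222", "44444"]
columns = [[int(c) for c in col] for col in zip(*slice_rows)]
encoded = [encode_column(col) for col in columns]
```

Lookups cost a binary search per column instead of a direct index, which is the space-versus-speed trade-off mentioned above.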

QR code compression

Is it possible to store about 20 000 characters in a QR code? (Or even more? http://blog.qr4.nl/page/QR-Code-Data-Capacity.aspx)
I would like to store only ASCII symbols (letters and digits, plus a few extras such as the dash).
As far as I know, it's possible to compress text that is not too complex with a ratio of 80-98%, which sounds promising: http://www.maximumcompression.com/index.html
Do you have some more experience? Thanks for sharing!
If your question is: "Is it possible to store 20K characters in QR Code?", then the answer is yes, it is possible.
If your question is: "Is it possible to guarantee you'll always be able to store 20K characters in a QR Code using compression?", the answer is no. There is no way to guarantee that, due to the pigeonhole principle.
If your question is: "Is there a "comfortable zone" where a text input of at most 20K characters will most probably fit into a QR Code?", the proper answer is: it depends on your input data. A riskier answer is: if you're dealing with "normal text" data, such as the contents of a book, you're probably asking for too much.
The 80-90% compression ratio you refer to is possible because the input data is extremely large (several MB), and the decompression algorithms are very slow. For "small" input data, such as 20K characters, the compression ratio for "normal text" will more likely be in the 50-70% range, depending on algorithm strength (PPM, for example, is very suitable for such input data).
Obviously, if your input data is a kind of "log file" with a great many repetitions, then yes, a compression ratio > 95% is easily achievable.
But compression ratio is not the only thing to take into consideration. For "real-life" usage, you'll also have to consider the QR size, and a reasonable level of error correction for the QR print to survive. Betting on "max capacity with lowest possible correction" is a fairly bad bet, at least for real-life scenarios. You'll have to ask around to find out what the "reasonable limits" of your QR Code are. Most probably, printing capabilities will get in the way, and you'll have to settle for something less than the maximum.
Last point: don't forget that compressed data is "binary", not "alphanumeric". As a consequence, the final capacity of your QR Code is the one given in the last column of the linked capacity table, which is much less than the "alphanumeric" column.
QR codes have a special encoding mode for alphanumeric data (upper-case letters only, plus digits and a few symbols). It uses 5.5 bits per character (11 bits per pair of characters), and a QR code can store at most 4,296 characters in this mode.
This ought to be close to optimal. For simpler data (like, all alpha), a compression algorithm like gzip might be able to achieve fewer bits per character. Of course, no standard reader would interpret the gzipped payload as such. Only a special reader would be able to.
Can you get 5x more data into a QR code this way? No, almost surely not, unless it's a trivial case like 20,000 "a"s.
Even if you could, it would create a large complex QR code. Anything holding over a few hundred bytes gets hard to scan in practice. Version 40, the largest, is useless in the real world. Even version 20 is.
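The arithmetic behind these answers is easy to check. The sketch below assumes the version 40 capacities at the lowest error-correction level (4,296 alphanumeric characters, 2,953 bytes in binary mode) and uses zlib as a stand-in for whatever compressor would actually be chosen; the function names are made up for illustration.

```python
import zlib

# Version 40 at the lowest error-correction level (L).
MAX_ALPHANUMERIC_CHARS = 4296
MAX_BINARY_BYTES = 2953  # compressed output has to go in binary mode


def required_reduction(input_chars, capacity_bytes=MAX_BINARY_BYTES):
    """Fraction by which the input must shrink to fit (0.0 if it already fits)."""
    return max(0.0, 1.0 - capacity_bytes / input_chars)


def fits_after_zlib(data, capacity_bytes=MAX_BINARY_BYTES):
    """True if zlib at its highest level gets `data` under the capacity."""
    return len(zlib.compress(data, 9)) <= capacity_bytes
```

For 20,000 one-byte characters, `required_reduction(20000)` comes out at roughly 0.85, so the text would have to shrink by about 85% even at the largest, least robust QR version. The trivial case of 20,000 "a"s passes `fits_after_zlib`; ordinary prose at that length is unlikely to.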
In practice, when you want to use a QR code to store huge amounts of data, you simply store a URL pointing to the location of the data.
What is theoretically possible is very different to what is actually possible when you have to support real-life devices. Good luck scanning anything above version 10 (57x57 modules) with a low-end smartphone camera.

How does text processing work?

Consider the whole novel (e.g. The Da Vinci Code).
How does e-book reader software process and output the whole book?
Does it put the WHOLE book in one very large string? An array of strings? Or what?
One of the very first "real" programs I wrote (as part of a class exercise in high school) was a text editor. Part of the requirement for this exercise was for the program to be able to handle documents of arbitrary length (i.e., larger than the available system memory).
We achieved this by opening the file, but reading only the portion of it required to display the current page of data. When the user moves forward or backward in the file, we read that portion of the file and display it.
We can speed the program up by reading ahead to load pages which we anticipate that the user will want, and by retaining recently read pages in memory so that there is no obvious delay when the user moves forward or backward.
So basically, the answer to your question is: "No. With very large text files, it is unusual to load the whole thing into memory at once. A program that can handle files like that will load it in chunks as it needs to, and drop chunks it doesn't need any more."
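A minimal sketch of that chunked approach might look like the following. It assumes fixed-size byte pages and a small least-recently-used cache; `PagedFile` is a hypothetical name, read-ahead is left out, and a real reader would also have to deal with line breaks and multi-byte characters that straddle a page boundary.

```python
from collections import OrderedDict


class PagedFile:
    """Read a large file one fixed-size page at a time, keeping a small cache."""

    def __init__(self, path, page_size=4096, max_cached=8):
        self._file = open(path, "rb")
        self.page_size = page_size
        self.max_cached = max_cached
        self._cache = OrderedDict()  # page number -> bytes, oldest first

    def page(self, number):
        """Return the bytes of page `number`, reading from disk on a miss."""
        if number in self._cache:
            self._cache.move_to_end(number)  # mark as recently used
            return self._cache[number]
        self._file.seek(number * self.page_size)
        data = self._file.read(self.page_size)
        self._cache[number] = data
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # drop the least recently used page
        return data

    def close(self):
        self._file.close()
```

Memory use is bounded by `page_size * max_cached` regardless of how large the file is, and paging back to a recently viewed page does not touch the disk.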
Complex document formats (such as ebooks) may have lookup tables built into the file to allow the user to search or jump quickly to a given page or chapter. In this, they effectively work like a database.
I hope that helps.

How do I save the character occupying a certain space in curses?

I'm beginning to try making some simple console games in C++ with curses, and my first project is just a large room to walk around in. I figure I'm gonna implement walking by having the program save the state of the square that the character is walking on, so when he walks onto the next square, it can restore whatever was there. Problem is, I don't know how to save the character at a certain position to a variable, and to my surprise I can't seem to find any comprehensive curses documentation. I'm looking for a function like this:
int storage = mvsavechar(1,1);
Does any such function exist?
You're looking for mvinch():
int storage = mvinch(1, 1) & A_CHARTEXT;
You're going to need to store the data for the room in some sort of data structure in your program. Curses is an output library.
I'd store screen state in an 80x24 (or whatever) char array. But probably you'd size the array to match the dimensions of your "large room". If the room were extremely large, you could store the equivalent of a sparse array by using (for example) a linked list of coordinates and contents.
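As a rough sketch of that idea (in Python for brevity; the same structure maps onto a C++ 2D array), the room can live in its own grid with the player tracked separately, so nothing underneath the player is ever overwritten and there is nothing to save or restore. `Room`, `char_at`, and `move_player` are hypothetical names, and the actual drawing calls are left out.

```python
class Room:
    """Keep the map in our own grid; the screen is only a view of it."""

    def __init__(self, rows):
        self.tiles = [list(row) for row in rows]
        self.player = None  # (y, x) once the player has been placed

    def char_at(self, y, x):
        """Character that should be drawn at (y, x), player included."""
        return "@" if (y, x) == self.player else self.tiles[y][x]

    def move_player(self, y, x):
        """Move the player; return the cells that need redrawing."""
        if self.tiles[y][x] == "#":
            return []  # walls block movement
        old = self.player
        self.player = (y, x)
        dirty = [(y, x)]
        if old is not None:
            dirty.append(old)
        return dirty
```

After each move, the caller would redraw only the returned cells, using `char_at` to decide what goes where.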
Curses probably doesn't have the function you want because early terminals probably didn't have the capability of being interrogated about screen contents.
I don't know much about curses; what you want might be possible if curses maintained details of screen contents, but that seems very unlikely.
