Standard method for creating file read/write 'library'? - excel

I am attempting to create a file read/write 'library' using Excel VBA for an old file format. Just for info, the format is LIS79, an old oil industry format for writing well-site data to tape - largely streams of wellbore measurements like density, resistance, temperature etc. The full spec is here (PDF doc) - it's pretty long and boring.
As I'm using Excel VBA I guess it's not really a library, but a collection of subs and functions etc.
I have been tapping away for a month or so and am making good progress, though it's becoming increasingly complicated - the number of subs and functions I need to write keeps on growing.
I figure I'm probably not the fist person to try and write a file read/write library from it's specification. So I've been searching StackExchange, searched using Google and browsed programming books at the local university library, to see if any sort of standard process or method might exist that would make the whole task a bit simpler. Alas I haven't been able to find anything, though I have no formal programming background so it's possible I don't know precisely what to search for.
Does anyone know of a standard method, procedure or guidelines for creating file read/write libraries that you could refer me to?
Or is it just a matter of persevering - file read/write libraries can be long and complicated things to create?
Many thanks.

Related

How do I store versions of consistently changing text with low cost?

I have been creating a text editor online, just for learning experience. I was curious what the best way to store multiple versions of a text file that is consistently changing is.
I've looked at a variety of options and I am yet to see a cheap, and scale-able option.
I've looked into Google Cloud Storage and Amazon S3. The only issue is that too many requests to save the file start to add up a lot in cost. I'd like files to be saved practically instantly, and also versioned every so often. I've also looked into data deduplication which looks like a great option, but I have not yet found a way to do it without writing my own software.
Any and all advice would be greatly appreciated. Thanks!
This is a very broad question, but the basic answer is usually some flavor of Operational Transform. Basically you don't want to be constantly sending the entire document back and forth between the user(s) and the server, nor do you want to overwrite the whole of the document repeatedly. Instead you want to store diffs. Then you need to deal with the idea that multiple users might be changing the file simultaneously, but possibly in different areas, and dealing with that effectively.
Wikipedia has some good, formal discussion of the idea: https://en.wikipedia.org/wiki/Operational_transformation
You wouldn't need all of that for a document that will only be edited by one person at a time, but even then, the answer is to think in terms of diffs from previous versions and only occasionally persist whole snapshots.

A new idea on how to beat the 32,767 text limit in Excel

So as many others have asked in the past is there a way to beat the 32k limit per cell in Excel?
I have found ways to do it by splitting the work load into two different .txt files and then merging the two .txt files, however it is a giant PITA and more often then not I end up only using excel to its limits as I do not have time to validate the data after .txt file merges anymore this is a long process and tedious IMO.
However I think that if the limitation is there it is there because it was coded when Microsoft developed Excel, and since they have yet to raise it (2013 version the limit is still the same limit so it would do no good to upgrade)
I also know that many will say if you have a need for information in a single cell in that length then you should use ACCESS well I have no idea how to use ACCESS or how to import a tab delimited file into ACCESS like you would into EXCEL, and then even if I could figure that out I still now have to figure out how to learn all the new commands and he EXCEL equivalents if there is even such a thing.
So I was browsing some blog posts the other day on how to beat limitations by software and I read something about reverse engineering.
Would it be possible to load excel into a hex editor, go in and change every instance of 32767 to something greater?
While 32767 may seem like an arbitrary number, it's actually the upper limit of a 16-bit signed integer (called a short in C). The range of a short goes from -32768 to 32767.
A 16-bit integer can also be unsigned, in which case its range is 0 to 65535.
Since it's impossible for a cell to have a negative number of characters, it seems odd that Microsoft would limit a cell's length based on a signed rather than unsigned 16-bit integer. When they wrote the original program, they probably couldn't imagine anyone storing so much information in a single cell. Using shorts may have simplified the code. (My first computer had only 4K of memory, so it's still amazing to me that Excel can store 8 times that much information in a single cell.)
Microsoft may have kept the 32767 limit to maintain backward compatibility with previous versions of Excel. However, that doesn't really make sense, because the row and column counts greatly increased in recent versions of Excel, making large spreadsheets incompatible with previous versions.
Now to your question of reverse-engineering Excel. It would be a gargantuan task, but not impossible. In the early '90s, I reverse-engineered and wrote vaccines for a few small computer viruses (several hundred bytes). In the '80s, I reverse-engineered an 8KB computer chess program.
When reverse-engineering an executable, you'll need a good disassembler or decompiler. Depending on what you use, you may get assembly-language or C code as the output. But note that this will not be commented code, and you will not see meaningful variable or function names. You'll have to read every line of code to determine what it does. And you'll quickly discover that the executable is the least of your worries. Excel's executable links in a number of DLL files, which would also need reverse-engineering.
To be successful, you will need an extensive knowledge of Windows programming in addition to C or Intel assembly code – not to mention a large amount of patience. Learning Access would be a much simpler task.
I'd be interested in why 32767 is insufficient for your needs. A database may make more sense, and it wouldn't necessarily need to duplicate the functionality of Excel. I store information in a database for output to Web pages, in which case I use HTML+JavaScript for anything that needs to be interactive.
In case anyone is still having this issue:
I had the same problem with generating a pipe-separated file of longitudinal research data. The header row exceeded the 32767 limit. Not an issue unless the end-user opens the file in excel. Work around is to have end-user open file in google sheets, perform the text-to-columns transformation, then download and open file in excel.
https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Length-limit-of-cell-contents-in-Excel-when-opening-exported-bibliographic-data?language=en_US
Jack Straw from Wichita (https://stackoverflow.com/users/10327211/jack-straw-from-wichita) surely you can do an import of a pipe separated file directly into Excel, using Data>Get Data? For me it finds the pipe and treats the piped file in the same way as a CSV. Even if for you it did not, you have an option on the import to specify the separator that you are using in your text file.
Kind regards
Sefton Hall

How to read excel(2007+ xlsx) sheet using actionscript(AIR)?

How to read excel(2007+ xlsx) sheet using actionscript(AIR)?
as3xls
An Actionscript 3 library for reading and writing Excel files. Currently reading numbers, text, and formulas from Excel version 2.0-2003 and writing numbers, text, and dates to Excel 2.0 is supported. No server-side help is needed.
SUPPORT INFORMATION
Documentation and samples are at http://code.google.com/p/as3xls/
I wrote this: https://github.com/childoftv/as3-xlsx-reader I'd love to know if it helps
Do you have any idea how... Inefficient this is?
Excel uses a complex setup for files, and unless you want to write a full-scale parser for its spreadsheets (which, believe me, will be difficult, alone to figure out what the format chars do), you'd be better off finding another solution.
Say, using a "save to XML" option would make your job a few thousand times easier, without exaggeration. AS3 has no native support for Excel, there is no real point for it to have such. But it has great integrated methods for working with XML.
If possible, save the Excel files to XML and parse those.
Better still, use databases, and parse them as XML through PHP.
I did a search and came up with this: http://code.google.com/p/php-excel-reader/
Once you've got it in PHP, passing it on to Flash is no problem at all. I'd recommend turning it into straight arrays of objects and converting it to AMF3 via Zend_Amf, AMFPHP or WebOrb, whichever one you're most comfortable with. You can then create tables, manipulate the data or whatever you like. It'd also be a lot faster and lighter than using XML.
PK
I took a look at the xlsx breakdown and it would take me 1 week to write an xlsx writer that could do basic formatting and formulas. I've only spent 1 hour perusing through the directories in an xlsx file and all you'd have to do is create the same directory structure...mostly cut and paste some strings..and then zip it and call it xlsx.
I tried this theory by manually making an xlsx file using 7zip. I downloaded childoftv's reader and, though I don't need the reader, the package includes a few zip/unzip classes that would prove helpful for anyone who wants to make a xlsx writer.
Long story short, the setup isn't complex, somebody just has to take a week out of their busy schedule to do it. I need this functionality so if nobody's done it yet, then I'll have to. Hopefully my search will find something better than a forum where the general consensus is "it's too hard, give up."

online trading bot

I want to code a trading bot for Magic: The Gathering Online. This bot should wait until someone offers to trade, accept, look through the cards available from the other trader (the information is shown on screen), and perform other similar functions. I have several questions:
How can it know that someone is offering a trade?
How can it know that the other trader has some card (the informaion is stored in pictures)?
I just cannot imagine right now how to do it, I have no experience with it, until now I've been coding only console programs for my physics neсessities.
First, you should note that some online games forbid bots, as they can give certain players unfair advantages. The MTGO Terms of Service do not seem to say anything about this, though they do put restrictions on anything that might negatively impact the service. They have also said that there is a possibility they will add an API in the future, so they don't seem to be against the idea of automation, but are not supporting it at the moment. Tread carefully here, but it looks like it should be OK to write a bot as long as it is not harmful or abusive. This is not legal advice, and it would be a good idea to ask the folks who run MTGO for permission. edit since I wrote this, it has been pointed out that there are lots of bots already, so there should be no problems writing bots.
Assuming that it is not forbidden by the terms of service, but they do not have an API, you will have to find a way to detect what's going on, and control the game automatically. There's a pretty good series of articles on writing poker bots (archived copy), which has some good information on how to inject a DLL into an application, scrape the screen, and control the application. That might provide you with a starting point for doing this sort of thing.
You might also want to look for tools that other people have already written for doing this. It looks like there are several existing MTGO bots, but they all seem a bit sketchy (there have been some reports of them stealing passwords), so be careful there.
Edit
Since this answer still seems to be getting upvotes, I should probably update it with some more useful information. Since writing this, I have found a great UI automation system called Sikuli. It allows you to write programs in Python that automate a GUI. It includes image recognition features which make it very easy to recognize buttons, cards, and other UI elements; you just take a screenshot, crop it down to include just the thing you're interested in, and do fuzzy image matching (so that changing backgrounds and the like doesn't cause the match to fail). It even includes a custom IDE that allows you to embed those screenshots directly in your source code, so you can see exactly what the code is looking for. Here's an example from the documentation (apologies for the code formatting, doing images inline in code is not easy given StackOverflow's restricted subset of HTML):
def resizeApp(app, dx, dy):
switchApp(app)
corner = find(Pattern().targetOffset(3,14))
drop_point = corner.getTarget().offset(dx, dy)
dragDrop(corner, drop_point)
resizeApp("Safari", 50, 50)
This is much easier to get started with than the techniques mentioned in the article linked above, of injecting a DLL into the process you are debugging. Sikuli runs entirely at the UI level, so you never have to modify the program you are automating or worry about changes to the internals breaking your script.
One thing it is a bit poor at is handling text; it has OCR features, but they aren't all that good. If the text is selectable, however, you can select the text, copy it, and then look directly at the clipboard.
If I were to write a bot to automate something without a good API or text-based interface, Sikuli is probably the first tool I would reach for.
This answer is constructed from my comments.
What you are trying to do is hard, any way you try and do it.
Arguably the easiest way to do it is to totally mimic the user. So the application presses buttons, moves the mouse etc. The downside with this is that it is dependant on being able to recognise the screen.
This is easier if you can alter the games files as you can then just skin ( changing the image (texture)) the required cards to a single unique colour.
The major down side is you have to have the game as the top level window or have the game running in a virtual machine. Neither of which is ideal.
Another method is to read the processes memory. You may be able to find a list of memory locations, which would make things simpler, otherwise it involves a lot of hardwork, a debugger to deduce the memory addresses. It also helps (a lot) to be able to understand assembly.
The third method is to intercept the packets, and alter them. This is easier that the method above as it (at least for me) is easier to reverse engine the protocol as you have less information to deal with. It is just a matter of setting up a packet sniffer and preforming a action with one variable different (for example, the card) and comparing the differences.
The thing you need to check are that you are not breaking the EULA. I don't know how the game works, but most of the games I have come across have a EULA that prohibits (i.e. You get banned) doing any of the things I have mentioned.

Combining resources into a single binary file

How does one combine several resources for an application (images, sounds, scripts, xmls, etc.) into a single/multiple binary file so that they're protected from user's hands? What are the typical steps (organizing, loading, encryption, etc...)?
This is particularly common in game development, yet a lot of the game frameworks and engines out there don't provide an easy way to do this, nor describe a general approach. I've been meaning to learn how to do it, but I don't know where to begin. Could anyone point me in the right direction?
There are lots of ways to do this. m_pGladiator has some good ideas, especially with seralization. I would like to make a few other comments.
First, if you are going to pack a bunch of resources into a single file (I call these packfiles), then I think that you should work to avoid loading the whole file and then deseralizing out of that file into memory. The simple reason is that it's more memory. That's really not a problem on PC's I guess, but it's good practice, and it's essential when working on the console. While we don't (currently) serialize objects as m_pGladiator has suggested, we are moving towards that.
There are two types of packfiles that you might have. One would be a file where you want arbitrary access to the contents of the files. A second type might be a collection of files where you need all of those files when loading a level. A basic example might be:
An audio packfile might contain all the audio for your game. You might only need to load certain kinds of audio for the menus or interface screens and different sets of audio for the levels. This might fall intot he first category above.
A type that falls into the second category might be all models/textures/etc for a level. You basically want to load the entire contents of this file into the game at load time because you will (likely) need all of it's contents while a player is playing that level or section.
many of the packfiles that we build fall into the second category. We basically package up the level contents, and then compresses them with something like zlib. When we load one of these at game time, we read a small amount of the file, uncompress what we've read into a memory buffer, and then repeat until the full file has been read into memory. The buffer we read into is relatively small while final destination buffer is large enough to hold the largest set of uncompressed data that we need. This method is tricky, but again, it saves on RAM, it's an interesting exercise to get working, and you feel all nice and warm inside because you are being a good steward of system resources. once the packfile has been completely uncompressed into it's destinatino buffer, we run a final pass on the buffer to fix up pointer locations, etc. This method only works when you write out your packfile as structures that the game knows. In other words, our packfile writing tools share struct (or classses) with the game code. We are basically writing out and compressing exact representations of data structures.
If you simply want to cut down on the number of files that you are shipping and installing on a users machine, you can do with something like the first kind of packfile that I describe. Maybe you have 1000s of textures and would just simply like to cut down on the sheer number of files that you have to zip up and package. You can write a small utility that will basically read the files that you want to package together and then write a header containing the files and their offsets in the packfile, and then you can write the contents of the file, one at a time, one after the other, in your large binary file. At game time, you can simply load the header of this packfile and store the filenames and offsets in a hash. When you need to read a file, you can hash the filename and see if it exists in your packfile, and if so, you can read the contents directly from the packfile by seeking to the offset and then reading from that location in the packfile. Again, this method is basically a way to pack data together without regards for encryption, etc. It's simply an organizational method.
But again, I do want to stress that if you are going a route like I or m_pGladiator suggests, I would work hard to not have to pull the whole file into RAM and then deserialize to another location in RAM. That's a waste of resources (that you perhaps have plenty of). I would say that you can do this to get it working, and then once it's working, you can work on a method that only reads part of the file at a time and then decompresses to your destination buffer. You must use a comprsesion scheme that will work like this though. zlib and lzw both do (I believe). I'm not sure about an MD5 algorithm.
Hope that this helps.
do as Java: pack it all in a zip, and use an filesystem-like API to read directly from there.
Personally, I never used the already available tools to do that. If you want to prevent your game to be hacked easily, then you have to develop your own resource manipulation engine.
First of all read about serializing objects. When you load a resource from file (graphic, sound or whatever), it is stored in some object instance in the memory. A game usually uses dozens of graphical and sound objects. You have to make a tool, which loads them all and stores them in collections in the memory. Then serialize those collections into a binary file and you have every resource there.
Then you can use for example MD5 or any other encryption algorithm to encrypt this file.
Also, you can use zlib or other compression library to make this big binary file a bit smaller.
In the game, you should load the encrypted binary file and unpack it. Then decrypt it. Then deserialize the object collections and you have all resources back in memory.
Of course you can make this more comprehensive by storing in different binary files the resources for different levels and so on - there are plenty of variants, depending on what you want. Also you can first zip, then encrypt, or make other combinations of the steps.
Short answer: yes.
In Mac OS 6,7,8 there was a substantial API devoted to this exact task. Lookup the "Resource Manager" if you are interested. Edit: So does the ROOT physics analysis package.
Not that I know of a good tool right now. What platform(s) do you want it to work on?
Edited to add: All of the two-or-three tools of this sort that I am away of share a similar struture:
The file starts with a header and index
There are a series of blocks some of which may have there own headers and indicies, some of which are leaves
Each leaf is a simple serialization of the data to be stored.
The whole file (or sometimes individual blocks) may be compressed.
Not terribly hard to implement your own, but I'd look for a good existing one that meets your needs first.
For future people, like me, who are wondering about this same topic, check out the two following links:
http://www.sfml-dev.org/wiki/en/tutorials/formatdat
http://archive.gamedev.net/reference/programming/features/pak/

Resources