I need to prepare some input data to run through a program; the data should be in the following format.
UID (1-11) | TxtLen (12-16) | Text (17-62)
I can use sort to position the fields properly and get the UID and text fields.
The ‘TxtLen’ field should contain the number of chars from the start of the text field to the last char in the text field.
e.g. “Hello” has a TxtLen of 5, “Hel lo” has a TxtLen of 6, “Hello World” has a TxtLen of 11, etc...
I want to know if there is a way of getting the TxtLen through JCL only, or is a program required to do this?
-Thanks
You will need a program.
I see a fair number of mainframe questions on Stack Overflow asking if something is possible with "JCL only." Keep in mind that JCL is mostly a means of executing programs, and actually does very little other than that. For instance, when you say
I can use sort to position the fields properly and get the UID and text fields
sort is a program. It happens to be a program found on most systems (though there are different vendors' implementations: IBM has one, SyncSort has one, CA has one, etc.). There are plenty of other programs commonly found on mainframe systems.
And just to be pedantic, JCL doesn't actually do anything; JES does the work as it interprets the JCL.
For your particular situation you could create a SORT exit, or process your data in Rexx, or you could use some of the Unix System Services commands and execute those via BPXBATCH or COZBATCH.
I've done ad-hoc conversions like this using a REXX program. The program is pretty straightforward:
allocate the input and output files
open both files
begin loop:
read the input
extract the text field and strip trailing spaces
get length of trimmed text field and format as 5-digit numeric
overlay number back into record in the Len field positions
write out updated record
repeat loop until end of file
close both files
free allocated files
Let me know if you need some actual code. I've found that REXX is superior to COBOL when it comes to string functions and manipulations. I've even created and called REXX routines from COBOL to accomplish just that.
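For reference, here is a rough sketch of the same steps in Python rather than REXX, just to make the record handling concrete. The file names are placeholders, and the slices follow the 1-based column layout given in the question (UID 1-11, TxtLen 12-16, Text 17-62):

    with open("input.dat") as infile, open("output.dat", "w") as outfile:
        for record in infile:
            record = record.rstrip("\n").ljust(62)   # pad short records to full width
            text = record[16:62].rstrip()            # Text field, trailing spaces stripped
            txtlen = f"{len(text):05d}"              # length formatted as 5-digit numeric
            outfile.write(record[:11] + txtlen + record[16:] + "\n")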
Related
I am writing a function (in Python 3) which will allow someone to revise a set text.
Set texts (in this situation) are pieces of text (I am doing the Iliad, for example) which you need to learn for an exam. In this function I am focusing on a user trying to learn the translation of the text off by heart.
In order to do this, I want to write the translation in a text file, then the user can test themselves by writing it up, and the program will check whether each word is correct by checking it against the known, correct translation.
I know I could simply use input() for this, but it is inadequate as the user would have to type the entire text, or small parts, at a time for this to work, and I want to correct as they type in order for them to remember their mistakes more easily.
I have not written any code yet as how I write the rest of the program will depend on how I code this part.
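To make the comparison idea above concrete, here is a minimal word-by-word sketch in Python. The file name reference.txt is an assumption, and it prompts with input() once per word purely to keep the example short; the keystroke-level correction the question asks about would need a richer input mechanism.

    def revise(path="reference.txt"):
        with open(path, encoding="utf-8") as f:
            words = f.read().split()                 # the known, correct translation
        mistakes = 0
        for expected in words:
            typed = input("> ").strip()
            if typed == expected:
                print("correct")
            else:
                mistakes += 1
                print(f"wrong: expected {expected!r}")
        print(f"Finished with {mistakes} mistake(s) out of {len(words)} words.")

    revise()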
Okay, so I was thinking today about Minecraft, a game which I'm sure many of you are familiar with. While my question isn't directly related to the game, I find it much simpler to describe my question using the game as an example.
My question is: is there any way a type of "seed", or string of characters, could be used to recreate an instance of a program (not in the literal programming sense) by storing a code which, when re-entered into this program as a string at run-time, could recreate the data it once held (in fields, text boxes, canvases, for example) exactly as it was?
As I understand it, Minecraft takes the string of ASCII characters you enter (which, at bottom, are just numbers) and performs a series of operations on it which evaluate to some type of hash, a finite number. This number (again, as I understand it) is the representation of the string you entered. So it makes sense that a string, when parsed by this algorithm, will always evaluate to the same hash; 1 + 1 will always equal 2, so a seed's value must always come out to that same value in the end. And in doing so you have the ability to replicate worlds exactly, by entering this sort of key which is evaluated the same way on every machine.
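As a concrete illustration of that "same string, same number" idea, a text seed can be mapped deterministically to a number in a fixed signed 64-bit range. This is NOT Minecraft's actual algorithm, just a Python demonstration that the same input always yields the same numeric seed on any machine:

    import hashlib

    def text_to_seed(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big", signed=True)   # a signed 64-bit number

    print(text_to_seed("Herobrine"))   # always the same number, on any machine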
Now, if we can exactly replicate worlds like this, is it possible to bring it into a more abstract concept like the following?
Say you have an application, like Microsoft Word. Word saves the data you have entered as a file on your hard drive: it holds formatting data, the strings you've entered, the format of the file... all of that in a physical file. Now imagine that when you entered your essay into Word, instead of saving it and bringing your laptop to school, you clicked on "parse" and, instead of creating a file, you were given a hash code. Now you go to school knowing you have to print it, so you log onto a computer and open Word. Instead of "open" there is now an option called "evaluate"; you click it, enter the hash your other computer formulated, and it recreates the exact essay you had written.
Is this possible, and if so, are there obvious implementations of this that I simply am not thinking of, or that are so much a part of everyday computing that I don't recognize them? And finally, if it is possible, what methods and algorithms would go into such a thing?
[EDIT]
I had to do some research on the anatomy of a seed and I think this explains it well
"The limit is 32 characters or, for a numeric seed, 19 digits plus the minus sign. Numeric seeds can range from -9223372036854775808 to 9223372036854775807, which is a total of 18446744073709551616. Text strings entered will be "hashed" to one of the numeric seeds in the above range. The "Seed for the World Generator" window only allows 32 characters to be entered and will not show or use any more than that."
BUT looking back on it, lossless compression IS EXACTLY what I was describing. After re-reading the wiki page I remembered that (you are very correct) the seed only partakes in the generation; the final data is stored as a "physical" file on the HDD which (again, you are correct) is raw, uncompressed data in a file.
So in retrospect, I believe I was describing lossless compression, trying in my mind to figure out how the seed was able to replicate the exact same world, forgetting that the seed is only responsible for the generation, not for the saving or compression of the data.
So thank you for your help, guys! It's really appreciated. I believe we can call this one solved!
There are several possibilities for achieving this "string" that recovers your data. However, they're not all applicable, depending on the context.
An actual seed, which initializes for example a pseudo-random number generator, then allows you to recreate the same sequence of pseudo-random numbers (see this question).
This is possibly similar to what Minecraft relies on, because the whole process of how to create a world based on some choices (possibly pseudo-random choices) is known in advance. Even if we pretend that we have random numbers, computers are actually deterministic, which makes this possible.
If your document were generated randomly then this would be applicable: with the same seed, the same gibberish comes out.
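A tiny sketch of that first possibility, using Python's standard pseudo-random generator (the seed value and the "world" built from it are arbitrary):

    import random

    def generate(seed, n=5):
        rng = random.Random(seed)                      # generator initialized from the seed
        return [rng.randint(0, 99) for _ in range(n)]  # data built only from that generator

    print(generate(12345))   # some list of numbers
    print(generate(12345))   # exactly the same list again, on every run and every machine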
Some key-value dictionary, or hash map. Then the values have to be accessible by both sides, and the string is the key that allows you to retrieve the value.
Think, for example, of storing your Word file on an online server; then your key is the URL linking to your file.
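A toy sketch of that second possibility, with an in-memory dict standing in for the shared server (the key format is arbitrary):

    import secrets

    store = {}                               # stands in for an online server

    def save(document):
        key = secrets.token_urlsafe(8)       # short string handed back to the user
        store[key] = document                # the full data stays on the shared side
        return key

    def load(key):
        return store[key]                    # the key alone recovers everything

    key = save("My entire essay...")
    print(key, "->", load(key))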
Compressing all the information that is in your data into the string. This is much harder, and there are strong limits due to the entropy of the data. See Shannon's source coding theorem for example.
You would be better off (as in, it would be easier) just compressing your file with a usual algorithm (zip or 7z or something else), rather than reimplementing compression yourself, especially as soon as your document starts containing fancy things (different styles, tables, pictures, unusual characters...).
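A quick sketch of that, using zlib from the Python standard library in place of zip/7z; the round trip is lossless, but the result is still far larger than a 32-character string for any real document:

    import zlib

    text = "Plain English prose compresses well because it is redundant. " * 50
    compressed = zlib.compress(text.encode("utf-8"), 9)

    print(len(text), "characters before,", len(compressed), "bytes after")
    assert zlib.decompress(compressed).decode("utf-8") == text   # round trip is lossless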
With the simple hypothesis of 27 possible characters (26 letters and the space), Shannon himself shows in Prediction and Entropy of Printed English (Bell System Technical Journal, 30:1, January 1951, pp. 50-64; online version) that there are about 2.14 bits of entropy per letter in English. That's about 550 characters encoded with your 32-character string.
While this is significantly better than the 8 bits we use for each ASCII character, it also shows that it is very likely impossible to encode an English document in less than about a quarter of its size. Then you'd still have to add punctuation, and all the rest of the fuss.
What I'm Doing
I am currently working on creating a SWI-Prolog module that adds tab-completion capability to the swipl-win window. So far I've actually gotten it to where it reads a single character at a time without stopping/returning anything until a tab character is typed. I have also already written a predicate that returns all possible completions of an incompletely typed term by using substring-matching on a list of current terms (obtained via current_functor/2, current_arithmetic_function/1, current_predicate/2, etc [the predicate used will eventually be based off of context]).
If you want to see my code, it is here...just keep in mind that I'm not exactly a Prolog master yet (friendly tips are more than welcome).
What I'm Thinking
I realize that when I actually implement my main completion predicate (still unwritten), I'll have to figure out what the last "word" is in the input stream. I'm debating whether I should create a new stream containing everything in the input stream so far (so I don't have to change the position in the input stream or go back to the beginning) or write to a string... If I take the second approach, I'll start the string over whenever a delimiting character is typed (a character that starts a new "word", like a space, comma, parenthesis, or operator) so there won't be any searching through the stream every time tab is pressed.
However, there is another thing: When the user is navigating through and modifying a typed but not-yet-submitted query (via arrow keys and backspace and such), a separate stream is necessary to handle mid-stream completion. A string will do just fine if completion is requested at the end of a stream (handling backspace is as easy as lopping off the last character of the string), but since the string would only contain the current "word", tabber.pl would be at a loss in instances like that. Unless, of course, the current-word string would update and find the current word that the cursor is in as the user navigated and typed mid-stream... (could I use at_end_of_stream(Stream) for that?)
What I'm Asking
How do you think I ought to approach this (string or stream)? The store-to-string method and the make-a-new-stream way both sound like they each have their advantages, so I'm pretty sure the solution will be some sort of combination of both. Any ideas, corrections, or suggestions on accomplishing my goal? (pun intended)
In order to figure that out and really do this correctly, I think I'll also have to know how SWI-Prolog uses the input and output streams in the swipl-win window. (It's obviously accepting input, but does it use the output stream to write to the window as you type [into the input stream]?)
Getting this done without changing the C code underlying the swipl-win.exe console will be hard. This also relates to a thread on the mailing list starting here. The completion caller is in src/pl-ntmain.c, do_complete() for Windows and src/os/pl-rl.c, prolog_completion() for the GNU readline-based completion used on Unix systems.
The first step to make is to lead these two, and the upcoming one described in the referenced thread, back to Prolog using a callback. That requires a small study of the design of the completion interfaces to arrive at a suitable Prolog callback. I guess it should pass in some representation of the entire line and the caret location and return a list of completions from the caret. With that, anyone can write their own smart completer.
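As a rough sketch of the shape such a callback could take (shown in Python only for brevity; the candidate list and the token rules are placeholders, not SWI-Prolog's actual interface):

    import re

    CANDIDATES = ["member", "memberchk", "msort", "maplist", "between"]   # placeholder list

    def complete(line, caret):
        prefix = re.search(r"[A-Za-z0-9_]*$", line[:caret]).group(0)   # token ending at the caret
        return [c for c in CANDIDATES if c.startswith(prefix)]

    print(complete("?- mem", 6))   # ['member', 'memberchk']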
I am working on a website conversion. I have a dump of the database backend as an sql file. I also have a scrape of the website from wget.
What I'm wanting to do is map database tables and columns to directories, pages, and sections of pages in the scrape. I'd like to automate this.
Is there some tool or a script out there that could pull strings from one source and look for them in the other? Ideally, it would return a set of results that would say something like
string "piece of website content here" on line 453 in table.sql matches string in website.com/subdirectory/certain_page.asp on line 56.
I don't want to do line comparisons because lines from the database dump (INSERT INTO table VALUES (...) ) aren't going to match lines in the page where it actually populates (<div id='left_column'><div id='left_content'>...</div></div>).
I realize this is a computationally intensive task, but even letting it run over the weekend is fine.
I've found similar questions, but I don't have enough CS background to know if they are identical to my problem or not. SO kindly suggested this question, but it appears to be dealing with a known set of needles to match against the haystack. In my case, I need to compare haystack to haystack, and see matching straws of hay.
Is there a command-line script or command out there, or is this something I need to build? If I build it, should I use the Aho–Corasick algorithm, as suggested in the other question?
So your two questions are 1) Is there already a solution that will do what you want, and 2) Should you use the Aho-Corasick algorithm.
The first answer is that I doubt you'll find a ready-built tool that will meet your needs.
The second answer is that, since you don't care about performance and have a limited CS background, you should use whatever algorithm you find simplest to implement.
I will go one step further and propose an architecture.
First, you need to be able to parse the .sql files in a meaningful way, with a parser that goes line by line and returns tablename, column_name, and value. A StreamReader is probably best for this.
Second, you need a parser for your webpages that will go element by element and return each text node, the name of each parent element all the way up to the html element, and its parent filename. An XmlTextReader or a similar streaming XML parser, such as SAXON, is probably best, as long as it will operate on non-valid XML.
You would need to tie these two parsers together with a mutual search algorithm of some sort. You will have to customize it to suit your needs. Aho-Corasick will apparently get you the best performance if you can pull it off. A naive algorithm is easy to implement, though, and here's how:
Assuming you have your two parsers that loop through each field (on the one hand) and each text node (on the other hand), pick one of the two parsers and have it go through each of the strings in its data source, calling the other parser to search the other data source for all possible matches, and logging the ones it finds.
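A rough sketch of that naive approach in Python, assuming the dump is a plain-text .sql file and the scrape is a directory of .asp/.html pages. The file names and the string-literal regex are illustrative assumptions, not a drop-in tool:

    import os
    import re
    from html.parser import HTMLParser

    class TextCollector(HTMLParser):
        """Collects each text node of a page along with its line number."""
        def __init__(self):
            super().__init__()
            self.texts = []                       # list of (line_number, text)
        def handle_data(self, data):
            text = data.strip()
            if text:
                self.texts.append((self.getpos()[0], text))

    # Pull quoted string literals out of the SQL dump (very rough; a real SQL
    # parser would also recover the table and column names).
    sql_strings = []
    with open("table.sql", encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for value in re.findall(r"'((?:[^']|''){4,})'", line):
                sql_strings.append((lineno, value.replace("''", "'")))

    # Walk the scraped site and search every text node for every SQL string.
    for dirpath, _, filenames in os.walk("website.com"):
        for name in filenames:
            if not name.endswith((".asp", ".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            collector = TextCollector()
            with open(path, encoding="utf-8", errors="replace") as f:
                collector.feed(f.read())
            for html_line, text in collector.texts:
                for sql_line, value in sql_strings:
                    if value in text:
                        print(f'string "{value}" on line {sql_line} in table.sql '
                              f'matches string in {path} on line {html_line}')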
This cannot work, at least not reliably. Best case: you would fit every piece of data to its counterpart in your HTML files, but you would also have many false positives, for example user names that are actual words, etc.
Furthermore, text is often manipulated before it is displayed. Sites often capitalize titles or truncate texts for previews, etc.
AFAIK there is no such tool, and in my opinion there cannot exist one that solves your problem adequately.
Your best choice is to get the source code the site uses/used and analyze it. If that fails or is not possible, you have to analyse the database manually. Get as much content as possible from the URLs and try to fit the puzzle together.
Consider a whole novel (e.g. The Da Vinci Code).
How does e-book reader software process and output the whole book?
Does it put the WHOLE book in one very large string? An array of strings? Or what?
One of the very first "real" programs I wrote (as part of a class excersise in high school) was a text editor. Part of the requirement for this excersise was for the program to be able to handle documents of arbitrary length (ie larger than the available system memory).
We achieved this by opening the file, but reading only the portion of it required to display the current page of data. When the user moves forward or backward in the file, we read that portion of the file and display it.
We can speed the program up by reading ahead to load pages which we anticipate that the user will want, and by retaining recently read pages in memory so that there is no obvious delay when the user moves forward or backward.
So basically, the answer to your question is: "No. with very large text files, it is unusual to load the whole thing into memory at once. A program that can handle files like that will load it in chunks as it needs to, and drop chunks it doesn't need any more."
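A small sketch of the chunk-on-demand idea in Python; the fixed page size and file name are assumptions, and real readers page on rendered lines or chapters rather than raw byte counts:

    PAGE_SIZE = 4096                         # bytes per "page"

    def read_page(path, page_number):
        with open(path, "rb") as f:
            f.seek(page_number * PAGE_SIZE)  # jump straight to the page we need
            return f.read(PAGE_SIZE)         # read only that chunk

    chunk = read_page("novel.txt", 10)       # ~4 KB in memory, not the whole book
    print(chunk[:80])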
Complex document formats (such as ebooks) may have lookup tables built into the file to allow the user to search or jump quickly to a given page or chapter. In this respect, they effectively work like a database.
I hope that helps.