Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am new to VSAM and confused about a few things. Please correct me where I am wrong and answer my questions.
A cluster contains control areas and a control area contains control intervals. One control interval contains one dataset. To define a cluster we specify a data component and an index component. The name we give to the data component creates a dataset, and the name of the index component generates a key. My questions are as follows:
1) If I have to add a new record to that dataset, what is the procedure?
2) What is the procedure for creating a new dataset in a control area?
3) How do I access a dataset, and a particular record, after they are created?
I tried to find a simple code sample but could not, so please explain with a simple example.
One thing that is going to help you is the IBM Redbook VSAM Demystified: http://www.redbooks.ibm.com/abstracts/sg246105.html which, these days, you can even get on your smartphone, amongst several other ways.
However, your current understanding is a bit astray so you'll need to drop all of that understanding first.
There are several types of VSAM file and you'll probably only come across two of them as a beginner: KSDS and ESDS.
KSDS is a Key Sequenced Data Set (an indexed file) and ESDS is an Entry Sequenced Data Set (a sequential file but not a "flat" file).
When you write a COBOL program, there is little difference between using an ESDS and a flat/PS/QSAM file, and not even that much difference when using a KSDS.
Rather than providing an example, I'll refer you to the Enterprise COBOL Programming Guide for your release of COBOL. It is Chapter 10 you want, up to and including the section on handling errors, and the publication can be found here: http://www-01.ibm.com/support/docview.wss?uid=swg27036733. You can also use the Language Reference for the details of what you can use with VSAM once you have a better understanding of what it is to COBOL.
As a beginning programmer, you don't have to worry about what the structure of a VSAM dataset is. However, you've had some exposure to the topic, and taken a wrong turn.
VSAM datasets themselves can only exist on disk (what we often refer to as DASD). They can be backed-up to non-DASD, but are only directly usable on DASD.
They consist of Control Areas (CA), which you can regard as just being a lump of DASD, and almost exclusively that lump of DASD will be one Cylinder (15 Tracks on a 3390, which these days is very likely an emulated 3390). You won't need to know much more about CAs. CAs are more of a conceptual thing than an actual physical thing.
Control Intervals (CI) are where any data (including index data) is. CIs live in CAs.
Records, the things you will have in the FILE SECTION under an FD in a COBOL program, will live in CIs.
Your COBOL program needs to know nothing about the structure of a VSAM dataset. COBOL uses the VSAM access method to do all VSAM file accesses (Access Method Services, AMS, is the utility used to define the dataset in the first place). As far as your COBOL program is concerned it is an "indexed" file with a little bit on the SELECT statement to say that it is a VSAM file. Or it is a sequential file with a little... you know by now.
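For intuition only, here is a toy Python model of how a KSDS can keep records in key order inside fixed-capacity CIs, with an index over the highest key in each CI. Everything in it is an assumption made for illustration (the `ToyKsds` class, the capacity of three records per CI, the split policy); it is not how you define or use a real VSAM file, and a COBOL program never sees any of this.

```python
from bisect import bisect_left, insort


class ToyKsds:
    """Hypothetical, heavily simplified picture of a key-sequenced data set."""

    def __init__(self, records_per_ci=3):
        self.records_per_ci = records_per_ci
        # "Data component": a list of CIs, each a sorted list of (key, data).
        self.cis = [[]]

    def _ci_for(self, key):
        # "Index component": the highest key held in each CI, in CI order.
        high_keys = [ci[-1][0] for ci in self.cis if ci]
        pos = bisect_left(high_keys, key)
        # Keys above every high key belong in the last CI.
        return min(pos, len(self.cis) - 1)

    def write(self, key, data):
        i = self._ci_for(key)
        ci = self.cis[i]
        if any(k == key for k, _ in ci):
            raise KeyError(f"duplicate key {key!r}")
        insort(ci, (key, data))  # keep the CI in key sequence
        if len(ci) > self.records_per_ci:
            # Toy "CI split": move the upper half of the records to a new CI.
            half = len(ci) // 2
            self.cis[i:i + 1] = [ci[:half], ci[half:]]

    def read(self, key):
        for k, data in self.cis[self._ci_for(key)]:
            if k == key:
                return data
        raise KeyError(key)


f = ToyKsds()
for key in ["0040", "0010", "0030", "0020", "0050"]:
    f.write(key, f"record {key}")

print(f.read("0030"))
print([[k for k, _ in ci] for ci in f.cis])
```

The point of the sketch is only that records live in CIs and are located through an index by key; adding a record is a keyed write, and reading one is a keyed read.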
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm working on an ETL pipeline with Kiba which imports into multiple, related models in my Rails app. For example, I have records which have many images. There might also be collections which contain many records.
My data will come from various sources, including HTTP APIs and CSV files. I would like to make the pipeline as modular and reusable as possible, so that for each new type of source I only have to create the source, and the rest of the pipeline definition stays the same.
Given multiple models in the destination, and possibly several API calls to get the data from the source, what's the standard pattern for this in Kiba?
I could create one pipeline where the destination is 'the application' and has responsibility for all these models, but this feels like the wrong approach because the destination would be responsible for saving data across different Rails models, uploading images, etc.
Should I create one master pipeline which triggers more specific ones, passing in a specific type of data (e.g. image URLs for import)? Or is there a better approach than this?
Thanks.
Kiba author here!
It is natural & common to look for some form of genericity, modularity and reusability in data pipelines. I would say, though, that as with regular code, it can be hard initially to figure out the correct way to get that (it will depend quite a bit on your exact situation).
This is why my recommendation would be instead to:
Start simple (on one specific job)
Very important: make sure to implement end-to-end automated tests (use webmock or similar to stub out API requests & make tests completely isolated, create tests with 1 row from source to destination) - this will make it easy to refactor stuff later
Once you have that (1 pipeline with tests), you can start implementing a second one, and refactor to extract interesting patterns as reusable bits, and iterate from there
Depending on your exact situation, maybe you will extract specific components, or maybe you will end up extracting a whole generic job, or generic families of jobs etc.
This approach works well even as you get more experience working with Kiba (this is how I gradually extracted the components that you will find in kiba-common and kiba-pro, too).
How can I extract data from a mainframe into Excel? Currently, I am fetching data from MS Access, but the requirements are for the mainframe.
Thanks in advance
First, please understand that saying "extract data from mainframe" is similar to saying "extract data from Intel." The following is not comprehensive but is intended to provide an idea of how to ask your question in a manner which can be meaningfully answered.
Please understand there is a big difference between...
what is technically possible
what is allowed in your shop
what is likely to provide a robust and maintainable solution given your requirements
These are three very different things. Some of us answering questions here on Stack Overflow have life experiences that make us reticent about answering questions regarding what is technically possible absent any mention of what is allowed in your shop or what the actual business requirement is that is being solved.
Mainframes have been around for over half a century, and many shops have standard solutions to technical problems. Sometimes the solution is "don't do that, and here's what we do instead." Working against the recommendations of your technical staff, or your shop standards, is career limiting.
What operating system?
z/OS is in common use on mainframes, but there do exist shops that still run one of its ancestors like MVS/XA. The mainframe operating system traces its roots back to OS/360 first available in 1965.
z/TPF
z/Linux usually runs on top of the z/VM hypervisor.
z/VSE
In what sort of file does the data reside?
QSAM or Queued Sequential Access Method, also commonly called flat files.
VSAM or Virtual Storage Access Method. There are several different kinds of VSAM files including KSDS (Key Sequenced Data Set), ESDS (Entry Sequenced Data Set), RRDS (Relative Record Data Set) and Linear (conceptually similar to a memory mapped file).
a DBMS like DB2 or IMS. A DBMS typically has extract facilities to allow writing a flat file from its own internal format. DB2, for example, stores data in Linear VSAM datasets.
Unix System Services files reside in a different file system than QSAM or VSAM. This will be more familiar, as it has a directory structure where the classic z/OS file system has none.
What does the data look like?
You must know the record layout of the data you wish to retrieve.
It is common for mainframe data to include both text and binary data in a single record, for example a name and a currency amount:
Hopper    Grace     ar%
...which would be...
x'C8969797859940404040C799818385404040404081996C'
...in hex. This is code page 37, commonly referred to as EBCDIC.
Without knowing that the family name is confined to the first 10 bytes, the given name is confined to the subsequent 10 bytes, and the currency amount is in packed decimal (also known as binary coded decimal) in the next 3 bytes, you cannot accurately transfer the data, because code page conversion will destroy the currency amount, which is +819.96. Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...
x'486F707065722020202047726163652020202020617225'
...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte) and the amount itself has been changed.
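The difference can be reproduced with a short Python sketch. It uses the record and layout described above; the helper `unpack_decimal` and the variable names are made up for illustration, and `cp037` and `cp1250` are the names of Python's standard codecs for those code pages.

```python
from decimal import Decimal

record = bytes.fromhex("C8969797859940404040C799818385404040404081996C")


def unpack_decimal(packed, scale):
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = packed.hex()            # e.g. "81996c"
    sign_nibble = digits[-1]
    if sign_nibble not in "abcdef":
        raise ValueError("invalid sign nibble")
    value = Decimal(int(digits[:-1])).scaleb(-scale)
    # x'B' and x'D' mean negative; x'A', x'C', x'E' and x'F' mean positive.
    return -value if sign_nibble in "bd" else value


# Field-aware conversion: translate only the text fields, unpack the amount.
family_name = record[0:10].decode("cp037").rstrip()
given_name = record[10:20].decode("cp037").rstrip()
amount = unpack_decimal(record[20:23], scale=2)

# Naive conversion: translate the whole record as if it were all text.
naive = record.decode("cp037").encode("cp1250")

print(family_name, given_name, amount)
print(naive.hex().upper())
```

The first line printed carries the correct amount, while the last three bytes of the naive result are x'617225', which is no longer a valid packed-decimal field.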
Security
Is the data you wish to access covered by privacy legislation? You may have to provide some evidence that whatever protections are in place to guarantee that only authorized personnel have access to this data on the mainframe are also in place once you have transferred it off of the mainframe. Such guarantees may have to satisfy an auditor.
What you need
You need to know what operating system holds your data, you need to know what type of file holds your data (a DBMS isn't a type of file but let's let that go for now), and you need to know your record layout(s).
Typically, the easy way to retrieve data is to extract it from its existing data store (QSAM, VSAM, DBMS) into a flat file where all the data is in a text format. There are mainframe utilities to accomplish this. In extreme cases, a program can be written to accomplish this goal. Once it has been accomplished, you can transfer your data without fear of destroying packed or binary data.
You may be able to read data directly from a DBMS if that's where your data resides, but this may depend on shop standards, including security.
Modern mainframes can transfer data via FTP, FTPS, and SFTP. Which is recommended in your shop is something to ask your technical staff.
Closed 8 years ago.
(not sure if this is the right forum for this question)
I am very curious about how search on a major site, say YouTube/Quora/Stack Exchange, works.
And I'm NOT looking for an answer like 'They Use Lucene Search engine'. I want to understand exactly how the indexing works there.
Is there a different Index for text search than the autocomplete feature?
Is it done in the background like map reduce?
How exactly does map reduce help deliver results? (I know that it counts words in each document but what happens after that when I search for a keyword?)
I also heard that google stopped using map reduce and now using cloud dataFlow here - how does that work?
Help Please :-)
I voted to close, because I think your question is too broad. Each bullet could form the basis of an SO question. That stated, I'll take a crack at answering how SolrCloud attempts to solve each of the problems you are asking about:
Is there a different Index for text search than the autocomplete feature?
The short answer is "yes". Solr has several options for implementing an autocomplete feature and all of them rely on either building a separate index or being supplied a separate dictionary. You can also roll your own in an even more sophisticated fashion as the blog post "Super flexible AutoComplete with Solr" demonstrates.
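To make the "separate index" idea concrete, here is a minimal Python sketch of a prefix lookup over its own sorted term list. It is not how Solr's suggesters are implemented; the function names and sample titles are invented for illustration.

```python
from bisect import bisect_left


def build_suggest_index(titles):
    """Build a sorted list of lower-cased terms, separate from any search index."""
    terms = {word for title in titles for word in title.lower().split()}
    return sorted(terms)


def suggest(index, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)   # first term >= prefix
    matches = []
    for term in index[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


suggest_index = build_suggest_index(
    ["Solr in Action", "Solar power basics", "Lucene scoring"]
)
print(suggest(suggest_index, "sol"))
```

Because all terms sharing a prefix sit next to each other in sorted order, a lookup is one binary search plus a short scan, which is a different access pattern from full-text search.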
Is it done in the background like map reduce?
Generally speaking no. SolrCloud is based on the idea of shards with leaders and replicas. A shard is a subset of your overall index, and each shard is comprised of a leader and possibly one or more replicas.
Queries are executed against all shard leaders, with one particular shard assigned to serve as the aggregator of each shard's response. Unlike map reduce, where the individual node responses have all the data the reducing node needs, the aggregating Solr shard may make multiple requests back to the other shards, for example to figure out sort order.
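The scatter-and-aggregate part can be sketched in a few lines of Python. This is a simplified single-round version under stated assumptions: the "score" is just a term count, the shard contents are invented, and the follow-up requests a real aggregator may make are left out.

```python
import heapq


def search_shard(shard_docs, term, k):
    """Return this shard's top-k (score, doc_id) pairs for a term."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in shard_docs.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(k, hits)


def distributed_search(shards, term, k=2):
    """Query every shard, then merge the partial results into a global top-k."""
    partials = [search_shard(shard, term, k) for shard in shards]
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


shards = [
    {"a1": "solr solr search", "a2": "lucene"},
    {"b1": "solr", "b2": "solr solr solr index"},
]
print(distributed_search(shards, "solr"))
```

Each shard only needs to return its own best k hits, because no document outside a shard's top k can appear in the global top k.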
How exactly does map reduce help deliver results? (I know that it counts words in each document but what happens after that when I search for a keyword?)
See my response to your previous question. In short, the query is executed against each shard, aggregated by one of those shards, and returned to the requestor. The useful magic part that people most often associate with Solr (Lucene, really) is Term Frequency Inverse Document Frequency indexing, usually with stemming on text searches. While this is not exactly what happens under the hood, and you can vary what's actually done via configuration, it provides a fairly good idea of what's being done.
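As a rough illustration of the TF-IDF idea, here is a minimal Python sketch of an inverted index with a textbook `tf * log(N / df)` score. It is not Lucene's actual scoring formula, it does no stemming, and the documents and function names are invented.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents get a higher weight.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "solr search engine",
    "d2": "search search tips",
    "d3": "cooking tips",
}
index = build_index(docs)
print(search(index, len(docs), "search"))
print(search(index, len(docs), "solr tips"))
```

At query time nothing is counted across documents; the lookup reads the postings for each query term from the index that was built beforehand and combines the weights.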
Other searching, on dates and numbers, or simple textual values, is done in a fashion similar to database indexing. That is a simplification; if you want to understand it more fully, read the JavaDoc on NumericRangeQuery for an in-depth explanation.
I also heard that google stopped using map reduce and now using cloud dataFlow here - how does that work?
If I knew the answer to that I would probably be working for Google and not answering StackOverflow questions :). Seriously, whatever they've built is new PhD-level work that, as far as I know, they haven't even released a research paper on, which is what they did with map reduce and which led to Yahoo building Hadoop.
Closed 6 years ago.
I'm working on a large project for a university assignment. We're developing an application that a business will use to compile quotes for its various services.
I need to document the algorithms in a way that the client can sign off on, to make sure the way we calculate the prices is correct.
So far I've tried using a large flow chart with decision diamonds, as in information systems modelling, but it's proving to be overkill for even simple algorithms.
Can anybody please suggest some ways to do this? It needs to be as little like software code as possible, and enough for the client to see how we decide what prices are quoted.
Maybe you should then use pseudocode.
Create two documents.
First: The business process model (BPM) that shows the sequence of steps required to be done. This should be annotated with the details for each step.
Second: Create a spreadsheet with each input data item defined so that business can see that you understand the type of field for entry of each data point and the rules for each data point. If the calculation uses a table for the step, then that is where you define the input lookup value from the table. So for each step you know where the data is coming from and then going to. Your spreadsheet can include the link to the BPM so they can walk through each data point in the BPM and see where it is coming from/going to.
You can prepare screen designs to show the users what your system actually does.
Well, the usual way to document algorithms is writing papers.
If your clients have studied business, I'm sure they are familiar with reading formulas.
Would a data flow diagram help? Put pseudocode or math in the bubbles. I've had some success combining data flow models and entity relationship diagrams, but it's non-standard.
What about a Nassi-Shneiderman diagram? It's a diagram from structured programming, and I think it's good for showing decision flows.
http://en.wikipedia.org/wiki/Nassi%E2%80%93Shneiderman_diagram
You could create an algorithm test screen to display and comment on the various steps through the calculations.
Closed 7 years ago.
I have been thinking about setting up some sort of library for all our internally developed software at my organisation. I would like to collect any ideas the good SO folk may have on this topic.
I figure, what is the point in instilling into developers the benefits of writing reusable code if, on the next project, the first thing developers do is file -> new because they don't know what code is already out there to be reused?
As an added benefit, I think that just having a library like this would encourage developers to think more in terms of reusability when writing code.
I would like to keep this library as simple as possible, perhaps my only two requirements being:
Search facility
Usable for many types of components: assemblies, web services, etc
I see the basic information required on each asset/component to be:
Name & version
Description / purpose
Dependencies
Would you record any more information?
What would be the best platform for this i.e., wiki, forum, etc?
What would make a software library like this successful vs unsuccessful?
All ideas are greatly appreciated.
Thanks
Edit:
Found these similar questions after posting:
How do you ensure code is reused correctly?
How do you foster the use of shared components in your organization?
Sounds like there is no central repository of code available at your organization. Depending on what you do, this could be because of compartmentalization of knowledge due to security restrictions, because external vendor code is included in some or all of the solutions, or because your company has not yet seen the benefits of getting people to reuse, refactor, and evangelize the benefits of such a repository.
The solutions I have seen work at multiple corporations share a multi-pronged approach.
Buy-in at some level from management. Usually it's a CTO/CIO that the idea resonates with. They say it's a good thing and don't give any money to fund it, but they won't stand in your way, provided they are aware that someone is going to champion the idea before that person starts soliciting code and consolidating it somewhere.
Some list of projects and the collateral available, in plain English. I've seen this on wikis, on SharePoint lists, and in text files within a source repository. All of them share the common attribute of some sort of front-end search server that allows full-text search over the description of a solution.
Some common share or repository for the binaries and/or code. Oftentimes a large org has different authentication/authorization methods for many different environments and it might not be practical (or possible logistically) to share a single source repository - don't get hung up on that aspect - just try to get it to the point that there is a well known share/directory/repository that works for your org.
Always make sure there is someone listed as a contact - no one ever takes code and runs it in production without at least talking to the previous owner of it - and if you don't have a person they can start asking questions of right away then they might just go ahead and hit file->new.
Unsuccessful attributes I've seen?
N submissions per engineer per time period = lots of crap starts making its way in
No method of rating / feedback. If there is no means to favorite/rate/give some indicator that allows the cream to rise to the top, you don't go back to search it often, because you weren't able to benefit from everyone else's slogging through the code that wasn't really very good.
Lack of a feedback/email link that sends questions directly to the author's email.
Lack of ability to categorize organically. Every time there is some super rigid hierarchy or category list that was predetermined, everything ends up in "other". If you use tags or similar you can avoid it.
A requirement that a design document in a rigid format accompany the code, or else the code isn't accepted - no one can ever agree on the "centralized" format of a design doc and no one ever submits when this is required.
Just my thinking.
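As a rough illustration of what one catalogue entry could hold, here is a small Python sketch combining the fields from the question (name, version, description, dependencies) with the contact, tags and ratings suggested above. The `Asset` class, the `search` function and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    version: str
    description: str
    contact: str                                  # someone to ask before reusing it
    dependencies: list = field(default_factory=list)
    tags: set = field(default_factory=set)        # free-form, no fixed hierarchy
    ratings: list = field(default_factory=list)

    def average_rating(self):
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def search(catalogue, text="", tag=None):
    """Match on description text and an optional tag, best-rated first."""
    text = text.lower()
    hits = [a for a in catalogue
            if text in a.description.lower() and (tag is None or tag in a.tags)]
    return sorted(hits, key=lambda a: -a.average_rating())


catalogue = [
    Asset("AuthClient", "1.2", "Client for the internal login web service",
          "alice@example.com", tags={"web-service", "auth"}, ratings=[5, 4]),
    Asset("CsvTools", "0.9", "Assembly for parsing CSV exports",
          "bob@example.com", tags={"assembly"}, ratings=[3]),
    Asset("LoginWidget", "2.0", "UI control that wraps the login service",
          "carol@example.com", tags={"ui", "auth"}),
]
print([a.name for a in search(catalogue, "login", tag="auth")])
```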