How to read VSAM file in Python3

I have a VSAM file on a Unix system. I want to read the file in Python using its record layout. Of the .idx and .dta files, I copied the .dta file to my local machine and tried to read it with the code below:
infile = open("myfile.dta", "r", encoding="ansi")
for line in infile:
    print(line)
Without the encoding parameter, it gives this error:
"UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1572"
To get past that error, I opened the file in Notepad++ and checked its encoding. Now I can read the file and it displays the data (though I can still see a few special characters).
The main question now is: how can I read this file record by record according to the provided layout?
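A minimal sketch of record-by-record reading, under loudly stated assumptions: the .dta file holds fixed-length records with no file or record headers (some Unix VSAM emulations do add such headers, so check your offsets), and the layout is a hypothetical three-field copybook. Opening the file in binary mode and decoding each field separately avoids the UnicodeDecodeError, because binary fields such as packed decimal (COMP-3) are never pushed through a text codec. RECORD_LEN, the offsets, and the field names are placeholders for your real layout; use "cp037" instead of "latin-1" if the text fields are EBCDIC.

```python
from decimal import Decimal
from typing import Iterator, NamedTuple

# Hypothetical layout (replace with the real copybook):
#   ACCT-ID   PIC X(10)              bytes  0-9
#   NAME      PIC X(30)              bytes 10-39
#   BALANCE   PIC S9(7)V99 COMP-3    bytes 40-44 (packed decimal)
RECORD_LEN = 45
TEXT_ENCODING = "latin-1"  # assumption; use "cp037" if the text is EBCDIC


class Record(NamedTuple):
    acct_id: str
    name: str
    balance: Decimal


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # 0xC/0xF = positive or unsigned, 0xB/0xD = negative
    if any(n > 9 for n in nibbles):
        raise ValueError(f"not a packed-decimal field: {raw.hex()}")
    value = int("".join(str(n) for n in nibbles))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)  # apply the implied decimal point (V99)


def parse_record(raw: bytes) -> Record:
    """Slice one fixed-length record into fields; PIC X fields are space-padded."""
    return Record(
        acct_id=raw[0:10].decode(TEXT_ENCODING).rstrip(),
        name=raw[10:40].decode(TEXT_ENCODING).rstrip(),
        balance=unpack_comp3(raw[40:45], scale=2),
    )


def read_records(path: str) -> Iterator[Record]:
    """Yield one Record per RECORD_LEN bytes, reading the file in binary mode."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD_LEN)
            if len(raw) < RECORD_LEN:
                break  # end of file (a trailing partial record is ignored)
            yield parse_record(raw)
```

Usage: `for rec in read_records("myfile.dta"): print(rec)`. If the offsets are right, every field should decode cleanly; garbage in one field usually means the layout or a header offset is off.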

There are ports of Python 3 and Python 2 to z/OS. It looks like the Python 3 port does not currently have support for accessing "native" or "classic" z/OS files -- those that do not reside in the z/Unix file system.
VSAM is not a small topic. If you're interested in the history and underlying technologies, feel free to search for "what is VSAM" in your favorite search engine; the TLDR is that VSAM files are analogous to ISAM in that they allow reading of a particular record given a key. VSAM has other capabilities of course, and it is emphatically not ISAM, that's just an analogy.
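To make the ISAM analogy concrete, here is a toy sketch of keyed access over a flat file of fixed-length records: build a key-to-offset index once, then seek straight to the wanted record. This only illustrates the idea; a real VSAM KSDS maintains its own index component and does not work like this code. The record length, key length, and function names are assumptions.

```python
from typing import BinaryIO, Dict, Optional


def build_index(f: BinaryIO, record_len: int, key_len: int) -> Dict[bytes, int]:
    """Map each record's leading key bytes to that record's byte offset."""
    index: Dict[bytes, int] = {}
    f.seek(0)
    offset = 0
    while True:
        raw = f.read(record_len)
        if len(raw) < record_len:
            break
        index[raw[:key_len]] = offset
        offset += record_len
    return index


def read_by_key(f: BinaryIO, index: Dict[bytes, int], key: bytes, record_len: int) -> Optional[bytes]:
    """Random access: seek directly to the record for `key`, or return None."""
    offset = index.get(key)
    if offset is None:
        return None
    f.seek(offset)
    return f.read(record_len)
```

Keys must be passed with the same length and padding as they are stored in the record.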
Depending on the usage pattern for the files in question you may run into some resistance to your access. If these VSAM files are in use by a production CICS region, heavy use from your code may create contention resulting in performance degradation.
Something to consider: you are essentially adding a new requirement to a running production system, and doing so requires some analysis to determine the best mechanism to satisfy your requirement without having a negative impact on that existing system. That mechanism will take into account existing shop standards, security, performance, staff time, etc. Maybe that analysis has already taken place (I cannot know if it has), but your question indicates you have a copy of a single VSAM file on your workstation, and subsequent comments seem to indicate you wish to access "many such files" in place on z/OS.
As is often the case when non-mainframe developers must access some or all of the data contained in an existing mainframe system, you must discuss your requirements and theirs to come up with a mutually agreeable solution. I have tried to outline some of the issues in this answer, this answer, and this answer to this question which has references to Calcite (with which I have no experience) and NFS Server capabilities of z/OS (with which I also have no experience). Lots of capabilities, lots of options, and I will reiterate here something from more than one of the linked answers:
Please understand there is a big difference between...
what is technically possible
what is allowed in your shop
what is likely to provide a robust and maintainable solution given your requirements
These are three very different things. Some of us have life experiences that make us reticent about answering questions regarding what is technically possible absent any mention of what is allowed in your shop or what the actual business requirement is that is being solved.
Mainframes have been around for over half a century, and many shops have standard solutions to technical problems. Sometimes the solution is "don't do that, and here's what we do instead." Working against the recommendations of your technical staff, or your shop standards, is career-limiting.
Update 2022-10-29
David Crayford has just published a Python library for z/OS dataset I/O, including VSAM, at https://github.com/daveyc/pyzfile.

You won't be able to read a VSAM file using Python. Calling into the C API libraries might work, but that is doubtful. Instead, you can use the Java JZOS API and reach into the MVS side of things. Most z/OS systems have Java installed. If you don't have Java installed... go learn some COBOL.

Related

Organizing Scientific Data and Code - Experiments, Models, Simulation, Implementation

I am working on a robotics research project, and would like to know: Does anyone have suggestions for best practices when organizing scientific data and code? Does anyone know of existing scientific libraries with source that I could examine?
Here are the elements of our 'suite':
Experiments - Two types:
Gathering data from existing, 'natural' system.
Data from running behaviors on robotic system.
Models
Description of dynamical system - dynamics, kinematics, etc.
Parameters for said system, some of which are derived from type 1 experiments
Simulation - trying to simulate natural behaviors, simulating behaviors on robots
Implementation - code for controlling the robots. Granted this is a large undertaking and has a large infrastructure of its own.
Some design aspects of our 'suite':
It would be good if the simulation environment allowed for 'rapid prototyping' (scripts / an interactive prompt for simple hacks, quick data inspection, etc.; definitely something hard to incorporate). Currently this is satisfied through scripting languages (Python, MATLAB).
Multiple programming languages
Distributed, collaborative setup - Will be using Git
Unit tests have not yet been incorporated, but will hopefully be later on
Cross Platform (unfortunately) - I am used to Linux, but my team members use Windows, and some of our tools are wed to that platform
I saw this post; the books look interesting, and I have ordered "Writing Scientific Software", but I suspect it will focus primarily on the implementation of the simulation code and less on the overall organization.

The situation you describe is very similar to what we have in our surface dynamics lab.
Some of the work involves keeping measurement data, which are analysed in real time or saved for later analysis. Other work, on the other hand, involves running simulations and analysing their results.
The data management scheme, which the lab leader picked up at Cambridge while studying there, is centred around a main server which holds the personal files of all lab members. Each member accesses the files from their workstation by mounting the appropriate server folder over NFS. This has its merits and faults: it is easier to back up everything, but it is problematic when processing large amounts of data over the network. For this reason I am an exception in the lab, since the simulation I work with generates a large amount of data. This data is saved on my workstation, and only the code used to generate it (the simulation's source code and configuration files) is saved on the server.
I also keep my code in an online SVN service, since I cannot log in to the lab server from home. This is a mandatory practice, which stems from the need to reproduce older results on demand and to trace changes to the code if some obscure bug appears; hence the need to maintain older versions and configuration files.
We also employ low tech methods, such as lab notebooks to record results, modifications, etc.
This content can sometimes be more abstract: there is no point describing every changed line of code (you have diff for that), just the purpose of the change, perhaps some notes about the implementation, and the date.
Work is done mostly in MATLAB. Again I am an exception, as I prefer Python. I also use C for the data-generating simulation. Testing is mostly of convergence, since my project is now concerned with comparison to computational models. I just generate results with different configurations, each saved in its own folder (which I track in my lab logbook). This has the benefit of letting me control and interface with the data exactly as I want, instead of conforming to someone else's ideas and formats.
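A minimal sketch of the "each configuration in its own folder" practice, assuming a Git working tree (the question mentions Git; with SVN you would query `svn info` instead). Each run gets a directory named by UTC timestamp plus a short hash of its configuration, and a run.json file records the configuration and the current commit so the result can be reproduced later. The function names and the run.json layout are hypothetical.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def current_commit() -> Optional[str]:
    """Return the Git commit checked out in the current directory, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, or not inside a repository
    return out.stdout.strip()


def make_run_dir(root: Path, config: Dict[str, Any]) -> Path:
    """Create root/<timestamp>-<config hash>/ and store the config and code version in it."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(blob).hexdigest()[:8]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = root / f"{stamp}-{digest}"
    run_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an earlier run
    meta = {"config": config, "commit": current_commit(), "created": stamp}
    (run_dir / "run.json").write_text(json.dumps(meta, indent=2, sort_keys=True))
    return run_dir
```

The simulation then writes its output into the returned directory. Two runs with the same configuration started within the same second would collide; adding a counter or random suffix would fix that.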

FS based on a database without using fuse

To serve millions of files out of a single directory, to be able to connect to a drive from hundreds of endpoints, and for some other reasons (to avoid gluster/nfs/all FS-based networking solutions), I want to evaluate the possibility of making a filesystem that's based on MongoDB (or any other database).
Basically, it would work like FUSE: every single file is kept in MongoDB GridFS. In theory, I would run
mount mongodbfs /mountPoint mongodb://localhost
Then, when I run touch /mountPoint/test.txt, the file is inserted into MongoDB. This FS will also store uid/gid and permissions with the file, so we can throw hundreds of servers at it and no useradd will be necessary. I'm not planning to include all the features of a filesystem, just the ones we need.
My question is: how do I start my quest to find resources, books, links, people, and developers who'd help me implement this, or at least a proof of concept? Is it feasible? What should I expect as a timeline for such an undertaking?
Please assume a gazillion small files and folders.
PS: After a few days of research, I think this is the direction I'm heading:
http://www.ibm.com/developerworks/library/l-sc12.html
http://www.flipcode.com/archives/Programming_a_Virtual_File_System-Part_I.shtml
PS2: I'm aware of the difficulty of this undertaking. However, we're willing to set aside a serious budget and form a serious team to implement it, but only after we make sure that this isn't a black hole (hence the question).

Your most frequent piece of advice here is going to be "Use FUSE". This is excellent advice, and you would do well to heed it (As Sciurus pointed out there's already gridfs-fuse which is pretty close to what you want).
That said, if you want to take the long, hard road of pain and suffering (writing your own filesystem), you almost certainly want to take an operating systems course at a local university, or look at some online course materials ("Write a simple FS" is usually a small project. The filesystems typically suck because they're academic toys).
Follow that up with Linux File Systems (Moshe Bar) and a thorough reading of some simple filesystem drivers to see the basic skeleton of what you'll need to do.
As far as timeline goes, if you're a decent coder you can write a basic filesystem in a few days to a week (but it will SUCK). I wouldn't even guess how long it would take to write a GOOD filesystem: UFS/FFS (the BSD filesystem) has been under continuous development since at least the late 1970s/early 1980s, and improvements/enhancements/bug fixes still pop up occasionally. Sun/Oracle's ZFS has gone through over 20 iterations in its relatively short (6-year) life, though admittedly much of that is related to volume management capabilities.

Finding Vulnerabilities in Software

I'm interested in the techniques that are used to discover vulnerabilities. I know the theory behind buffer overflows, format string exploits, etc., and I have also written some of them. But I still don't see how to find a vulnerability efficiently.
I'm not looking for a magic wand, only for the most common techniques. I think reading the whole source is an epic job for some projects, even assuming you have access to the source. Fuzzing the input manually isn't very convenient either. So I'm wondering about tools that help.
E.g.
I don't understand how the dev team finds vulnerabilities to jailbreak iPhones so fast. They don't have the source code, they can't execute programs, and since there is a small number of default programs, I wouldn't expect a large number of security holes. So how do they find this kind of vulnerability so quickly?
Thank you in advance.

On the lower layers, manually examining memory can be very revealing. You can certainly view memory with a tool like Visual Studio, and I would imagine that someone has even written a tool to crudely reconstruct an application based on the instructions it executes and the data structures it places into memory.
On the web, I have found many sequence-related exploits by simply reversing the order in which an operation occurs (for example, an online transaction). Because the server is stateful but the client is stateless, you can rapidly exploit a poorly-designed process by emulating a different sequence.
As to the speed of discovery: I think quantity often trumps brilliance...put a piece of software, even a good one, in the hands of a million bored/curious/motivated people, and vulnerabilities are bound to be discovered. There is a tremendous rush to get products out the door.
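The defensive counterpart of the sequence-related exploits described above is for the server to track where each transaction is and reject steps that arrive out of order, rather than trusting the sequence the client sends. A minimal sketch with a hypothetical three-step checkout (add_to_cart, pay, ship); the step names and classes are illustrative, not taken from any framework.

```python
from typing import Dict, Set

# Hypothetical checkout workflow: for each step, the steps allowed to precede it.
ALLOWED_PREVIOUS: Dict[str, Set[str]] = {
    "add_to_cart": {"new", "add_to_cart"},
    "pay": {"add_to_cart"},
    "ship": {"pay"},
}


class OrderStateError(Exception):
    """Raised when a request arrives out of the expected order."""


class OrderTracker:
    """Server-side record of the last completed step for each order."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def perform(self, order_id: str, step: str) -> None:
        current = self._state.get(order_id, "new")
        allowed = ALLOWED_PREVIOUS.get(step)
        if allowed is None:
            raise OrderStateError(f"unknown step {step!r}")
        if current not in allowed:
            # Reject the request instead of trusting the client's sequence.
            raise OrderStateError(f"cannot {step!r} after {current!r}")
        self._state[order_id] = step
```

A client that jumps from add_to_cart straight to ship gets an OrderStateError instead of a shipped, unpaid order.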

There is no efficient way to do this, as firms spend a good deal of money to produce and maintain secure software. Ideally, their work in securing software does not start with looking for vulnerabilities in the finished product, so many vulns have already been eradicated by the time the software is out.
Back to your question: it will depend on what you have (working binaries, complete or partial source code, etc.). Also, the goal is not finding ANY vulnerability but the ones that count (e.g., those that matter to the client of the audit or to the software owner). Right?
Knowing that will help you understand the inputs and functions you need to worry about. Once you have located these, you may already have a feel for the software's quality: if it isn't very good, fuzzing will probably find you some bugs. Otherwise, you need to start understanding these functions and how the input is used within the code, to determine whether the code can be subverted in any way.
Some experience will help you weigh how much effort to put into each task and when to push further. For example, if you see some bad practices being used, delve deeper. If you see crypto being implemented from scratch, delve deeper. Etc.
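A minimal sketch of mutation fuzzing: randomly overwrite, insert, and delete bytes of a known-good seed input, and record every exception the target does not document. The target parse_message is a hypothetical toy with a planted length-check bug; real fuzzers such as AFL or libFuzzer also use coverage feedback to steer their mutations, which this sketch does not.

```python
import random
from typing import Callable, List, Tuple


def parse_message(data: bytes) -> bytes:
    """Hypothetical target: 1-byte length prefix, then payload.
    Planted bug: the length byte is trusted without checking the buffer size."""
    if not data:
        raise ValueError("empty input")  # documented, expected error
    length = data[0]
    payload = data[1:1 + length]
    last = payload[length - 1]  # IndexError when length is 0 or exceeds the payload
    return payload + bytes([last])


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply 1-4 random byte overwrites, insertions, or deletions to the seed."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        choice = rng.random()
        if choice < 0.5 and data:
            data[rng.randrange(len(data))] = rng.randrange(256)
        elif choice < 0.75:
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
        elif data:
            del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(target: Callable[[bytes], object], seed: bytes,
         iterations: int = 10_000, rng_seed: int = 0) -> List[Tuple[bytes, str]]:
    """Feed mutated inputs to target; collect (input, exception name) for unexpected errors."""
    rng = random.Random(rng_seed)
    crashes: List[Tuple[bytes, str]] = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # documented error path, not a finding
        except Exception as exc:
            crashes.append((candidate, type(exc).__name__))
    return crashes
```

Each crashing input is kept so it can be replayed and minimized by hand; the fixed rng_seed makes a run repeatable.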

Aside from buffer overflow and format string exploits, you may want to read a bit about code injection (a lot of what you'll come across will be web/DB related, but dig deeper). AFAIK this was a huge force in jailbreaking the iThingies. Saurik's Mobile Substrate allows (or allowed) you to load third-party .dylibs and call any code contained in them.

What are some advanced and modern resources on exploit writing?

I've read and finished both Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They both were illuminating in their own way but I still feel like a lot of the techniques and information presented within them is outdated to some degree.
When the infamous Phrack article "Smashing the Stack for Fun and Profit" was written in 1996, it was just before what I sort of consider the computer security "golden age".
Writing exploits in the years that followed was relatively easy. Some basic knowledge of C and assembly was all that was required to perform buffer overflows and execute some arbitrary shellcode on a victim's machine.
To put it lightly, things have gotten a lot more complicated. Now security engineers have to contend with things like Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), Stack Cookies, Heap Cookies, and much more. The complexity of writing exploits went up at least an order of magnitude.
You can't even run most of the buffer overrun exploits in the tutorials you'll find today without compiling with a bunch of flags that turn off modern protections.
Now if you want to write an exploit you have to devise ways to turn off DEP, spray the heap with your shellcode hundreds of times, and attempt to guess a random memory location near your shellcode. Not to mention the pervasiveness of managed languages in use today, which are much more secure when it comes to these vulnerabilities.
I'm looking to extend my security knowledge beyond writing toy exploits for a decade-old system. I'm having trouble locating resources that address the issues of writing exploits in the face of all the protections I outlined above.
What are the more advanced and prevalent papers, books or other resources devoted to contending with the challenges of writing exploits for modern systems?

You mentioned 'Smashing the stack'. Research-wise this article was outdated before it was even published. The late-80s Morris worm used it (to exploit fingerd, IIRC). At the time it caused a huge stir because back then every server was written in optimistic C.
It took a few (10 or so) years, but gradually everyone became more conscious of security concerns related to public-facing servers.
The servers written in C were subjected to lots of security analysis and at the same time server-side processing branched out into other languages and runtimes.
Today things look a bit different. Servers are not considered a big target. These days it's clients that are the big fish. Hijack a client and the server will allow you to operate under that client's credentials.
The landscape has changed.
Personally I'm a sporadic fan of playing assembly games. I have no practical use for them, but if you want to get in on this I'd recommend checking out the Metasploit source and reading their mailing lists. They do a lot of crazy stuff and it's all out there in the open.

I'm impressed, you are a leet hacker like me. You need to move to web applications. The majority of CVE numbers issued in the past few years have been in web applications.
Read these two papers:
http://www.securereality.com.au/studyinscarlet.txt
http://www.ngssoftware.com/papers/HackproofingMySQL.pdf
Get a LAMP stack and install these three applications:
http://sourceforge.net/projects/dvwa/ (php)
http://sourceforge.net/projects/gsblogger/ (php)
http://www.owasp.org/index.php/Category:OWASP_WebGoat_Project (j2ee)
You should download w3af and master it. Write plugins for it. w3af is an awesome attack platform, but it is buggy and has problems with DVWA, it will rip up greyscale. Acunetix is a good commercial scanner, but it is expensive.

I highly recommend "The Shellcoder's Handbook". It's easily the best reference I've ever read when it comes to writing exploits.
If you're interested in writing exploits, you're likely going to have to learn how to reverse engineer. For 99% of the world, this means IDA Pro. In my experience, there's no better IDA Pro book than Chris Eagle's "The IDA Pro Book". He details pretty much everything you'll ever need to do in IDA Pro.
There's a pretty great reverse engineering community at OpenRCE.org. Tons of papers and various helpful apps are available there. I learned about this website at an excellent bi-annual reverse engineering conference called RECon. The next event will be in 2010.
Most research these days will be "low-hanging fruit". The majority of talks at recent security conferences I've been to have been about vulnerabilities on mobile platforms (iPhone, Android, etc) where there are few to none of the protections available on modern OSes.
In general, there won't be a single reference out there that will explain how to write a modern exploit, because there's a whole host of protections built into OSes. For example, say you've found a heap vulnerability, but that pesky new Safe Unlinking feature in Windows is keeping you from gaining execution. You'd have to know that two geniuses researched this feature and found a flaw.
Good luck in your studies. Exploit writing is extremely frustrating, and EXTREMELY rewarding!
Bah! The spam thingy is keeping me from posting all of my links. Sorry!

DEP (Data Execution Prevention), NX (No-Execute) and other security enhancements that specifically disallow execution are easily bypassed by using other exploit techniques such as Ret2Lib or Ret2Esp. When an application is compiled, it is usually linked against other libraries (Linux) or DLLs (Windows). These Ret2* techniques simply call an existing function() that already resides in memory.
For example, in a normal exploit you may overflow the stack and take control of the return address (EIP), pointing it at a NOP sled, your shellcode, or an environment variable that contains your shellcode. On a system that does not allow the stack to be executable, that code will not run. Instead, when you overwrite the return address you can point it to an existing function in memory such as system() or execv(). You pre-populate the arguments this function expects (on the stack with 32-bit x86 calling conventions, in registers on x86-64), and now you can call /bin/sh without having to execute anything from the stack.
For more information look here:
http://web.textfiles.com/hacking/smackthestack.txt

How to manage application resources?

We are developing a web application which is available in 3 languages.
We use key-value pairs to translate everything. At the moment we use Excel (key, German, French, English) for this. But this does not work well: if more than one person edits this file, there is no way to automatically merge the different copies.
Is there a good (and free) tool which can handle this job?
--- additional information ---
(This is a STRUTS application.) But the question is how to manage this kind of information in general (or at least in a convenient way that also supports multiple users editing this single file, i.e. "mergeable" file types).

Why not use gettext and manage separate .po files? See that blog entry.

If you can store this information in plain text then you will be able to use a version control system like subversion to help you with merging changes. Subversion is free.
The free guide (the "Red Book") to subversion gives a fairly good explanation of how this kind of merging works.
http://svnbook.red-bean.com/en/1.5/svn.basic.vsn-models.html#svn.basic.vsn-models.copy-merge
EDIT: Another thought - if you really want to stay using a spreadsheet - Google Docs supports simultaneous editing of a spreadsheet. You could import your existing spreadsheet and get your multi-user merging wishes for free with very little change to how you work.
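Since this is a STRUTS application, one way to follow the plain-text advice is to keep the key/German/French/English table as CSV (or export it from Excel) and generate one Java .properties file per language. A minimal sketch, assuming a CSV header of key,de,fr,en and keys that need no escaping; the function names and the "messages" base name are hypothetical. Sorting keys and writing one entry per line keeps diffs small, so Subversion can merge concurrent edits line by line. Non-ASCII characters are written as \uXXXX escapes because older Java versions read .properties resource bundles as ISO-8859-1.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List

LANGS = ["de", "fr", "en"]  # assumed column order after "key"


def escape_properties(text: str) -> str:
    """Escape a value for a .properties file: backslashes, newlines, non-ASCII."""
    out: List[str] = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif ord(ch) > 0x7E:
            # \uXXXX escapes are UTF-16 code units, so astral characters become surrogate pairs
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append(f"\\u{int.from_bytes(units[i:i + 2], 'big'):04x}")
        else:
            out.append(ch)
    return "".join(out)


def split_by_language(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Turn rows of key,de,fr,en into {lang: {key: text}}, skipping empty cells."""
    tables: Dict[str, Dict[str, str]] = {lang: {} for lang in LANGS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for lang in LANGS:
            if row.get(lang):
                tables[lang][row["key"]] = row[lang]
    return tables


def write_properties(tables: Dict[str, Dict[str, str]], out_dir: Path,
                     basename: str = "messages") -> List[Path]:
    """Write <basename>_<lang>.properties files, one sorted key=value line per entry."""
    written: List[Path] = []
    for lang, entries in tables.items():
        path = out_dir / f"{basename}_{lang}.properties"
        lines = [f"{key}={escape_properties(entries[key])}" for key in sorted(entries)]
        path.write_text("\n".join(lines) + "\n", encoding="ascii")
        written.append(path)
    return written
```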

Good Question.
There are some "Best Practice" depending on what you actually code in (java, ms-windows c#).
I solved this (though I think there must be a better way) by using a SQL db instead of an Excel file, and wrote a plug-in for VS (VB6, ..., emacs) that was able to insert new keys into the db without a round trip through version control. The keys are the developer's best-guess name for a label (key => save, sv => "spara", no => "", en => "save").
The db can then be used to generate a module, class, object or text file for the appropriate code platform,
which can be accessed depending on the IDE; in C#, for example: bt.label = corelang.save;
Someone else can then do all the language stuff, and then we just update the db and rerun the generation to the platform resources.
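A minimal sketch of the db-plus-generation approach, using SQLite in place of the answer's SQL server and emitting Python instead of C# so the example is self-contained. The labels table, the add_label/generate_module names, and the CoreLang class name are assumptions; keys must be valid identifiers to become attributes.

```python
import keyword
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS labels (
    key  TEXT NOT NULL,
    lang TEXT NOT NULL,
    text TEXT NOT NULL,
    PRIMARY KEY (key, lang)
)
"""


def add_label(conn: sqlite3.Connection, key: str, lang: str, text: str) -> None:
    """Insert or overwrite one translation (what an IDE plug-in might call)."""
    conn.execute(
        "INSERT OR REPLACE INTO labels (key, lang, text) VALUES (?, ?, ?)",
        (key, lang, text),
    )


def generate_module(conn: sqlite3.Connection, lang: str, class_name: str = "CoreLang") -> str:
    """Emit Python source with one class attribute per key, e.g. CoreLang.save."""
    rows = conn.execute(
        "SELECT key, text FROM labels WHERE lang = ? ORDER BY key", (lang,)
    ).fetchall()
    lines: List[str] = [f"class {class_name}:"]
    if not rows:
        lines.append("    pass")
    for key, text in rows:
        if not key.isidentifier() or keyword.iskeyword(key):
            raise ValueError(f"key {key!r} cannot be used as an attribute name")
        lines.append(f"    {key} = {text!r}")
    return "\n".join(lines) + "\n"
```

Regenerating the module after translators update the db is then a single call per language, and only the generated files need to go into version control.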

After years of seeing localization done, including at large companies like Sony, I can only say the "standard" is Excel :)
There are tons of good ideas around, and probably many better ways to do it, but in real life Excel seems to be the best, most cost-effective solution that doesn't require training or building complex new tools to get the job done.

I found out that IntelliJ IDEA (at least in versions 7 and 8) has an editor for application resources. But it is not free at all, and it does not scale to bigger resource files with more than 1,000 keys.
Another good choice would be Google Spreadsheets: for those who don't know it, it is like an online Excel web application. It can handle concurrent access from multiple users. Yay! But sadly, it comes from Google, which makes it impossible to use in commercial projects.
So,
still searching...
cheers,
mana
