Sorry if this question isn't in the right place, but I wouldn't qualify myself as a developer.
Overview: our organization runs a piece of critical Python code in Google Colab, with the code and data stored on Google Drive. This code processes sensitive data, and for privacy reasons we'd like to move it out of Google. There's also a performance reason: the RAM requirements are sometimes higher than the free plan allows, halting code execution.
The setup: the code takes one very large PDF file as input (in the same Google Drive directory), processes it, then spits out tens of usable XLSX files containing the relevant info, each in a different folder, since the code is expected to be run once every 4-5 weeks. The very few parameters allowed are set in a simple text file.
The catch: no one in the organization currently has the skills to edit the code or maintain a compatible Python environment.
The ideal end product would be a self-contained executable with no external dependencies, able to run in any user directory. I know this is a rather inefficient use of disk space, but it's the only way to ensure proper execution every time. Primary target platforms would be Mac OS X 10.14 to current, or a Debian-based Linux distribution, more precisely Linux Mint. Windows compatibility isn't a priority.
What would be the proper way to make this code independent?
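For illustration, here is a minimal sketch of what one packaging route might look like, assuming PyInstaller (not something we use today) and a hypothetical entry script process_pdf.py. A single-file build only targets the OS it is built on, so macOS and Linux Mint binaries would have to be produced separately:

    # Minimal packaging sketch, assuming "pip install pyinstaller" has been run
    # and the code's entry point is a (hypothetical) script named process_pdf.py.
    import PyInstaller.__main__

    PyInstaller.__main__.run([
        "process_pdf.py",         # hypothetical entry script
        "--onefile",              # bundle interpreter + dependencies into one file
        "--name", "pdf_to_xlsx",  # hypothetical name for the resulting executable
    ])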
I have a question about the z/OS log:
I would just like to know: is every operation that starts always announced by $HASP373 and IEF403I?
And is the Ended status always announced by $HASP395 and IEF404I?
The trouble with z/OS is that it's really hard to explain something without introducing another concept that also needs explaining. This, in turn, requires another explanation etc. This is partly due to the z/OS operating system being from a different planet compared to Unix, Windows, OS X etc, all of which are broadly similar.
Those messages are issued by the system for a lot of the work that happens on a mainframe, but not all of it.
All work on z/OS runs in its own address space, which is almost like a mini-VM. There will be many address spaces in a z/OS system (380 in ours currently). A program in an address space is not aware of any other address spaces and thinks it has access to the entire 2 GB (31-bit addressing) range of memory (different address spaces can communicate if necessary & authorised, and more than 2 GB is available with 64-bit addressing). A program in one address space cannot crash a program in another address space by overwriting storage. Programs in 2 different address spaces can access the same memory address, but don't affect each other, as they will actually, unbeknown to them, access different memory.
There are 4 types of address spaces:
TSO (Time Sharing Option) - these are users logged on to the system, typing commands and getting responses. They may run scripts, using the languages REXX and Clist (Command Lists - older, generally replaced by REXX) much like Perl and shell scripts, submit batch jobs, write and compile code etc.
BATCH JOBS (or JOB) - This is where you want to run a program, so you create a text file with the name of the program(s) to run and the file(s) that it/they need(s) and SUBMIT it. The system will run the program(s) and tell you when they are done. Whilst they are running, you can go and do something else. You don't even need to be logged on - you can prepare an FTP job (for example) to run at 01:00 whilst you're asleep and another job to run if the first one works.
STARTED TASKS (STCs) - Very similar to a batch job. Usually started either by the system itself when it starts or by an operator issuing a START command for that STC at the system console. (E.g. 'START DB2' starts the DB2 started task. Alternatively a user may submit a batch job for their own test DB2 system.)
System Address Spaces (SYSAS). Consider these like a Unix daemon, started by the operating system itself for various essential processes. There are also address spaces representing processes running under the 'Unix' half of z/OS (USS - Unix System Services), but that's another story.
There is no such thing as an 'operation' in z/OS terms. Within an address space, many programs may be running, each one identified by a TCB (Task Control Block) or SRB (System Request Block).
However, if you knew that the information you wanted was produced by a normal batch job, then looking for the £HASP373 and £HASP395 messages for that job would be the right place to start. Bear in mind that the message ids (HASP373 and HASP395) might not start with a '£' on your system. '£' is the default, but it is a customisable parameter. $ and # are also fairly common.
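Purely as a sketch (the file name and line layout here are assumptions, and as noted the prefix character varies by system): if you had the SYSLOG exported to a plain-text file, pairing up the start and end messages for jobs could be as simple as something like this:

    # Rough sketch: scan an exported SYSLOG text file (hypothetical name
    # "syslog.txt") for job start/end messages. The message prefix may be
    # '$', '£' or '#' depending on how the system is customised.
    import re

    START = re.compile(r"[$£#]HASP373\s+(\S+)\s+STARTED")
    ENDED = re.compile(r"[$£#]HASP395\s+(\S+)\s+ENDED")

    with open("syslog.txt", encoding="utf-8", errors="replace") as log:
        for line in log:
            if m := START.search(line):
                print("started:", m.group(1))
            elif m := ENDED.search(line):
                print("ended:  ", m.group(1))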
I do know what I'm talking about, but if any of the above is not clear, then I haven't explained it very well. I may be guilty of doing exactly what I warned against and explaining an unknown concept by using another unknown concept. :-)
Work gets into z/OS through something called the subsystem interface. Part of this flow is that generally, when an address space is started, it requests work from the subsystem that started the address space through a well-defined interface (IEFSSREQ). This handshake is where things like your HASP messages come from.
Here's a watered-down example.
An operator enters a START command from a system console. As part of processing that command, the system creates an address space, and eventually a thread in the new address space says, "ok - I'm ready...give me some work to do". This goes to the primary job entry subsystem, which hands the address space something to do - in this case, the internal data structures representing the task that the operator started. As part of this chain, the various $HASP messages are issued, and this works pretty much the same way for TSO sessions, started tasks (STCs) and JCL submitted for a batch job.
JES2/JES3 are examples of subsystems, but there are others.
For example, if our operator added the SUB=MSTR parameter on the start command, the requests wouldn't go through the primary JES - and so there wouldn't be any of the $HASP messages you're looking for. There are plenty of vendor applications that start and manage address spaces outside of JES, and this is the stuff you miss by limiting yourself to the HASP and IEF403I messages.
Also, UNIX Services has a variety of APIs similar to UNIX "fork" that can be used to spawn address spaces without necessarily involving JES.
If you want to know about activity starting and ending, there are better ways - SMF, ENF signals, etc. A great way to learn this stuff if you don't know already is to use the system trace facilities and read some dumps. The wonderful thing about z/OS is that it's all right there, for those who spend the time figuring out where to look.
No. Those messages are for jobs. Not all operations are jobs. An example of an operation that is not a job would be a system command. I don't have a z/OS system at hand right now, but I believe another example of an operation that would not use the messages you reference would be a started task.
This may be helpful, as it attempts to explain z/OS concepts in Unix terms.
A job is something that goes through JES2/JES3. (In your case, JES2.) JES2/JES3 jobs are generally used for batch type of work. For example, a sort job, where I submit something, and come back later and get an answer. However, there's a lot of work running under z/OS that doesn't go through JES2/JES3.
Part of the problem here is what you mean by an operation; for example, while you may get a message saying that DB2 has started, after it's started, it's not going to tell you every time it gets a query. A TSO user might run a REXX exec underneath his/her address space, but that's not going to go through JES.
Another way to look at this is that JES2/JES3 are job management subsystems, but they aren't equivalent to the kernel on a Unix/Windows system, which does schedule all the work running on the system. For z/OS, there are multiple ways that work can come into a system; examples include JES2/JES3, TSO, ISPF, CICS, DB2, IMS, via the console, etc. It's then up to the master scheduler/WLM/SRM to manage all the requests that come in through all of the subsystems.
If you have access to a z/OS system, look into SDSF, or whatever you use to manage JES2. The ST panel, under SDSF, is a list of things that are running/eligible to run that are managed by JES2. However, if you look at the DA panel (assuming you have authority to do so), you'll note that there are a lot of address spaces that show up on the DA panel that don't show up in the ST panel.
If an address space is started through the JES2 subsystem, which is normally the case unless another subsystem or MSTR is specified on the MVS START command, then the message $HASP373 jobname STARTED is issued. Similarly, when the address space ends, message $HASP395 is issued.
The IEF403I and IEF404I messages are issued by the system in similar situations, independently of what either JES2 or JES3 is doing and regardless of which subsystem the address space was started under. These messages are only issued when the operator has requested job name monitoring using the SETCON MONITOR or the MONITOR JOBNAMES command. Products for automated operations typically do this.
How can you profile a very long running script that is spawning lots of other processes?
We have a job that takes a long time to run - 11 or more hours, sometimes more than 17 - so it runs on an Amazon EC2 instance.
(It is doing cufflinks DNA alignment and stuff.)
The job is executing lots of processes, scripts and utilities and such.
How can we profile it and determine which component parts of the job take the longest time?
A simple CPU utilisation per process per second would probably suffice. How can we obtain it?
There are many solutions to your question:
munin is a great monitoring tool that can scan almost everything in your system and make nice graphs of it :). It is very easy to install and use.
atop could be a simple solution; it can scan CPU, memory and disk regularly, and you can store all that information in files (the -w option), then you'll have to analyze those files to detect the bottleneck.
sar can scan more than everything on your system, but is a little harder to interpret (you'll have to make the graphs yourself, with RRDtool for example).
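As a lighter-weight, do-it-yourself alternative (just a sketch, using the psutil package, which isn't one of the tools above), you could sample per-process CPU usage once a second and redirect the output to a file for later analysis:

    # Per-process CPU sampler sketch, assuming "pip install psutil".
    # Prints a timestamp, PID, process name and CPU% once per second;
    # redirect the output to a file and aggregate it afterwards to spot
    # which steps of the pipeline eat the most CPU.
    import time
    import psutil

    while True:
        stamp = time.strftime("%H:%M:%S")
        for proc in psutil.process_iter(["pid", "name"]):
            try:
                cpu = proc.cpu_percent(interval=None)  # % of one core since the previous call
                if cpu > 0:
                    name = proc.info["name"] or "?"
                    print(f"{stamp} {proc.info['pid']:>7} {name:<25} {cpu:5.1f}")
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass  # process exited or belongs to another user; skip it
        time.sleep(1)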
So I was looking into why a program was getting rid of my background, and the author of the program said to post .xsession-errors, and many people did. Then my next question was: what is .xsession-errors? A Google search reveals many results but nothing explaining what it is.
What I know so far:
It's some kind of error log. I can't figure out what it's related to (Ubuntu itself? programs?)
I have one and it seems like all Ubuntu systems have it, though I cannot verify.
Linux graphical interfaces (such as GNOME) provide a way to run applications by clicking on icons instead of running them manually on the command-line. However, by doing so, output from the command-line is lost - especially the error output (STDERR).
To deal with this, some display managers (such as GDM) pipe the error output to ~/.xsession-errors, which can then be used for debugging purposes. Note that since all applications launched this way dump to the same log, it can get quite large and difficult to find specific messages.
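Because of that, it often helps to filter the file for just the program you care about. A tiny sketch (grep works just as well; "myapp" is a placeholder name):

    # Print only the ~/.xsession-errors lines mentioning a given program.
    # "myapp" is a placeholder; replace it with the real program name.
    import os

    pattern = "myapp"
    path = os.path.expanduser("~/.xsession-errors")

    with open(path, errors="replace") as log:
        for line in log:
            if pattern in line:
                print(line.rstrip())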
Update: Per the documentation:
The ~/.xsession-errors X session log file has been deprecated and is no longer used.
It has been replaced by the systemd journal (journalctl command).
It's the error log produced by your X Window System (which the Ubuntu GUI is built on top of).
Basically it's quite a low level error log for X11.
From what I have read, using pump mode with distcc requires that you encapsulate make in the pump script. However, I do not have it in my path, and I cannot find it as a package or included in the distcc package for Cygwin.
However, when I compile with distcc and use distccmon-text to monitor which hosts are contacted and their phase, I clearly see that some of them, sometimes, are in the Preprocess phase. I thought all preprocessing was done on the client executing the make script when not using pump mode, and that the whole idea of pump mode was preprocessing on the remote hosts (and thus requiring the same include files).
This has left me confused. My main question is: exactly what do the phases Startup, Blocked, Connected, Preprocess, Conect, Send, Receive and Done of distcc mean?
And as a sub-question: How can I use pump mode with distcc in Cygwin?
Exactly what do the phases of distcc mean?
Ok, this is embarrassing, but I just wasted four hours on the web trying to answer that question. Next time I'll just pull the source and look at it. But, you raise a good point: It's amazing this isn't readily available information.
THESE ARE MY GUESSES (because I don't want to admit I wasted four hours not answering the question!):
Startup - could otherwise be called "initialization/loading", not yet ready for the first task
Blocked - is awaiting access to a local file or the local processor; I stumbled across recent bug fixes that set a "timeout" to one second while it waited for the processor to become available, and I'm aware that it uses zero-length "flock" files to block at times
Connected - process initiated contact with a client, is now reserved for a job (??), or is compiling a job (??)
Preprocess - is performing the preprocess operation
Conect - is hand-shaking with the client for an atomic operation, maybe to become reserved (??)
Send - is sending the compiled object file back to the client
Receive - is receiving source to be compiled, or receiving zipped headers (if using pump mode)
Done - could otherwise be called "idle/available"
NOTE: Because of Google's "pump-mode" algorithm, there's actually quite a bit of "hand-shaking" that goes on between the client (running distcc) and volunteer (running distccd). First, in pump-mode, all the headers that are expected to be "needed" are bundled, zipped, and pushed to the volunteer (where they are unzipped into a local mirror like that on the client machine). However, it appears that some further communication between the volunteer and client is possible to incrementally transfer other headers as needed, so that would explain the richer set of communication phases/states listed above.
Am I already using pump mode?
I very much doubt it, as you did not configure it by wrapping the compile option through make or scons (necessary to run the Google algorithm to predict header usage for bundling-and-transport to the volunteer), nor can you find the "pump" script. But, I cannot explain your seeing the "Preprocess" states on your volunteers. (You're not referring to the "Preprocess" state on your clients, right? That would be entirely understandable, as by default preprocessing is on the client.)
Rather, I suppose the implementation makes it possible that the hard state machine would move through ALL the states, INCLUDING "preprocessing", even when there is no preprocessing to do, before it advances to the next state. For example, even if it did no preprocessing on the volunteer side, distccd would receive the source file, write it to disk, and then launch the compiler. If you're on Cygwin, those are not instantaneous, especially if it is a large source file (especially after all the headers included in it). So, you might see the "preprocess" phase until it initiates the next phase for the compile operation itself.
HEY ... I don't see an obvious "compile" phase, so it is POSSIBLE that the "preprocess" phase embodies "compile" or "preprocess-and-compile" (since those phases are often combined in many compilers anyway).
Sorry -- I'm just guessing.
And how do I use pump mode in Cygwin?
I haven't tried it, but it is supposed to be possible. Apparently the most common problem with Cygwin is that some Windows compilers cannot handle the default TMPDIR setting when distcc is run under Cygwin. The fix is to put something like export TMPDIR=c:/temp in /etc/profile.
The FAQ may be able to help more: http://distcc.googlecode.com/svn/trunk/doc/web/faq.html