Delete PDS member in z/OS USS?

Is there a way to delete a PDS member in z/OS USS without getting an ENQ on the whole PDS? I tried "tso delete", but it complained about data set contention.

One way would be to use the ISPF delete service...it generally allocates the dataset as shared, using its own internal ENQ to serialize deletes. See here. In UNIX Services, you can create a REXX script that the USS shell can run, and it can call things like ISPEXEC as you see in the link.
You can also use IDCAMS. First allocate the PDS dataset - something like alloc fi(pds) dataset(dsn) shr. Then, IDCAMS with DELETE 'pds(member)' FILE(pds) would also do what you want.
There are lots of other ways - the key is generally allocating the PDS with DISP=SHR, opening the PDS for output, and then calling STOW with the DELETE option to delete the member you want.
The serialization is important though - keep in mind that opening a PDS for output under a shared allocation can cause corruption, depending on what you're doing. The ISPF services serialize using an ENQ (SPFEDIT) that gives you finer-grained serialization than allocating with DISP=OLD: DISP=OLD persists as long as the dataset is allocated, while the SPFEDIT ENQ is held only for the fraction of a second it takes to perform the DELETE.
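As an aside, from the USS shell the z/OS C runtime gives you perhaps the simplest route: its "//" naming convention reaches MVS data sets, and remove() on a member name deletes that member for you (doing the equivalent of the STOW delete described above). A minimal sketch, with a made-up data set name:
#include <stdio.h>

int main(void)
{
    /* The leading "//" tells the z/OS C library this is an MVS data
       set name, not a USS path. Names here are hypothetical. */
    if (remove("//'USERID.MY.PDS(OLDMEM)'") != 0) {
        perror("remove");
        return 1;
    }
    return 0;
}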


If I private-`mmap` a file and read it, then another process writes to the same file, will another read at the same location return the same value?

(Context: I'm trying to establish which sequences of mmap operations are safe from the "memory safety" point of view, i.e. what assumptions I can make about mmaped memory without risking security bugs as a consequence of undefined behaviour, or miscompiles due to compilers making incorrect assumptions about how memory could behave. I'm currently working on Linux but am hoping to port the program to other operating systems in the future, so although I'm primarily interested in Linux, answers about how other operating systems behave would also be appreciated.)
Suppose I map a portion of a file into memory using mmap with MAP_PRIVATE. Now, assuming that the file doesn't change while I have it mapped, if I access part of the returned memory, I'll be given information from the file at that offset; and (because I used MAP_PRIVATE) if I write to the returned memory, my writes will persist in my process's memory but will have no effect on the underlying file.
However, I'm interested in what will happen if the file does change while I have it mapped (because some other process also has the file open and is writing to it). There are several cases that I know the answers to already:
If I map the file with MAP_SHARED, then if any other process writes to the file via a shared mmap, my own process's memory will also be updated. (This is the intended behaviour of MAP_SHARED, as one of its intended purposes is for shared-memory concurrency.) It's less clear what will happen if another process writes to the file via other means, but I'm not interested in that case.
If the following sequence of events occurs:
I map the file with MAP_PRIVATE;
A portion of the file I haven't accessed yet is written by another process;
I read that portion of the file via my mapping;
then, at least on Linux, the read might return either the old value or the new value:
It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.
— man 2 mmap on Linux
(This case – which is not the case I'm asking about – is covered in this existing StackOverflow question.)
I also checked the POSIX definition of mmap, but (unless I missed it) it doesn't seem to cover this case at all, leaving it unclear whether all POSIX systems would act the same way.
Linux's behaviour makes sense here: at the time of the access, the kernel might have already mapped the requested part of the file into memory, in which case it doesn't want to change the portion that's already there, but it might need to load it from disk, in which case it will see any new value that may have been written to the file since it was opened. So there are performance reasons to use the new value in some cases and the old value in other cases.
If the following sequence of events occurs:
I map the file with MAP_PRIVATE;
I write to a memory address within the file mapping;
Another process changes that part of the file;
then although I don't know this for certain, I think it's very likely that the rule is that the memory address in question continues to reflect the old value that was written by our process. The reason is that the kernel needs to maintain two copies of that part of the file anyway: the values as seen by our process (which, because it used MAP_PRIVATE, can write to its view of the file without changing the underlying file), and the values that are actually in the file on disk. Writes by other processes obviously need to change the second copy, so it would be bizarre to also change the first copy; doing so would make the interface less usable, come at a performance cost, and have no advantages.
There is one sequence of events, though, where I don't know what happens (and for which the behaviour is hard to determine experimentally, given the number of possible factors that might be relevant):
I map the file with MAP_PRIVATE;
I read some portion of the file via the mapping, without writing;
Another process changes part of the file that I just read;
I read the same portion of the file via the mapping, again.
In this situation, am I guaranteed to read the same data twice? Or is it possible to read the old data the first time and the new data the second time?
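For concreteness, here is a minimal harness for exactly this sequence, using pwrite(2) on the same descriptor to stand in for the other process (the file name is made up, and a single run on a single kernel settles nothing, since the outcome may depend on whether the page was already faulted in):
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (pwrite(fd, "old", 3, 0) != 3) { perror("pwrite"); return 1; }

    /* Step 1: map the file MAP_PRIVATE. */
    char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    char first = p[0];        /* step 2: read before the change      */
    pwrite(fd, "new", 3, 0);  /* step 3: "another process" writes    */
    char second = p[0];       /* step 4: read the same spot again    */

    printf("first=%c second=%c same=%s\n",
           first, second, first == second ? "yes" : "no");
    munmap(p, 4096);
    close(fd);
    return 0;
}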

Persistent storage values handling in Linux

I have a QSPI flash on my embedded board.
I have a driver + process "Q" to handle reading from and writing to it.
I want to store variables like SW revisions, IP, operation time, etc.
I would like to ask for suggestions on how to handle the different access rights to write and read values from user space and other processes.
I was thinking of having a file for each variable. Then I can assign access rights to those files, and process Q can update the value in a file when it changes. So process Q will only write, and other processes or users can only read.
But I am not sure about writing. I was thinking about using a message queue or ZeroMQ and building the software around it, but I am not sure whether that is overkill. And I am not sure how to manage access rights that way anyway.
What would be the best approach? I would really appreciate it if you could propose even a totally different approach.
Thanks!
This question will probably be downvoted / flagged due to the "Please suggest an X" nature.
That said, if a file per variable is what you're after, you might want to look at implementing a FUSE file system that wraps your SPI driver/utility "Q" (or build it into "Q" if you get to compile/control source to "Q"). I'm doing this to store settings in an EEPROM on a current work project and it's turned out nicely. So I have, for example, a file that, when read, retrieves 6 bytes from EEPROM (or a cached copy) and provides a MAC address in standard hex/colon-separated notation.
The biggest advantage here is that it becomes trivial to access all your configuration / settings data from shell scripts (e.g. your init process) or other scripting languages.
Another neat feature of doing it this way is that you can use inotify (which comes "free", no extra code in the fusefs) to create applications that efficiently detect when settings are changed.
A disadvantage of this approach is that it's non-trivial to do atomic transactions on multiple settings and still maintain normal file semantics.
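For instance, a consumer of such a settings filesystem might detect changes like this; a minimal sketch, where the watched path is made up:
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    int fd = inotify_init();
    if (fd < 0) { perror("inotify_init"); return 1; }

    /* Hypothetical settings file exposed by the FUSE layer. */
    if (inotify_add_watch(fd, "/settings/ip_address", IN_CLOSE_WRITE) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    /* Block until a writer (process Q) finishes updating the file,
       then react. Event details in buf are ignored in this sketch. */
    char buf[4096];
    while (read(fd, buf, sizeof buf) > 0)
        printf("setting changed, reloading\n");

    return 0;
}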

How can I increase the length of variable-length record beyond 32760?

The maximum record-length for variable-length QSAM records is 32,760 bytes.
The current record-length of our file is OK for us, but in order to take on some more info we have to expand this file, which will push its length beyond 32K (LRECL > 32760).
Splitting the record is not a good option for us as it will impact our existing system.
I'm not sure whether using SPANNED records with VSAM here will solve this problem.
//DEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=A
//SYSIN DD *
DEFINE CLUSTER (NAME(dsname.K1719) INDEXED VOLUMES(xxxxxx) -
TRACKS(1) KEYS(17 19) RECORDSIZE(40 110) SPANNED) -
DATA (NAME(dsname.K1719.DATA)) INDEX (NAME(dsname.K1719.INDEX))
/*
//
Will this solve our problem?
If you use Unix System Services files, you are not subject to the 32K limitation on LRECL. There are downstream effects.
If you are using COBOL to process the file you can use LINE SEQUENTIAL in the ORGANIZATION clause, but then you are limited to a 1M LRECL.
If you are using COBOL to process the file you can eschew COBOL I/O and use C fopen() and so forth to get around the 1M LRECL limitation mentioned above, but then you are adding something a bit foreign to an admittedly hypothetical COBOL application. C would have no trouble with such files; I cannot speak to PL/I. (A sketch follows this list.)
Not all DFSMS and third-party utilities are completely aware of Unix System Services files.
JCL constructs for Unix System Services files have a relatively short learning curve, but there is a bit of learning required.
Security for Unix System Services files may be off-putting to your Security Administrator(s). You may find yourselves having to set up Access Control Lists via setfacl and other new concepts.
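As a minimal sketch of the C route mentioned in the list above, writing one 50,000-byte newline-delimited record to a hypothetical z/OS UNIX path:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Hypothetical z/OS UNIX path; any USS file will do. */
    FILE *f = fopen("/u/userid/bigrec.txt", "w");
    if (f == NULL) { perror("fopen"); return 1; }

    size_t len = 50000;          /* well past the 32,760-byte QSAM limit */
    char *rec = malloc(len + 1);
    if (rec == NULL) { fclose(f); return 1; }
    memset(rec, 'X', len);
    rec[len] = '\n';             /* the newline delimits the "record"    */

    if (fwrite(rec, 1, len + 1, f) != len + 1) { perror("fwrite"); return 1; }

    free(rec);
    fclose(f);
    return 0;
}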
@cschneid's answer is interesting.
Without knowing more about the file / application it is difficult to give specific answers. Following are ideas that may be useful.
You could:
break the copybook up into several sub records
add an extra byte to the key
so instead of having
<key><a record>
you would have
<key><sub-key-1><part-1-of-record>
<key><sub-key-2><part-2-of-record>
<key><sub-key-3><part-3-of-record>
...
If you have a single File-Driver program that interacts with the file, you can hide the details of how the data is stored from your application. So you can have multiple physical records for each logical record and your application does not need to know about it. You can also use multiple files if you need to.
Remember you can add space at the end of the new copybook that does not need to be stored in the file. This comes in handy when you need to add fields to the file - it can save recompiling lots of programs.
Does the file hold logically different data (say sales and orders etc)? Does every program that accesses the file use the whole record?
If so, you could add a request-type to the call of the File-Driver-Program:
Call "FilePgm" using GET-ORDER Order-Copybook
or
Call "FilePgm" using GET-SALES Sales-Copybook
or
Call "FilePgm" using GET-EVERYTHING Everything-Copybook
The advantage in having multiple call types/copybooks is if the Order-Copybook changes, programs only using the Sales-Copybook do not need to be recompiled and vice versa. This will make changing the file easier in the future.
Finally, there is the database option, either:
a full rewrite
or the use of binary blobs (DB2 limits a binary blob to 2 GB)
With DB2 you can use compression, which can be useful; large records often have high compression ratios. The advantage of compression is not the space savings but the reduced I/O.

How to portably extend a file accessed using mmap()

We're experimenting with changing SQLite, an embedded database system,
to use mmap() instead of the usual read() and write() calls to access
the database file on disk, using a single large mapping for the entire
file. Assume that the file is small enough that we have no trouble
finding space for this in virtual memory.
So far so good. In many cases using mmap() seems to be a little faster
than read() and write(). And in some cases much faster.
Resizing the mapping in order to commit a write-transaction that
extends the database file seems to be a problem. In order to extend
the database file, the code could do something like this:
ftruncate(fd, new_size);  // extend the database file on disk
munmap(p, old_size);      // unmap the current mapping (it's now too small)
p = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); // new, larger mapping
then copy the new data into the end of the new memory mapping.
However, the munmap/mmap is undesirable as it means the next time each
page of the database file is accessed a minor page fault occurs and
the system has to search the OS page cache for the correct frame to
associate with the virtual memory address. In other words, it slows
down subsequent database reads.
On Linux, we can use the non-standard mremap() system call instead
of munmap()/mmap() to resize the mapping. This seems to avoid the
minor page faults.
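For reference, roughly what the Linux-only path looks like; a minimal sketch with a made-up file name and sizes:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("db.bin", O_RDWR | O_CREAT, 0644);  /* made-up name */
    if (fd < 0) { perror("open"); return 1; }

    size_t old_size = 1UL << 20, new_size = 2UL << 20;
    if (ftruncate(fd, old_size) != 0) { perror("ftruncate"); return 1; }

    char *p = mmap(NULL, old_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Extend the file, then grow the mapping in place where possible;
       MREMAP_MAYMOVE lets the kernel relocate it if the space is taken. */
    if (ftruncate(fd, new_size) != 0) { perror("ftruncate"); return 1; }
    char *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) { perror("mremap"); return 1; }

    q[new_size - 1] = 0;          /* the new tail is now addressable */
    munmap(q, new_size);
    close(fd);
    return 0;
}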
QUESTION: How should this be dealt with on other systems, like OSX,
that do not have mremap()?
We have two ideas at present. And a question regarding each:
1) Create mappings larger than the database file. Then, when extending
the database file, simply call ftruncate() to extend the file on
disk and continue using the same mapping.
This would be ideal, and seems to work in practice. However, we're
worried about this warning in the man page:
"The effect of changing the size of the underlying file of a
mapping on the pages that correspond to added or removed regions of
the file is unspecified."
QUESTION: Is this something we should be worried about? Or an anachronism
at this point?
2) When extending the database file, use the first argument to mmap()
to request a mapping corresponding to the new pages of the database
file located immediately after the current mapping in virtual
memory. Effectively extending the initial mapping. If the system
can't honour the request to place the new mapping immediately after
the first, fall back to munmap/mmap.
In practice, we've found that OSX is pretty good about positioning
mappings in this way, so this trick works there.
QUESTION: if the system does allocate the second mapping immediately
following the first in virtual memory, is it then safe to eventually
unmap them both using a single big call to munmap()?
Option 2 will work, but you don't have to rely on the OS happening to have space available: you can reserve your address space beforehand so your fixed mappings will always succeed.
For instance, to reserve one gigabyte of address space, do:
mmap(NULL, 1U << 30, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
This reserves one gigabyte of contiguous address space without actually allocating any memory or resources. You can then perform future mmap() calls over this space and they will succeed. So mmap the file into the beginning of the space returned, then mmap further sections of the file as needed using MAP_FIXED. The mappings will succeed because your address space is already allocated and reserved by you.
Note: Linux also has the MAP_NORESERVE flag, which is the behavior you would want for the initial mapping if you were allocating RAM, but in my testing it is ignored, as PROT_NONE is sufficient to say you don't want any resources allocated yet.
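Putting the reservation together with the file mapping, a minimal sketch (file name and sizes are illustrative; addresses and offsets passed to mmap() must be page-aligned):
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("db.sqlite", O_RDWR);   /* illustrative file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Step 1: reserve 1 GiB of address space; no memory is committed. */
    size_t reserve = 1UL << 30;
    char *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) { perror("mmap reserve"); return 1; }

    /* Step 2: overlay the file at the start of the reservation.
       MAP_FIXED is safe here because we own this address range. */
    if (mmap(base, st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        perror("mmap file");
        return 1;
    }

    /* Later, after ftruncate() grows the file, map just the new tail
       into the reserved range (old_size must be page-aligned here):
       mmap(base + old_size, growth, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_FIXED, fd, old_size);                    */

    munmap(base, reserve);
    close(fd);
    return 0;
}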
I think #2 is the best currently available solution. In addition to this, on 64-bit systems you may create your mapping explicitly at an address that the OS would never choose for a mapping (for example 0x6000 0000 0000 0000 on Linux) to avoid the case where the OS cannot place the new mapping immediately after the first one.
It is always safe to unmap multiple mappings with a single munmap() call. You can even unmap a part of a mapping if you wish to do so.
Use fallocate() instead of ftruncate() where available. If not, just open the file in O_APPEND mode and extend it by writing some amount of zeroes. This greatly reduces fragmentation.
Use huge pages if available; this greatly reduces overhead on big mappings.
pread()/pwrite()/pwritev()/preadv() with a not-so-small block size are not really slow; the call overhead is small compared to the time the I/O itself takes.
I/O errors when using mmap() just generate a signal (typically SIGBUS) instead of an error return such as EIO.
Most SQLite write-performance problems are concentrated in transactional behavior (i.e. you should check when COMMIT is actually performed).

store some data in the struct inode

Hello, I am a newbie to kernel programming. I am writing a small kernel module based on the wrapfs template to implement a backup mechanism. This is purely for learning purposes.
I am extending wrapfs so that when a write call is made, wrapfs transparently makes a copy of that file in a separate directory and then the write is performed on the file. But I don't want to create a copy for every write call.
A naive approach could be to check for the existence of the file in that directory. But I think checking this on each call could be a severe penalty.
I could also check for the first write call and then store a value for that specific file using the private_data attribute. But that would not be stored on disk, so I would need to check again.
I was also thinking of making use of the modification time. I could save a modification time; a copy would only be created if the previous modification time was before the saved one, otherwise I wouldn't do anything. I tried to use inode.i_mtime for this, but it had already been updated before write was called, and applications can modify that time anyway.
So I was thinking of storing some value in the on-disk inode that indicates whether its backup has been created or not. Is that possible? Any other suggestions or approaches are welcome.
You are essentially saying you want to do a Copy-On-Write virtual filesystem layer.
IMO, some of these have been done, and it would be easier to implement this in userland (e.g. using libfuse and the fuse module). That way, you can be king of your castle and add your metadata in any which way you feel is appropriate:
just add (hidden) metadata files to each directory
use extended POSIX attributes (setfattr and friends; a sketch follows this list)
heck, you could even use a sqlite database
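For the extended-attribute route, for example, marking a file as already backed up from userland might look like this; a minimal sketch with a made-up attribute name:
#include <stdio.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "somefile";            /* file being tracked      */
    const char *name = "user.backup.done";    /* hypothetical attribute  */

    /* Mark the file as backed up. */
    if (setxattr(path, name, "1", 1, 0) != 0) { perror("setxattr"); return 1; }

    /* Later: check the mark before copying again. */
    char val[2] = {0};
    if (getxattr(path, name, val, sizeof val) > 0 && val[0] == '1')
        printf("already backed up, skipping copy\n");

    return 0;
}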
If you really insist on doing these things in-kernel, you'll have a lot more work, since accessing the metadata from kernel mode is going to take a lot more effort (you'd most likely want to emulate your own database using memory-mapped files so as to minimize the amount of 'userland (style)' work required and to make it relatively easy to get atomicity and reliability right [1]).
[1] On How Everybody Gets File IO Wrong: see also here
You can use atime instead of mtime. In that case, setting the S_NOATIME flag on the inode prevents it from being updated (see the touch_atime() function in fs/inode.c). The only other thing you'll need is to mount your filesystem with the noatime option.
