Reading from tape - io

I want to read data from a tape and store that data on disk as a virtual tape. How do I maintain the original block structure of the tape? Some of the data I have requires that the block structure stay the same. How do I establish what the block structure is on the source tape? I was thinking of writing the blocks to a file with a header and footer structure and then using that to write back to tape/virtual tape, maintaining the block structure. I can't work out how to establish the block structure of the incoming data. I am doing this on Linux (CentOS) in C. Language is not critical; I will accept help in any language.

As far as I know, your analysis is correct. Tape does not maintain any structure for the files it holds; you should use "file marks" to find the location of a file on a tape.
The process of writing a set of files to a tape goes something like this: write the 1st file, write the 1st filemark, write the 2nd file, write the 2nd filemark, and so on. When restoring, if for example you need the 2nd file, just jump to the 1st filemark on the tape and start reading with ReadFile until you reach the next filemark.
Here are some APIs you can use for the operations described above (note that these are Win32 tape APIs):
Writing a file to tape: BackupRead & WriteFile
Writing a filemark: WriteTapemark
Restoring a file from tape: ReadFile & BackupWrite
Jumping to a filemark: SetTapePosition
In case any doubts arise, please get back to me.
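Since the question targets Linux, here is a minimal C sketch of how the block structure can be discovered there. It assumes the st tape driver, a non-rewinding device node such as /dev/nst0, and that no block on the tape exceeds the buffer size (all assumptions; adjust for your setup). In variable-block mode, each read() returns exactly one tape block, so the return value is the block size; a zero-length read indicates a filemark.

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

int main(void)
{
    int fd = open("/dev/nst0", O_RDONLY);   /* non-rewinding device */
    if (fd < 0) { perror("open"); return 1; }

    /* Switch the drive to variable-block mode (block size 0) so each
     * read() returns exactly one physical block. Not every drive
     * supports variable mode. */
    struct mtop op = { .mt_op = MTSETBLK, .mt_count = 0 };
    if (ioctl(fd, MTIOCTOP, &op) < 0) { perror("MTIOCTOP"); return 1; }

    enum { BUFSZ = 256 * 1024 };   /* must exceed the largest block */
    char *buf = malloc(BUFSZ);
    if (!buf) { close(fd); return 1; }

    ssize_t n;
    int consecutive_marks = 0;
    while ((n = read(fd, buf, BUFSZ)) >= 0) {
        if (n == 0) {              /* zero-length read = filemark */
            if (++consecutive_marks == 2)
                break;             /* two in a row: end of data */
            printf("-- filemark --\n");
            continue;
        }
        consecutive_marks = 0;
        printf("block of %zd bytes\n", n);
        /* here: write a header containing n, then the payload,
         * to the virtual-tape file on disk */
    }
    if (n < 0) perror("read");

    free(buf);
    close(fd);
    return 0;
}
```

Each (size, payload) pair can then be written to your on-disk virtual tape with the header/footer structure you describe, and replayed later with write() calls of exactly those sizes to recreate the original blocking.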

Related

Returning absolute position when file is appended in Node.js

I am using fs.writeSync(fd, buffer[, options]) to append data to a file from multiple processes; the file handle is opened with the correct flag for append mode.
The question is: the above API returns the number of bytes written to the file, which is good, but what I need to know is at what position in the file they were written. Either the start or the end will suffice, e.g. 10 bytes were written starting at offset 123478899, or ending at offset 123478899 + 10.
Since the file is written by multiple processes across different computers (it is actually stored on a NAS), there are workarounds, but all are complicated and need synchronisation.
So is something like this possible?
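For what it's worth, a minimal C sketch of the POSIX mechanics underneath this: with O_APPEND the kernel atomically positions each write at end-of-file, and afterwards the descriptor's own offset points just past the written bytes, so the start offset can be recovered with lseek even if other processes appended in the meantime. One caveat worth knowing: O_APPEND is not atomic over NFS, which matters for a NAS-hosted file. The file name below is hypothetical.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "hello from one of many writers\n";
    ssize_t n = write(fd, msg, sizeof msg - 1);
    if (n < 0) { perror("write"); return 1; }

    /* Our descriptor's offset now sits at the end of *our* write,
     * even if other processes appended between the write and the
     * lseek: offsets belong to the open file description, not to
     * the file itself. */
    off_t end   = lseek(fd, 0, SEEK_CUR);
    off_t start = end - n;
    printf("wrote %zd bytes at [%lld, %lld)\n",
           n, (long long)start, (long long)end);

    close(fd);
    return 0;
}
```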

How to make sure that file contents were not modified during read operation in NodeJS?

Suppose we have a file (less than 1MiB). This file can be modified (or deleted) by some program (git) from time to time. Also this file is being read by a Node application from time to time.
How to make sure that file contents that have been read by the Node application are in a consistent state?
Here's an example of what I'm afraid might happen.
Initial state of the file (A, B, C are just different parts of the file): ABC.
1. Node app reads file stats (mtimeMs).
2. Node app starts reading the file and reads A.
3. Other process starts writing to the file: it writes to A making it A* and to B making it B* (file state: A*B*C).
4. Node app reads B* and C.
5. Node app reads file stats (mtimeMs).
6. Other process writes to C making it C* (file state: A*B*C*).
7. OS writes the new file modification time.
As a result, the Node application gets the inconsistent file state AB*C while the mtimeMs values read before and after the operation are the same.
Note: temporarily blocking write operations to the file is not an option.
You could take a hash of the string you read and a hash of the file afterwards and compare them; if they are the same, you know the data did not change, and if not, you retry until they match.
Alternatively (if a hash of the string and a hash of the file would not be comparable), take a hash of the file before and after you read the data, and have your code retry until the two are the same.
example: https://gist.github.com/GuillermoPena/9233069
You could also start watching the file you are reading, taking appropriate action when it changes.
https://nodejs.org/docs/latest/api/fs.html#fs_fs_watchfile_filename_options_listener
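A minimal C sketch of the hash-before-and-after-read idea (the file name is hypothetical, and FNV-1a is used only because it is easy to inline; any fast hash works):

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte buffer. */
static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    while (n--) { h ^= *p++; h *= 1099511628211ULL; }
    return h;
}

/* Read the whole file into memory; caller frees the buffer. */
static unsigned char *slurp(const char *path, size_t *sz)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }
    *sz = fread(buf, 1, (size_t)len, f);
    fclose(f);
    return buf;
}

int main(void)
{
    for (int attempt = 0; attempt < 10; attempt++) {
        size_t n1 = 0, n2 = 0;
        unsigned char *a = slurp("state.txt", &n1);  /* first read  */
        unsigned char *b = slurp("state.txt", &n2);  /* second read */
        if (a && b && n1 == n2 && fnv1a(a, n1) == fnv1a(b, n2)) {
            printf("consistent snapshot, %zu bytes\n", n1);
            free(a); free(b);
            return 0;
        }
        free(a); free(b);   /* snapshots disagree: a writer was active */
    }
    fprintf(stderr, "file kept changing, giving up\n");
    return 1;
}
```

If the two snapshots disagree, a writer was active during the read, so the read is retried; a matching pair is taken as a consistent view.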

Linux file dirty page write back order

In Linux, for a single file, what is the dirty page writeback (to disk) order? Is it from beginning to end, or out of order?
Scenario 1: without overwrite
A file (on disk) is created and a large amount of data is written (sequentially) quickly. I presume this data would be spread across multiple pages of the page cache. When the dirty pages are written back, are they written in order?
E.g., say the server shuts down before the file write completes.
After reboot, can the on-disk file be in the state below?
|--correct data--|--data unset/garbage--|--correct data--|
I.e., I understand the last bytes of the file can be incomplete, but can data in the middle be incomplete?
Scenario 2: with overwrite (attempting to use the file like a circular/ring buffer)
A file is created and data is written; after reaching a maximum size, fsync is called (i.e., data + metadata are synchronized). Then the file pointer is moved to the beginning of the file and data is written sequentially again (no fsync done).
Now, due to a server shutdown, can the on-disk file be in the state below after reboot?
|--Newly written data--|--Old data--|--New data--|...
I.e., for the new data, some pages were written to disk out of order.
OR can I assume it is always
|--Newly written data--|--Newly written data--|--Old data--|
I.e., old data and new data will not mix (if present, old data would only be at the end of the file)?
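As far as I know, the kernel makes no ordering promise for dirty-page writeback within a file, so the only reliable barrier is an explicit fsync (or fdatasync/sync_file_range). A minimal C sketch of scenario 2 with the barrier in place, so that a pass can never start overwriting data from the previous pass before that pass is durable (file name and sizes are hypothetical):

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define FILE_SIZE (4 * 1024 * 1024)
#define CHUNK     4096

int main(void)
{
    int fd = open("ring.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char chunk[CHUNK];

    for (int pass = 0; pass < 3; pass++) {
        /* Fill one full pass of the "ring" with this pass's marker. */
        memset(chunk, 'A' + pass, sizeof chunk);
        for (off_t off = 0; off < FILE_SIZE; off += CHUNK)
            pwrite(fd, chunk, CHUNK, off);

        /* Barrier: without this, the kernel may write the dirty
         * pages of consecutive passes back in any order. */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }
    }

    close(fd);
    return 0;
}
```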

ZIP file format. How to read file properly?

I'm currently working on a Node.js project. I want to be able to read, modify and write a ZIP file without saving it to the FS (we receive it over TCP and send it back after modifications are made), and so far it looks possible because of the simple ZIP file structure. Currently I refer to this documentation.
So ZIP file has simple structure:
File header 1
File data 1
File data descriptor 1
File header 2
File data 2
File data descriptor 2
...
[other not important yet]
First we need to read the file header, which contains a compressed size field, and that seems like the perfect way to read file data 1 by its length. But it actually isn't: this field may contain 0 or 0xFFFFFFFF, and those values don't describe the actual length. In that case we have to read the file data without knowing its length. But how?
The compression/decompression algorithm descriptions look pretty complex to me, and I plan to use zlib for the compression itself anyway, so if something useful is described there, I missed it.
Can someone explain the proper way to read these files?
P.S. Please avoid suggesting npm modules. I don't just want to solve the problem; I also want to understand how things work.
Note - I'm assuming you want to read and process the zip file as it comes off the socket, rather than reading the complete zip file into memory before processing. Both options are valid.
I'd initially ignore the use cases where the compressed size has a value of 0 or 0xFFFFFFFF. The former is only present in zip files created in streaming mode, the latter in zip files larger than 4 GiB.
Dealing with them adds a lot of complexity; you can add support for them later, if necessary. Whether you ever need to support the 0/0xFFFFFFFF use cases depends on the nature of the zip files you intend to process.
When the compression method is deflate (8), use zlib for compression/decompression. You also need to support the stored (0) compression method; it is used for very small files where compression isn't appropriate.
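To make that concrete, here is a minimal C sketch that walks the local file headers of an archive under the simplifying assumptions above: no streaming-mode entries (general-purpose flag bit 3 clear) and no Zip64, so the compressed-size field can be trusted. The archive name is hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

/* Little-endian field readers. */
static uint32_t rd32(const unsigned char *p) {
    return p[0] | p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t rd16(const unsigned char *p) { return p[0] | p[1] << 8; }

int main(void)
{
    FILE *f = fopen("archive.zip", "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char hdr[30];               /* fixed part of local header */
    while (fread(hdr, 1, 30, f) == 30) {
        if (rd32(hdr) != 0x04034b50)     /* local file header magic */
            break;                       /* central directory reached */
        uint16_t method   = rd16(hdr + 8);   /* 0 = stored, 8 = deflate */
        uint32_t csize    = rd32(hdr + 18);  /* compressed size */
        uint16_t namelen  = rd16(hdr + 26);
        uint16_t extralen = rd16(hdr + 28);

        char name[256] = {0};
        fread(name, 1, namelen < 255 ? namelen : 255, f);
        if (namelen > 255) fseek(f, namelen - 255, SEEK_CUR);
        fseek(f, extralen, SEEK_CUR);

        printf("%-30s method=%u compressed=%u bytes\n",
               name, (unsigned)method, (unsigned)csize);
        fseek(f, (long)csize, SEEK_CUR); /* skip the compressed data */
    }

    fclose(f);
    return 0;
}
```

Once you have the compressed size, the data of a deflated entry is exactly what you feed to zlib; note that zip stores raw deflate streams (no zlib header), so with zlib you would use inflateInit2 with negative windowBits.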

Changing the head of a large Fortran binary file without dealing with the whole body

I have a large binary file (~GB size) generated from a Fortran 90 program. I want to modify something in the head part of the file. The structure of the file is very complicated and contains many different variables, which I want to avoid going into. After reading and re-writing the head, is it possible to "copy and paste" the remainder of the file without knowing its detailed structure? Or even better, can I avoid re-writing the whole file altogether and just make changes to the original file? (Not sure if it matters, but the length of the header will be changed.)
Since you are changing the length of the header, I think you have to write a new, revised file. You can avoid having to "understand" the records after the header by opening the file with stream access and just reading bytes (or perhaps four-byte words, if the file size is a multiple of four bytes) until you reach EOF, copying them to the new file. But if the file was originally created with sequential access and you want to access it that way in the future, you will have to handle the record-length information for the header record(s), including altering the value(s) to be consistent with the changed length of the record(s). This record-length information is typically a four-byte integer at the beginning and end of each record, but it depends on the compiler.
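As a rough illustration of the copy approach, here is a minimal C sketch that replaces the first sequential-access record of such a file. It assumes the common layout of a four-byte native-endian length marker before and after each record (compiler-dependent, as noted above); the file names and the replacement header are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *in  = fopen("old.dat", "rb");
    FILE *out = fopen("new.dat", "wb");
    if (!in || !out) { perror("fopen"); return 1; }

    /* Skip the original header record: leading marker, payload,
     * trailing marker. Assumes native-endian 4-byte markers. */
    uint32_t oldlen;
    fread(&oldlen, 4, 1, in);
    fseek(in, (long)oldlen + 4, SEEK_CUR);

    /* Write the replacement header with consistent length markers
     * so the file still reads back as sequential access. */
    const char newhdr[] = "new header contents";
    uint32_t newlen = sizeof newhdr - 1;
    fwrite(&newlen, 4, 1, out);
    fwrite(newhdr, 1, newlen, out);
    fwrite(&newlen, 4, 1, out);

    /* Copy the rest of the file verbatim, record markers and all. */
    char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);

    fclose(in);
    fclose(out);
    return 0;
}
```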
