Proper way to differentiate pst and dbx files in bash shell - linux

I want to identify the file-format of the input file given to my shell script - whether a .pst or a .dbx file. I checked How to check the extension of a filename in a bash script?. That one deals with txt files and two methods are given there -
check if the extension is txt
check if the mime type is application/text etc.
I tried file -ib <filename> on a .pst and a .dbx file and it showed application/octet-stream for both. However, if I just do file <filename>, then I get this for the dbx file -
file1.dbx: Microsoft Outlook Express DBX File Message database
and this for the pst file -
file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)
So, my questions are -
is it better to use MIME type detection every time, when the output can be anything and we need a reliable check?
how do I apply a MIME type check in this case, when both files return "application/octet-stream"?
Update
I didn't want to do extension-based detection because it seems we just can't be sure, on a Unix system, that a .dbx file truly is a dbx file. file <filename> returns a line containing the correct description of the file (e.g. "Microsoft Outlook Express DBX File Message database"), so the file command is clearly able to identify the file type. Why, then, does it not produce the correct information with file -ib <filename>?
Will parsing the string output of file <filename> be fine? Is it advisable, given that I only need to identify a narrow set of data storage files from the Outlook family (MS Outlook Express, MS Office Outlook 2003, 2007, 2010, etc.)? A small text identifier like application/dbx that could be compared against would be all I need.

The file command relies on having a file type detection database which includes rules for the file types that you expect to encounter. It may not be possible to recognize these file types if the file content doesn't have a unique code near the beginning of the file.
Note that the -i option to emit mime types actually uses a separate "magic" numbers file to recognize file types rather than translating long descriptions to file types. It is quite possible for these two databases to be out of sync. If your application really needs to recognize these two file types I suggest that you look at the Linux source code for "file" to see how they recognize them and then code this recognition algorithm right into your app.
If you want to do the equivalent of DOS file type detection, then strip the extension off the filename (everything after the last period) and look up that string in your own table where you define the types that you need.
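If parsing the description is acceptable for your narrow set of formats, something like the following minimal bash sketch would do it. The application/dbx and application/pst labels are invented identifiers for this script, and the matched description strings are assumed to be the ones shown above; pin them to what your version of file actually prints.

    #!/bin/bash
    # classify a file as dbx/pst by parsing `file` output,
    # falling back to a DOS-style extension lookup
    f=$1
    desc=$(file -b "$f")    # -b: brief output, no filename prefix
    case "$desc" in
        *"Microsoft Outlook Express DBX File"*)     echo "application/dbx" ;;
        *"Microsoft Outlook binary email folder"*)  echo "application/pst" ;;
        *)  # content not recognized; fall back to the extension
            case "${f##*.}" in
                dbx) echo "application/dbx (by extension only)" ;;
                pst) echo "application/pst (by extension only)" ;;
                *)   echo "unknown" ;;
            esac ;;
    esac

Because the descriptions come from the magic database, this is only as stable as the file version you deploy with; checking the files' magic bytes yourself, as suggested above, removes that dependency.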

Related

Caret ^ is being converted to some special symbol

I'm transferring a file with the content shown below from a mainframe system to a Unix instance. The file uses ^&* as a delimiter. The delimiter is correct on the mainframe side, but when we receive the file on Unix it arrives as Ø&*.
I'm using Connect Direct to transfer the file from one system to the other.
File Type: Flat File, File transfer: CD (Connect Direct)
file content
H^&*20220407^&*160009^&*2006
T^&*1
But when I receive the file on the Unix server, I can see the file content has changed; mainly, ^ is converted to Ø.
HØ&*20220407Ø&*160009Ø&*2006
TØ&*1
This is almost surely a code page problem.
The data in the file on the mainframe is (most probably) in some EBCDIC code page. ConnectDirect is doing a code page transformation when sending the file to that UNIX system; this is what XLATE(YES) means.
However, the default "from"/"to" code page pair configured for XLATE(YES) is probably not the correct one. You need to:
find out which EBCDIC code page the data on the mainframe is encoded in. Is it IBM-037, IBM-1047, IBM-500, IBM-273, etc.? There are many.
find out which code page the data should be in on the UNIX side: UTF-8, ISO8859-1, 437, etc. There are many.
make sure ConnectDirect will transform using the correct source and target code pages.
Ask your ConnectDirect support people to help you with this.
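If you can get an untranslated (binary) copy of the file onto the UNIX side, one quick way to narrow down the source code page is to try a few candidates with iconv. This is only a diagnostic sketch; file.bin is a placeholder name, and the code page names are the glibc spellings (check iconv -l on your system):

    # try candidate EBCDIC code pages on a binary (untranslated) copy;
    # the one that renders the ^&* delimiters correctly is your "from" code page
    for cp in IBM037 IBM1047 IBM500 IBM273; do
        echo "=== $cp ==="
        iconv -f "$cp" -t UTF-8 file.bin | head -2
    done

Once you know the correct pair, have it configured in the ConnectDirect transfer itself rather than converting after the fact.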

How to reference the most current Physical Sequential (PS) file in JCL

I want to create a job that takes the latest available file as its input file.
The file name format is as below: FILE1.TEST.TYYMMDD
Is there any way to identify the latest file, based on the date present in the file name, via JCL?
P.S. GDG versions are not created in the existing process; only a PS file is created.
Thank you
I want to create a job that takes the latest available file as its input file. The file [name] format is as below: FILE1.TEST.TYYMMDD. Is there any way to identify the latest file, based on the date present in the file name, via JCL?
No.
You indicate that GDGs are not created in the existing process. GDGs would be the best way to accomplish your goal. Absent GDGs, you must write code.
You could accomplish your goal by writing (C, clist, COBOL, PL/I, Rexx) code using the LMDINIT and LMDLIST ISPF services. Then you would execute your code by running ISPF in batch. Many mainframe shops have a cataloged procedure to execute ISPF in batch.
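As a hedged sketch of that approach in Rexx (the level mask matches the naming pattern from the question; variable names are illustrative, and a TYYMMDD qualifier only sorts correctly within a century):

    /* REXX - find the latest FILE1.TEST.TYYMMDD via ISPF LMD services */
    Address ISPEXEC
    "LMDINIT LISTID(LID) LEVEL(FILE1.TEST.T*)"     /* build the dataset list */
    latest = ''
    Do Forever
      "LMDLIST LISTID("lid") OPTION(LIST) DATASET(DSN)"
      If rc <> 0 Then Leave                        /* RC=8: end of the list  */
      If dsn > latest Then latest = dsn            /* TYYMMDD sorts by date  */
    End
    "LMDLIST LISTID("lid") OPTION(FREE)"           /* free the dataset list  */
    "LMDFREE LISTID("lid")"
    Say 'Latest dataset:' latest

The Say would be replaced by whatever hands the name to the next step, e.g. writing it into a dataset that a later job step reads.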
I agree with @cschneid that there is no platform-native way to handle this. However, I want to point out that GDGs are the platform's way of managing PS files for access in a relative form.
Your comment:
"GDG versions are not created in the existing process; only a PS file is created."
That statement didn't make sense to me. GDGs are not a file type like physical sequential (PS) or partitioned (PO); they are a convention that allows relative reference to files created over time, which sounds like what you want. I've only seen GDGs used for PS files.
Putting the date in the file name can have its uses, but to z/OS it's only part of the file name, not meta information that the system operates on (like the GxxxxVyy suffixes in GDGs).

How to zip a list of files + append a custom string to each file on Linux, on the fly

I need a Linux solution: I've figured out a way to do this on Windows by modifying a C# implementation, but I'm not sure where to start on Linux. I would like to be able to do the following from the command line:
Run a command providing a list of files to be zipped, an output path, and a custom string
The custom string should be automatically appended to the end of the internal data of each file but not written to the original file. I want it all handled in-stream / in memory.
The data stream is fed to the zip utility and a zip file is created at the output location with 0 compression (store only)
Explanation: this custom string is used as a watermark to uniquely identify the files in the zip.
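There is no single zip flag for this, but here is a minimal bash sketch of the idea. The script name and usage are my own invention; it assumes GNU coreutils and Info-ZIP's zip, and it stages the watermarked copies on /dev/shm (a RAM-backed tmpfs) so the originals are never touched and nothing lands on disk:

    #!/bin/bash
    # usage: ./zipmark.sh output.zip "WATERMARK" file1 [file2 ...]
    out=$(readlink -f "$1"); mark=$2; shift 2
    tmp=$(mktemp -d /dev/shm/zipmark.XXXXXX)   # tmpfs: lives in RAM only
    for f in "$@"; do
        cp -- "$f" "$tmp/$(basename "$f")"
        printf '%s' "$mark" >> "$tmp/$(basename "$f")"   # append watermark to the copy
    done
    ( cd "$tmp" && zip -0 -q "$out" * )        # -0 = store only, no compression
    rm -rf "$tmp"

If the files are large or a strictly stream-based pipeline is required, the same idea is straightforward in any scripting language whose zip library supports store-only entries; the tmpfs staging above is the pragmatic shell equivalent.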

Thermocycle library and OpenModelica

I want to load the Thermocycle library in OpenModelica Connection Editor (OMEdit), but I get the message "The file was not encoded in UTF-8".
To fix this problem I should "add a file package.encoding at the top-level", but I don't understand what I must do. What is this file called "package.encoding"? What should it contain? Where should I put it?
The error message says it all. "add a file package.encoding at the top-level."
Put the file where your library's package.mo is located.
The file must contain the name of the encoding used by the library.
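For example, assuming the library were encoded in ISO-8859-1 (substitute whatever encoding your copy of Thermocycle actually uses):

    $ ls Thermocycle/
    package.encoding  package.mo  ...
    $ cat Thermocycle/package.encoding
    ISO-8859-1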
Note that you can also use OMEdit's encoding conversion feature. File->Open/Convert Modelica File(s) With Encoding

MapDB file types

I have a problem with MapDB version 1.0.6. When I create a database I end up with two files with the same name but with different file types.
One is, for example, IRTree with file type FILE, and the other is IRTree with file type .p.
Having said that, whenever I try to read my database by providing the filename IRTree, I end up with either a NullPointerException from the command DBMaker.newFileDB(new File(filename)).readOnly().make(); or an IOException: storage header is invalid.
Can anyone explain to me what's going on?
MapDB uses two files; the .p file is used to store data. Always open the file without the extension, otherwise MapDB will try to open the incorrect file.
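A minimal sketch against the MapDB 1.x API already shown in the question (the class name is illustrative):

    import java.io.File;
    import org.mapdb.DB;
    import org.mapdb.DBMaker;

    public class OpenIRTree {
        public static void main(String[] args) {
            // pass the base name "IRTree", not "IRTree.p";
            // MapDB locates the companion .p file itself
            DB db = DBMaker.newFileDB(new File("IRTree")).readOnly().make();
            System.out.println("opened: " + db);
            db.close();
        }
    }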
