dd command (Linux): last byte goes to next line

Hi friends, I need some help.
We have a tool that converts binary files to text files and then stores them in Hadoop (HDFS).
In production, that ingestion tool uses FTP to download files from the mainframe in binary format (EBCDIC), and we don't have access to download files from the mainframe in the development environment.
In order to test the file conversion, we manually create text files and try to convert them with the dd command (Linux), using these parameters:
dd if=asciifile.txt of=ebcdicfile conv=ebcdic
After passing through our conversion tool, the expected result is:
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
However, it's returning the following result:
000000000000000 DAT
A000000000000000 DA
TA000000000000000 D
ATA000000000000000
I have tried the cbs, obs and ibs parameters, setting them to the record length (lrecl, the length of each line), without success.
Can anyone help me?

A few things to consider:
How exactly is the data transferred via FTP? Your "in binary format (EBCDIC)" simply doesn't make any sense at all. Either the FTP transfers in binary mode, in which case nothing gets changed or converted during the transfer, or it transfers in text mode, a.k.a. ASCII mode, in which case the data is converted from a specific EBCDIC code page to a specific non-EBCDIC code page. You need to know which mode is used and, if it is text mode, which two code pages are involved.
From the man page for dd, it is unclear which EBCDIC and ASCII code pages are used for the conversion. I'm just guessing here: the EBCDIC code page might be CP-037, and ASCII might be CP-437. If these don't match the ones used in the FTP, the resulting test data is incorrect.
I understand you don't have access to production data in the development environment. However, you should still be able to get test data from the development mainframe via FTP. If not, how will you do end-to-end testing?
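A quick way to check whether the code page even matters for this test data is to encode it with two common EBCDIC code pages. This sketch uses Python's built-in cp037 and cp500 codecs as stand-ins; neither is necessarily the table dd uses, and which one the mainframe uses is an assumption to verify.

```python
# Compare how two EBCDIC code pages encode the same text.
# cp037 and cp500 are Python's built-in codecs; the code page actually
# used by the mainframe (or by dd) is an assumption to verify.

def ebcdic_hex(text: str, codepage: str) -> str:
    """Return the EBCDIC bytes of `text` as a space-separated hex string."""
    return text.encode(codepage).hex(" ")


sample = "000000000000000 DATA"
for cp in ("cp037", "cp500"):
    print(cp, ebcdic_hex(sample, cp))

# Brackets and '!' are "variant" characters whose codes differ between code pages.
for ch in "[]!":
    print(repr(ch), "cp037:", ebcdic_hex(ch, "cp037"), "cp500:", ebcdic_hex(ch, "cp500"))
```

Digits, uppercase letters and the space encode identically in both, so for data like 000000000000000 DATA the choice between these two code pages is unlikely to explain the shifted output; it starts to matter once the real files contain variant characters.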

The EBCDIC conversion is eating your line endings:
https://www.ibm.com/docs/en/zos/2.2.0?topic=server-different-end-line-characters-in-text-files
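To see the line-ending problem concretely: in Python's cp037 codec the ASCII line feed becomes X'25', while z/OS text files end lines with NL (X'15'), and a fixed-length dataset downloaded in binary has no terminator at all. The sketch below builds test data under the assumption that the conversion tool reads fixed-length 20-byte records (LRECL=20, the length of 000000000000000 DATA) with no line terminator; the record length, the cp037 code page and the function names are assumptions for illustration.

```python
# Build fixed-length EBCDIC test records from ASCII text lines.
# Assumptions (hypothetical, adjust to the real file layout):
#   - records are LRECL bytes long, space-padded, with no line terminator
#   - the mainframe code page is cp037

LRECL = 20  # len("000000000000000 DATA")


def to_fixed_ebcdic(lines, lrecl=LRECL, codepage="cp037", terminator=b""):
    """Encode each line as one space-padded EBCDIC record.

    Pass terminator=EBCDIC NL (X'15') if the tool expects newline-delimited
    text instead of fixed-length records.
    """
    out = bytearray()
    for line in lines:
        record = line.rstrip("\r\n")  # drop the ASCII line ending
        if len(record) > lrecl:
            raise ValueError(f"record longer than {lrecl} bytes: {record!r}")
        out += record.ljust(lrecl).encode(codepage) + terminator
    return bytes(out)


def convert_file(src_path, dst_path, **kwargs):
    """Read an ASCII text file and write fixed-length EBCDIC records."""
    with open(src_path, encoding="ascii") as src:
        data = to_fixed_ebcdic(src, **kwargs)
    with open(dst_path, "wb") as dst:
        dst.write(data)


print("LF in cp037:", "\n".encode("cp037").hex())  # X'25', not the z/OS NL X'15'
```

For example, convert_file("asciifile.txt", "ebcdicfile"). With GNU dd, conv=block pads newline-terminated records with spaces to cbs bytes, so something like dd if=asciifile.txt of=ebcdicfile conv=ebcdic,block cbs=20 may give a comparable result; dd's EBCDIC table is not guaranteed to match cp037, so compare both outputs with od -An -tx1.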

Related

How can I convert a text file to UCS-2 LE from whatever the default is?

I am looking for a way to convert or save a text file in the UCS-2 LE format, specifically without a BOM... I guess.
I have zero knowledge of what any of that actually means, but I know I need it because of this wiki page on what I am trying to accomplish: https://developer.valvesoftware.com/wiki/Closed_Captions
In other words, this is for a specific game engine, "Source Engine," which requires this format in order to compile in-game closed captions for sounds.
I have tried saving the file in Notepad++ using the "UCS-2 LE BOM" option under the Encoding menu... there is no option for just "UCS-2 LE", however, and because of this the captions cannot be compiled for the game engine. I need to save without a BOM, "I guess" (because, again, I don't know what I'm talking about, and I assume based on logical conclusions that I need to not have a BOM, whatever that actually means).
I would like to know a way to either save a txt file in that encoding format or convert one.
In my specific case, it appears that my problem boils down to "the program is weird."
What I mean by this is that Notepad++ actually does save in the correct format, but I failed to realize that because of a quirk in the caption compiler: it only works if you drag the file onto it, not via the command line as I previously thought.
I will accept this as the answer when I am allowed to in 2 days.
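For anyone who lands here with the same question: a BOM (byte order mark) is just the two bytes FF FE written at the very start of a UTF-16 little-endian file, and UCS-2 LE text is byte-for-byte the same as UTF-16 LE as long as every character is in the Basic Multilingual Plane. A minimal Python sketch of converting a file with the BOM optional; the function names and the UTF-8 source encoding are assumptions:

```python
import codecs


def to_utf16le(text: str, with_bom: bool = False) -> bytes:
    """Encode text as UTF-16 LE (identical to UCS-2 LE for BMP characters)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("characters outside the BMP cannot be represented in UCS-2")
    data = text.encode("utf-16-le")  # the "utf-16-le" codec never writes a BOM
    return codecs.BOM_UTF16_LE + data if with_bom else data


def convert_file(src_path: str, dst_path: str, src_encoding: str = "utf-8-sig",
                 with_bom: bool = False) -> None:
    """Re-encode a text file; newline="" keeps CRLF/LF line endings unchanged."""
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf16le(text, with_bom))
```

Since the asker found that Notepad++'s "UCS-2 LE BOM" output does work with the caption compiler, with_bom=True reproduces what Notepad++ wrote.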

Search substring in binary file

Friends, please help me with my issue. I have an application that processes data and generates output files (different formats, but mostly images). In every generated file, the application puts its watermark: a string that looks like "03-24-5532 [some Cyrillic text]".
Every time I use that application, I need to edit each file in Photoshop to replace the watermark string with the required one, and it takes a lot of time.
Is it possible to search for that substring in the application's binary data files (using a hex editor or something else) and replace it? What is the best way to solve this problem?
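Whether this works depends on how the watermark is stored. If the application draws it into the image pixels (and the image is then compressed), the text never appears as bytes and no hex editor will find it. If it is kept as a string somewhere (for example in file metadata or in the application's own resources), a byte search can find it, but it has to be done in the right encoding, and the Cyrillic part might be stored as UTF-8, Windows-1251 or UTF-16. A sketch of that search in Python; the helper names and the list of candidate encodings are assumptions:

```python
# Hypothetical helpers for locating a text string inside a binary file.
CANDIDATE_ENCODINGS = ("utf-8", "cp1251", "utf-16-le", "utf-16-be")


def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which `needle` occurs in `data`."""
    offsets, start = [], 0
    while True:
        i = data.find(needle, start)
        if i == -1:
            return offsets
        offsets.append(i)
        start = i + 1


def locate_text(data: bytes, text: str) -> dict:
    """Map each candidate encoding to the offsets where `text` appears in it."""
    hits = {}
    for encoding in CANDIDATE_ENCODINGS:
        try:
            needle = text.encode(encoding)
        except UnicodeEncodeError:
            continue  # text not representable in this encoding
        offsets = find_all(data, needle)
        if offsets:
            hits[encoding] = offsets
    return hits


def replace_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new`; equal lengths keep every other offset unchanged."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same byte length")
    return data.replace(old, new)
```

For example, locate_text(open("output.png", "rb").read(), "03-24-5532") lists where the number occurs and in which encoding. Only replace it with a string of the same byte length, and keep a backup: formats with checksums (PNG chunks carry a CRC, for instance) can still be broken by an in-place edit.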

Pentaho - CSV Input not understanding special character [Windows to Linux]

I have a transformation on Pentaho Data Integration where the first thing I do is I use the "CSV Input" to map my flat file.
I've never had a problem with it on Windows, but now I'm moving the server that Spoon runs on to a Linux server, and I'm having problems with special characters.
The first thing I noticed was that my tables were being updated, because the system was treating the names as different strings from the ones in my database.
While checking the problem, I also noticed that if I go to my "CSV Input" -> Preview, the preview of my data shows the problem:
Special characters are not showing.
Where it should be:
Diretoria de Suporte à Decisão e Aplicação
I used a command to check my file's charset/encoding, and it showed:
$ file -bi foo.csv
text/plain; charset=iso-8859-1
If I open foo.csv in vi, it shows the special characters correctly.
Any idea on what could be the problem or what should I try?
I don't have any data files with this encoding, so you'll have to do some experimenting, but there are some steps designed to deal with these issues.
First, the CSV Input step has a field that allows you to select the encoding of the source file. The Text File Input step has both a "Format" (meaning line terminator) and "Encoding" selector under the "Content" tab.
In Transforms, you have the Change file encoding step under the Utility tab. This step is designed to copy many files while changing their encoding; that's why it's in a transform.
In Jobs, there's the Convert file between Windows and Unix step under the File Management tab, but this appears to only deal with line terminators.
Either way it appears if the CSV/Text file input steps don't suit your needs, you'll have to copy the file to a new encoding before reading it in. It will probably be easiest to try handling it with the file input steps first.
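If you do end up converting the file outside Pentaho, the re-encoding itself is small. The sketch below (hypothetical helpers, not part of PDI) shows why the accents break: ISO-8859-1 bytes such as X'E3' for ã are not valid UTF-8, which is the usual default on Linux. That probably also explains why the same file read fine on Windows, where the default is usually Windows-1252, a near-superset of ISO-8859-1. iconv -f ISO-8859-1 -t UTF-8 foo.csv > foo-utf8.csv does the same conversion from the shell.

```python
def misread_as_utf8(latin1_text: str) -> str:
    """Show what a UTF-8 reader makes of ISO-8859-1 bytes (invalid bytes -> U+FFFD)."""
    return latin1_text.encode("iso-8859-1").decode("utf-8", errors="replace")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8; every byte is valid ISO-8859-1, so this never fails."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a whole file; working on bytes leaves line endings untouched."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(latin1_to_utf8(data))


print(misread_as_utf8("Diretoria de Suporte à Decisão e Aplicação"))
```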

How to extract (import) data from a mainframe dataset to excel table

I want to build a little application that calculates the critical batch of a batch flow.
As input I need to use a mainframe dataset. If possible, it should be dynamic; that is, I should be able to choose the fields that apply at the time.
I've searched the internet about that but found nothing that suited what I wanted to do.
Is there a way to do that?
I have a dataset in a mainframe library and I want to ftp that file to Excel.
Convert the file to CSV on the mainframe (for example, via a REXX exec, a z/OS UNIX shell script, or a Lua4z program), and then insert that CSV file into Excel via FTP.
You do not need to transfer the CSV file to your PC's file system and then, as a separate step, open it in Excel.
Instead, you define the FTP (or HTTP) URL for the CSV as a data source in Excel. One advantage of this technique is that you can refresh the data from that URL without having to reapply formatting in Excel.
There are various tutorials on the web for doing this.
In brief:
1. Create a new blank workbook (I'm using Excel 2010).
2. Select the first cell in the empty worksheet (if you've only just created the workbook, this step is unnecessary: the cell is already selected).
3. On the Data tab, click From Text.
4. In the File name text box of the Import Text File dialog, enter the FTP URL of the CSV file. For example:
   ftp://zos1//u/me/data.csv
   (This assumes that your mainframe is configured to allow FTP using this path.)
   The two consecutive slash (/) characters following the host name (zos1) indicate that the path refers to a z/OS UNIX file (/u/me/data.csv).
   The CSV file must be in a z/OS UNIX path. The FTP client does not accept MVS-style (dsname) paths such as 'me.csv(data)' (even when URL-encoded; that is, with the single quotes escaped as %27); by contrast, cURL accepts such paths just fine.
   The CSV file on the mainframe must be ASCII encoded, not EBCDIC. (Here, I'm using the term ASCII imprecisely: the precise character encoding you want depends on your PC's settings. You probably want Windows-1252.) This is because the FTP client sets the default transfer type to binary.
5. Enter your user name and password (your z/OS TSO user ID and password).
6. Wait for the data to load.
7. Format the cells. For example, set the format of any columns containing date/time values.
8. On the Data tab, click Connections, select the connection (the one Excel created when you specified a URL for the file name), and clear the Prompt for file name on refresh check box.
9. To refresh the data, replacing the current data with the results of a new FTP request: on the Data tab, click Refresh All. The data is replaced; the cell formatting remains intact.
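To illustrate why the CSV has to be ASCII (Windows-1252) already: with a binary transfer the PC receives the raw bytes, so an EBCDIC file decoded as Windows-1252 comes out as gibberish with no comma delimiters. A sketch, assuming IBM-037 on the mainframe (Python codec cp037) and Windows-1252 on the PC; the sample row and function name are made up:

```python
def as_seen_on_pc(mainframe_bytes: bytes, pc_encoding: str = "cp1252") -> str:
    """Decode bytes the way a PC reading a binary FTP transfer would."""
    # errors="replace": a few byte values are undefined in Windows-1252
    return mainframe_bytes.decode(pc_encoding, errors="replace")


row = "JOB,START\n"
ebcdic_row = row.encode("cp037")  # what an EBCDIC CSV holds on the mainframe
ascii_row = row.encode("cp1252")  # what the CSV should hold before the transfer

print(repr(as_seen_on_pc(ebcdic_row)))  # no commas survive: Excel would see a single column
print(repr(as_seen_on_pc(ascii_row)))
```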
Converting an EBCDIC-encoded CSV file to ASCII
(Strictly speaking, I mean ISO-8859, not ASCII.)
Suppose you have JCL that generates a CSV file encoded in EBCDIC. You want to make that CSV file available to Excel via FTP as an ASCII-encoded z/OS UNIX (zFS) file.
Replace your existing DD statement for the output CSV file with the following DD statement:
//OUTCSV DD PATH='/u/me/data-ebcdic.csv',
// PATHOPTS=(OWRONLY,OCREAT,OTRUNC),
// PATHDISP=(KEEP,DELETE),
// PATHMODE=(SIRUSR,SIWUSR,SIRGRP),
// FILEDATA=TEXT
Replace the ddname OUTCSV with your ddname, and the zFS file path /u/me/data-ebcdic.csv with the path that you want to use.
Thanks to the FILEDATA=TEXT parameter, the resulting CSV file will have an X'15' byte (the EBCDIC newline, NL) at the end of each line.
Append the following step to your JCL:
//ICONV EXEC PGM=IKJEFT01
//SYSTSIN DD *
BPXBATCH sh iconv -f IBM-037 -t iso8859-1 +
/u/me/data-ebcdic.csv +
> /u/me/data-ascii.csv
/*
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
In case you're wondering why I'm calling iconv as a shell command via BPXBATCH, the following:
//ICONV EXEC PGM=EDCICONV,
// PARM=('FROMCODE(IBM-037),TOCODE(iso8859-1)')
didn't quite work: it left the X'15' bytes as is, whereas running iconv as a shell command correctly converted them to X'0A'. (z/OS 2.2.)
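The X'15' detail is easy to reproduce off the mainframe. In Python's cp037 codec (assumed here to match IBM-037), X'15' (NL) decodes to U+0085 rather than to a line feed, so a plain code-page conversion turns it into X'85' in ISO-8859-1; mapping NL to LF explicitly gives the result the shell iconv produced. A sketch, with a hypothetical function name:

```python
def ebcdic_to_latin1(data: bytes, codepage: str = "cp037") -> bytes:
    """Convert EBCDIC text to ISO-8859-1, turning NL (X'15') into LF (X'0A')."""
    text = data.decode(codepage)
    # cp037 decodes X'15' to U+0085 (NEXT LINE); treat it as a line end.
    text = text.replace("\x85", "\n")
    # cp037 covers exactly the Latin-1 repertoire, so this encode cannot fail.
    return text.encode("iso-8859-1")


record = b"\xc1\xc2\x6b\xc3\x15"  # "AB,C" + NL in IBM-037
print(ebcdic_to_latin1(record))                     # b'AB,C\n'
print(record.decode("cp037").encode("iso-8859-1"))  # b'AB,C\x85': NL kept as X'85'
```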
You've got some good information in the comments; the consensus appears to be that conversion to CSV (or TSV, to avoid commas embedded in your data) is the easiest route. Here is a bit more information, copied from another answer...
I would strongly suggest you get the files into a text format before transferring them to another box with a different code page. Trying to deal with mixed text (which must have its code page translated) and binary (which must not have its code page translated but which likely must be converted from big endian to little endian) is harder than doing the conversion up front.
The conversion can likely be done via the SORT utility on the mainframe. Mainframe SORT utilities tend to have extensive data manipulation functions. There are other mechanisms you could use (other utilities, custom code written in the language of your choice, purchased packages), but this is what we tend to do in these circumstances.
Once you have your flat files converted such that all data is text, you can transfer them via FTP, SFTP or FTPS.
...and thanks for coming back and adding more information. Hopefully the people here have provided enough information to help you solve your problem.
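To make the quoted point concrete: a record that mixes EBCDIC text with big-endian binary has to be transferred in binary mode and then decoded field by field. The 14-byte layout below (a 10-byte job name followed by a 4-byte signed integer) is entirely hypothetical, as are the names; a real layout would come from the dataset's record definition, for example a COBOL copybook.

```python
import csv
import io
import struct

# Hypothetical record layout, for illustration only:
#   bytes 0-9   job name, EBCDIC text (cp037), space-padded
#   bytes 10-13 elapsed seconds, big-endian signed 32-bit binary
RECORD = struct.Struct(">10si")


def records_to_csv(data: bytes, codepage: str = "cp037") -> str:
    """Decode fixed-length mixed text/binary records into CSV text."""
    if len(data) % RECORD.size:
        raise ValueError(f"data length is not a multiple of {RECORD.size}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for name_raw, seconds in RECORD.iter_unpack(data):
        # Text field: translate the code page. Binary field: ">" reads it big-endian.
        writer.writerow([name_raw.decode(codepage).rstrip(), seconds])
    return buf.getvalue()


sample = "JOBA".ljust(10).encode("cp037") + (12).to_bytes(4, "big", signed=True)
print(records_to_csv(sample), end="")  # JOBA,12
```

A text-mode (ASCII) FTP transfer would have run the binary field through the code-page translation as well, which is exactly the mixed-data problem the quote warns about.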
XML would be another possible text-oriented solution. It would take more effort to create, but you could design your spreadsheet in Excel and save it as an XML document, then write a program to generate the XML text using the data from your mainframe dataset. While this would be more difficult to implement than a simple CSV or TSV file, it has the advantage of carrying spreadsheet formulas and attributes, which a CSV file cannot do. Another advantage: you can attach the XML document to an SMTP email note and deliver the document in "spreadsheet format" to your client.
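A minimal sketch of the XML route, assuming Excel's XML Spreadsheet 2003 (SpreadsheetML) format, which is what Excel writes for "XML Spreadsheet 2003"; the function name and sample rows are hypothetical, and formulas and styles are left out:

```python
from xml.sax.saxutils import escape, quoteattr

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def to_spreadsheetml(rows, sheet_name="Sheet1"):
    """Render rows (lists of str/int/float) as a SpreadsheetML 2003 workbook string."""
    parts = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">',
        f"<Worksheet ss:Name={quoteattr(sheet_name)}>",
        "<Table>",
    ]
    for row in rows:
        parts.append("<Row>")
        for value in row:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            cell_type = "Number" if is_number else "String"
            # escape() protects &, < and > inside cell text
            parts.append(f'<Cell><Data ss:Type="{cell_type}">{escape(str(value))}</Data></Cell>')
        parts.append("</Row>")
    parts += ["</Table>", "</Worksheet>", "</Workbook>"]
    return "\n".join(parts)


print(to_spreadsheetml([["JOB", "ELAPSED"], ["JOBA", 12], ["JOBB", 7.5]]))
```

Saved with an .xml extension, the output can be opened in Excel or attached to an email as described above.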

How to determine file encoding type with Excel VBA

I have built an Excel/VBA tool to validate csv files to ensure the data they contain is valid. The csv files can originate from anywhere (from a full-blown Unix system or a desktop user saving data out of Excel). The Excel tool is sent out to businesses so they can validate their csv files in their own environment without the risk of their data leaving their systems. Thus, the solution needs to be in native VBA and not link to external libraries.
So, using VBA, I need to be able to automatically detect UTF-8 (with or without BOM) or ANSI file encodings and warn the user if the csv uses some other encoding.
I think this would perhaps involve reading a few bytes from the start of the file and determining the encoding based on the existence of the byte order mark.
Could you help me get started on the right track?
Assuming you have the freedom to ask the user to choose the correct file type, you can make them responsible for what they choose as a file ;)
That means you can create a form where users choose the file name and the encoding type, like we do in the file-open wizard.
Otherwise, I suggest using the FileSystemObject. It returns a TextStream, which can be used to determine the encoding. I doubt VBA supports other types of encoding; please correct me if it does, I'd be happy to hear it. :)
how to detect encoding type
msdn object library model
Here is a link for further considerations:
change encode type
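The BOM check described in the question can be prototyped before porting it to VBA (where the first bytes can be read with Open ... For Binary and Get into a Byte array). A Python sketch of the logic, with one added assumption: a file without a BOM that decodes cleanly as UTF-8 is treated as UTF-8, and anything else is reported as "ansi", which cannot be told apart from other single-byte code pages by the bytes alone.

```python
import codecs


def sniff_encoding(head: bytes) -> str:
    """Classify the first bytes of a file as 'utf-8-bom', 'utf-16', 'ascii', 'utf-8' or 'ansi'."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # also matches UTF-32 LE, whose BOM starts with FF FE
    try:
        # final=False: a multi-byte character cut off at the end of `head` is not an error
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "ansi"  # assumption: invalid UTF-8 means a Windows "ANSI" code page
    return "ascii" if all(b < 0x80 for b in head) else "utf-8"


def sniff_file(path: str, sample_size: int = 4096) -> str:
    """Read the first `sample_size` bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_encoding(f.read(sample_size))
```

Pure-ASCII files are valid under both UTF-8 and ANSI, so an "ascii" result can be accepted either way; only a BOM-less sample is ambiguous, and a larger sample reduces the chance of misclassifying it.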
