Unexpected End-of-file (1AH) in .dbf file - python-3.x

I am using the excellent dbf package for Python to read data from a .dbf file. The file is produced by a proprietary Windows application whose source code I cannot access. dbf says the .dbf is a Foxpro file.
The .dbf file is continually updated, so I re-read it regularly. It contains over a million records. Everything was fine until today, when I suddenly received the following error:
DbfError: record data not correct -- first character should be a ' ' or a '*'.
Closer inspection reveals that the data for the corresponding record (#46448) now starts with ASCII character 26 (0x1A). Wikipedia says that this character is used in .dbf files as an end-of-file marker. Why does this character appear in the middle of the file all of a sudden?
There is also a forum post by someone who seems to have had the same problem. Unfortunately, no resolution is given there.

The problem seems to be with the creating application. Further digging showed there was only one record with 0x1A in that field, and that application was treating the record normally.
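For anyone who needs to locate such records before handing the file to the dbf package, the fixed-width DBF layout makes a raw scan straightforward: the 32-byte header stores the record count (bytes 4-7), the header size (bytes 8-9) and the record size (bytes 10-11), and each record begins with a one-byte deletion flag that should be ' ' or '*'. A minimal diagnostic sketch, with the file name as a placeholder and zero-based record numbering:

import struct

def find_bad_record_flags(path):
    """Scan a .dbf file for records whose flag byte is not ' ' (active) or '*' (deleted)."""
    with open(path, 'rb') as f:
        header = f.read(32)
        record_count, = struct.unpack('<I', header[4:8])        # bytes 4-7: number of records
        header_size, record_size = struct.unpack('<HH', header[8:12])
        for i in range(record_count):
            f.seek(header_size + i * record_size)
            flag = f.read(1)
            if flag not in (b' ', b'*'):
                print(f'record {i}: flag byte is {flag!r}')

find_bad_record_flags('data.dbf')  # placeholder path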

Related

YAML file one line filled with null characters, #0000 character not supported while reading

I've built a Python-based application (which runs 24/7) that logs some information to a YAML file every few minutes. It was working perfectly for a few days. Then, after approximately two weeks, one line in the YAML file was filled with NUL characters (416 NUL characters, to be precise).
The suspicion is that someone might have tried to open the already running application again, so both instances tried to write to the same YAML file, which could have caused this. But I couldn't replicate it.
I just wanted to know the cause of this issue.
Please let me know if someone has faced the same issue before.
Some context about the file writing:
The YAML file is opened in append mode and a list is written to it using the code below:
with open(file_path, 'a') as file:
    yaml.dump(summary_list, file)
Concurrent access is a possible cause for this, especially when you're appending. For example, it may be that both instances opened the file and set their start marker at the same position, but let the file grow to the sum of both appended data dumps. That would leave some part of the file unwritten, which might explain the NULs.
Whatever happened depends more on your OS and your filesystem than on YAML. But even knowing those, we couldn't tell for sure.
I recommend using a proper logging framework to avoid such issues; you can dump the YAML as a string and log it.
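As a rough illustration of that suggestion, assuming PyYAML and the standard library logging module (the file name and size limits below are placeholders), you can dump the list to a string and hand it to a logger. Note that this only serializes writes within a single process; two independently started processes would still need a file lock or a single writer:

import logging
from logging.handlers import RotatingFileHandler

import yaml

# The handler serializes writes within this process; it does not protect
# against two separate processes appending to the same file.
logger = logging.getLogger('summary')
logger.setLevel(logging.INFO)
logger.addHandler(RotatingFileHandler('summary.log', maxBytes=10_000_000, backupCount=5))

def log_summary(summary_list):
    # yaml.dump() without a stream argument returns the document as a string.
    logger.info(yaml.dump(summary_list, default_flow_style=False))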

(dd command linux) last byte goes to next line

Hi friends, I need some help.
We have a tool that converts binary files to text files and then stores them in Hadoop (HDFS).
In production, that ingestion tool uses FTP to download files from the mainframe in binary format (EBCDIC), and we don't have access to download files from the mainframe in the development environment.
In order to test the file conversion, we manually create text files, and we are trying to convert a file using the dd command (Linux) with these parameters:
dd if=asciifile.txt of=ebcdicfile conv=ebcdic
After passing the file through our conversion tool, the expected result is:
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
However, it's returning the following result:
000000000000000 DAT
A000000000000000 DA
TA000000000000000 D
ATA000000000000000
I have tried the cbs, obs and ibs parameters, setting the record length (the number of characters in each line), without success.
Can anyone help me?
A few things to consider:
How exactly is the data transferred via FTP? Your "in binary format (EBCDIC)" doesn't quite make sense as written. FTP either transfers in binary mode, in which case nothing gets changed or converted during the transfer, or it transfers in text mode, a.k.a. ASCII mode, in which case the data is converted from a specific EBCDIC code page to a specific non-EBCDIC code page. You need to know which mode is used and, if text mode, which two code pages are involved.
From the man pages for dd, it is unclear which EBCDIC and ASCII code pages are used for the conversion. I'm just guessing here: the EBCDIC code page might be CP-037, and the ASCII one might be CP-437. If these don't match the ones used in the FTP transfer, the resulting test data is incorrect.
I understand you don't have access to production data in the development environment. However, you should still be able to get test data from the development mainframe using FTP from there. If not, how will you be doing end-to-end testing?
The EBCDIC conversion is eating your line endings:
https://www.ibm.com/docs/en/zos/2.2.0?topic=server-different-end-line-characters-in-text-files
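If the conversion tool expects fixed-length EBCDIC records rather than newline-delimited text, dd can pad each input line with spaces to a fixed record length using conv=block together with cbs; the record length below (cbs=80) is only an assumed example and must match whatever record length your tool expects:
dd if=asciifile.txt of=ebcdicfile cbs=80 conv=ebcdic,block
With conv=block the newlines are consumed and every output record is exactly cbs bytes long, which avoids the wrap-around seen in the output above.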

How to create multiline QR code using bartender?

I am trying to generate a multiline QR code in BarTender. I am using an Excel file as the data source and took 3 fields for testing first. It successfully generates the QR code, but when I scan it, all the text shows on a single line; I want the 3 fields on 3 separate lines. I have used the Carriage Return control character <<CR>> after the first data field. Below are the QR code properties settings.
When I scan the QR code image, it gives me the following output.
No_LAN_IP90:61:AE:BC:5B:01FAC-Laptop-044
My Expected output is
No_LAN_IP
90:61:AE:BC:5B:01
FAC-Laptop-044
Any help is greatly appreciated.
I have tagged the post as Excel because I am using an Excel file as the data source; maybe an Excel expert will know the answer.
Using LF (Line Feed) instead of CR should solve the problem.
--edit--
After rereading your problem I saw I missed something. You are using the example data field to add an LF, which will not be used while printing. In the properties screen you have a "Transforms" tab, which has an option to add a suffix and/or prefix. If you put the LF in the suffix field for your first and second lines, your problem should be solved.
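If you want to confirm outside BarTender that an embedded LF scans as a line break, a small sketch with the third-party qrcode package (an assumption; it is not part of BarTender) encodes the three example fields joined by '\n':

import qrcode  # pip install qrcode[pil]

fields = ['No_LAN_IP', '90:61:AE:BC:5B:01', 'FAC-Laptop-044']
# Join the fields with LF (0x0A); most scanners display this as three lines.
img = qrcode.make('\n'.join(fields))
img.save('multiline_qr.png')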

Pentaho - CSV Input not understanding special character [Windows to Linux]

I have a transformation in Pentaho Data Integration where the first thing I do is use the "CSV Input" step to map my flat file.
I've never had a problem with it on Windows, but now I'm moving the server that Spoon runs on to a Linux server, and I'm having problems with special characters.
The first thing I noticed was that my tables were being updated because the system was treating the names as different strings from the ones in my database.
Looking into the problem, I also noticed that if I go to my "CSV Input" -> Preview, it shows a preview of my data with the problem above:
the special characters are not showing.
Where it should be:
Diretoria de Suporte à Decisão e Aplicação
I used a command to check my file's charset/encoding and it showed:
$ file -bi foo.csv
text/plain; charset=iso-8859-1
If I open foo.csv in vi, it displays the special characters correctly.
Any idea on what could be the problem or what should I try?
I don't have any data files with this encoding, so you'll have to do some experimenting, but there are some steps designed to deal with these issues.
First, the CSV Input step has a field that allows you to select the encoding of the source file. The Text File Input step has both a "Format" (meaning line terminator) and "Encoding" selector under the "Content" tab.
In Transforms, you have the Change file encoding step under the Utility tab. This step is designed to copy many files while changing their encoding; that's why it's in a transform.
In Jobs, there's the Convert file between Windows and Unix step under the File Management tab, but this appears to only deal with line terminators.
Either way, it appears that if the CSV/Text file input steps don't suit your needs, you'll have to copy the file to a new encoding before reading it in. It will probably be easiest to try handling it with the file input steps first.
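If you do end up re-encoding the file outside Pentaho, a small sketch like the following does roughly what the Change file encoding step does, assuming the source really is ISO-8859-1 as reported by file and that UTF-8 is what your Linux setup expects (paths are placeholders):

# Re-encode a latin-1 CSV as UTF-8 so downstream steps read the accents correctly.
with open('foo.csv', encoding='iso-8859-1') as src, \
        open('foo_utf8.csv', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line)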

What is the encoding format of the DNS zone file generated by the DNSCMD command?

I have some Unicode Host A record names (like abcáxyz) in my DNS zone. When I use dnscmd /zoneexport, it creates a zone file. When I open this file in Notepad or any text editor, it shows the record name as abc\303\241xyz. I want to read this file from a program, so I would like to know which encoding dnscmd uses when writing the file, such that characters like á are represented as \303\241.
I tried the following encodings:
iso-8859-1,
ISO-8859-15,
ISO-8859-9,
windows-1252,
windows-1254.
All of them work for me, but I am not sure which one to use.
I also posted this question on a Microsoft forum and got the answer there:
http://social.technet.microsoft.com/Forums/en-US/winserverNIS/thread/35f954c9-2372-4175-9ef0-4fa4839fe408
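For what it's worth, \303 and \241 are the octal values of the bytes 0xC3 and 0xA1, which together are the UTF-8 encoding of á, so the export looks like octal-escaped UTF-8 rather than any of the single-byte code pages listed above (those "work" only because the escape sequences themselves are plain ASCII). A decoding sketch under that assumption:

import re

def decode_zone_name(name):
    # Replace each \NNN octal escape with its byte value, then decode the bytes as UTF-8.
    raw = re.sub(r'\\(\d{3})', lambda m: chr(int(m.group(1), 8)), name).encode('latin-1')
    return raw.decode('utf-8')

print(decode_zone_name(r'abc\303\241xyz'))  # abcáxyz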

Resources