Program to view the raw bytes of a text file

Is there a program or method for viewing the sequence of bytes in a text or HTML file?
I don't want to see the characters, but rather the complete sequence of bytes.
Any recommendations?

Yes, it is called a hex editor, and hundreds of them exist.
Here are some: http://en.wikipedia.org/wiki/Comparison_of_hex_editors

A common hex editor allows you to view any file's byte sequence.

If you just want to see the existing bytes (without changing them) you can use a hex-dump program, which is much smaller and simpler than a hex editor. For example, here's one I wrote several years ago:
/* public domain by Jerry Coffin
 */
#include <ctype.h>   /* isprint */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    unsigned long offset;
    FILE *input;
    size_t bytes, i;
    int j;
    unsigned char buffer[16];
    char outbuffer[60];

    if (argc < 2) {
        fprintf(stderr, "Usage: dump filename [filename...]\n");
        return EXIT_FAILURE;
    }
    for (j = 1; j < argc; ++j) {
        if (NULL == (input = fopen(argv[j], "rb"))) {
            perror(argv[j]);   /* report the failure and skip this file */
            continue;
        }
        printf("\n%s:\n", argv[j]);
        offset = 0;   /* offsets restart at zero for each file */
        while (0 < (bytes = fread(buffer, 1, sizeof buffer, input))) {
            /* 10 characters of offset, then 3 characters per byte */
            sprintf(outbuffer, "%8.8lx: ", offset);
            offset += bytes;
            for (i = 0; i < bytes; i++) {
                sprintf(outbuffer + 10 + 3 * i, "%2.2X ", buffer[i]);
                if (!isprint(buffer[i]))
                    buffer[i] = '.';
            }
            printf("%-60s %*.*s\n", outbuffer, (int)bytes, (int)bytes, (char *)buffer);
        }
        fclose(input);
    }
    return 0;
}

Related

Printing Lines from Intel HEX Record File

I'm trying to send the contents of an Intel Hex file over a Serial connection to a microcontroller, which will process each line sent and program them into memory as needed. The processing code expects the lines to be sent as they appear in the Hex file, including the newline characters at the end of each line.
This code is being run in Visual Studio 2013 on a Windows 10 PC; for reference, the microcontroller is an ARM Cortex-M0+ model.
However, the following code doesn't seem to be processing the Intel Hex record file the way that I expected.
...
int count = 0;
char hexchar;
unsigned char Buffer[69]; // 69 is max ascii hex read length for microcontroller
ifstream hexfile("pdu.hex");
while (hexfile.get(hexchar))
{
    Buffer[count] = hexchar;
    count++;
    if (hexchar == '\n')
    {
        for (int i = 0; i < count; i++)
        {
            printf("%c", Buffer[i]);
        }
        serial_tx_function(Buffer); // microcontroller requires unsigned char
        count = 0;
    }
}
...
Currently, the serial transmission call is commented out, and the for loop is there to verify that the file is being read properly. I expect to see each line of the hex file printed out to the terminal. Instead, I get nothing at all. Any ideas?
EDIT: After further investigation, I determined that the program isn't even entering the while loop because the file fails to open. I don't know why that would be the case, since the file exists and can be opened in other programs like Notepad. However, I'm not terribly experienced with file I/O, so I might be overlooking something.
Intel HEX records are plain ASCII text, but a file with a .hex extension can still contain non-ASCII bytes, and those can cause problems when printed to a command-line terminal.
I would suggest opening the file in binary mode and printing each character as a hexadecimal number.
So open the file with ifstream hexfile("pdu.hex", ifstream::binary); and, to print characters in hexadecimal, use the printf specifier %x, or %hhx for a char.
The whole program would look something like this:
#include <cassert>
#include <cstdio>    // printf
#include <fstream>
#include <iostream>

int main()
{
    using namespace std;
    int count = 0;
    char hexchar;
    constexpr int MAX_LINE_LENGTH = 69;
    unsigned char Buffer[MAX_LINE_LENGTH]; // 69 is max ascii hex read length for microcontroller
    ifstream hexfile("pdu.hex", ios::binary);
    if (!hexfile)
    {
        // A relative path is resolved against the working directory,
        // which may not be the directory that contains the source file.
        cerr << "Could not open pdu.hex\n";
        return 1;
    }
    while (hexfile.get(hexchar))
    {
        assert(count < MAX_LINE_LENGTH);
        Buffer[count] = hexchar;
        count++;
        if (hexchar == '\n')
        {
            for (int i = 0; i < count; i++)
            {
                printf("%hhx ", Buffer[i]);
            }
            printf("\n");
            //serial_tx_function(Buffer); // microcontroller requires unsigned char
            count = 0;
        }
    }
}
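To illustrate the point about printing a char in hexadecimal, here is a minimal C sketch. The helper name byte_to_hex is hypothetical and not part of the answer. It shows one way to avoid the sign extension that can occur when a plain char with its high bit set is promoted to int before printing.

```c
#include <stdio.h>

/* Hypothetical helper: write the two-digit lowercase hex form of one byte
 * into out, which must have room for at least 3 chars (2 digits + NUL). */
void byte_to_hex(char c, char out[3])
{
    /* Converting to unsigned char first avoids sign extension: where char
     * is signed, a char holding 0xFF would be promoted to a negative int. */
    snprintf(out, 3, "%02x", (unsigned)(unsigned char)c);
}
```

Calling byte_to_hex('\n', out) leaves "0a" in out, which is what a line ending in an Intel HEX file would show up as.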

Non collective write using in file view

When writing blocks to a file, with the blocks unevenly distributed across my processes, one can use MPI_File_write_at with the right offset. As this function is not a collective operation, this works well.
Example:
#include <cstdio>
#include <cstdlib>
#include <string>
#include <mpi.h>
int main(int argc, char* argv[])
{
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int global = 7; // prime helps have unbalanced procs
int local = (global/size) + (global%size>rank?1:0);
int strsize = 5;
MPI_File fh;
MPI_File_open(MPI_COMM_WORLD, "output.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
for (int i=0; i<local; ++i)
{
size_t idx = i * size + rank;
std::string buffer = std::string(strsize, 'a' + idx);
size_t offset = buffer.size() * idx;
MPI_File_write_at(fh, offset, buffer.c_str(), buffer.size(), MPI_CHAR, MPI_STATUS_IGNORE);
}
MPI_File_close(&fh);
MPI_Finalize();
return 0;
}
However, for more complex writes, particularly when writing multidimensional data such as raw images, one may want to create a view of the file with MPI_Type_create_subarray. But when I use this method with a plain MPI_File_write (which is supposed to be non-collective), I run into deadlocks. Example:
#include <cstdio>
#include <cstdlib>
#include <string>
#include <mpi.h>
int main(int argc, char* argv[])
{
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int global = 7; // prime helps have unbalanced procs
int local = (global/size) + (global%size>rank?1:0);
int strsize = 5;
MPI_File fh;
MPI_File_open(MPI_COMM_WORLD, "output.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
for (int i=0; i<local; ++i)
{
size_t idx = i * size + rank;
std::string buffer = std::string(strsize, 'a' + idx);
int dim = 2;
int gsizes[2] = { buffer.size(), global };
int lsizes[2] = { buffer.size(), 1 };
int offset[2] = { 0, idx };
MPI_Datatype filetype;
MPI_Type_create_subarray(dim, gsizes, lsizes, offset, MPI_ORDER_C, MPI_CHAR, &filetype);
MPI_Type_commit(&filetype);
MPI_File_set_view(fh, 0, MPI_CHAR, filetype, "native", MPI_INFO_NULL);
MPI_File_write(fh, buffer.c_str(), buffer.size(), MPI_CHAR, MPI_STATUS_IGNORE);
}
MPI_File_close(&fh);
MPI_Finalize();
return 0;
}
How can I keep such code from deadlocking? Keep in mind that my real code will really use the multidimensional capabilities of MPI_Type_create_subarray and cannot just use MPI_File_write_at.
Also, it is difficult for me to know the maximum number of blocks in a process, so I'd like to avoid doing an all-reduce and then looping up to the maximum number of blocks, with empty writes when localnb <= id < maxnb.
You don't use MPI_REDUCE when you have a variable number of blocks per node. You use MPI_SCAN or MPI_EXSCAN; see "MPI IO Writing a file when offset is not known".
MPI_File_set_view is collective, so if 'local' differs between processes, you'll find yourself calling a collective routine from fewer than all the processes in the communicator. If you really, really need to do so, open the file with MPI_COMM_SELF.
The MPI_SCAN approach means each process can set the file view as needed, and then you can call the collective MPI_File_write_at_all (even processes with zero work still need to participate) and take advantage of whatever clever optimizations your MPI-IO implementation provides.
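To make the scan idea concrete, here is a serial C sketch of the arithmetic only. It is not MPI code, and the function name exclusive_prefix_sum is hypothetical. It computes what an exclusive scan with a sum operation would hand to each rank: the number of blocks owned by all lower ranks, which can serve as that rank's starting block offset when each rank's blocks are stored contiguously (an assumption that differs from the interleaved layout in the question).

```c
#include <stddef.h>

/* Serial stand-in for an exclusive sum scan across ranks:
 * offsets[r] receives counts[0] + ... + counts[r-1]. */
void exclusive_prefix_sum(const int *counts, size_t nranks, int *offsets)
{
    int running = 0;
    for (size_t r = 0; r < nranks; ++r) {
        offsets[r] = running;   /* blocks owned by all lower ranks */
        running += counts[r];
    }
}
```

With 7 blocks over 3 processes, the question's formula gives local counts of 3, 2 and 2, so the offsets come out as 0, 3 and 5. In real MPI code, note that MPI_Exscan leaves the result on rank 0 undefined, so that rank would need to set its own offset to zero.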

Get certain parts out of a string using C

Evening everyone, I hope one of you gurus can help. I need to read data out of the string below by searching for tags such as IZTAG, UKPART, etc. However, the code I am using is no good, as it only stores the first part of the value: for example, for UKPART it gives 12999 and misses the -0112. Is there a better way to search strings?
Update so far:
#include <stdio.h>
#include <stdlib.h>   /* system */
#include <string.h>

int main ()
{
    // in my application this comes from the handle and readfile
    char buffer[255]="TEST999.UKPART=12999-0112...ISUE-125" ;
    char *pos = buffer;
    char buffer2[255] = "";   // stays empty if the tag is not found

    if ((pos=strstr(pos, "UKPART")) != NULL) {
        strcpy (buffer2, pos); // buffer2 <= "UKPART=12999-0112...ISUE-125"
    }
    printf("%s\n", buffer2);
    system("pause");
    return 0;
}
This now works, but it returns the whole rest of the string as output. I need to return just the UKPART value, for example. Thanks so far :-)
strstr() is absolutely the right way to search for the substring. Cool :)
It sounds like you want something different from "sscanf()" to copy the substring.
Q: Why not just use "strcpy ()" instead?
EXAMPLE:
char buffer[255]="IZTAG-12345...UKPART=12999-0112...ISUE-125" ;
char buffer2[255];
char *pos = buffer;
if ((pos=strstr(pos, "UKPART")) != NULL) {
    strcpy (buffer2, pos); // buffer2 <= "UKPART=12999-0112...ISUE-125"
}
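Since strcpy copies everything from the tag to the end of the string, one possible way to get only the value is to measure how far it extends with strcspn. The sketch below is a hedged illustration, not the answerer's code: the function name extract_tag_value is hypothetical, and it assumes that a value ends at the first '.' (or at the end of the string), which matches the sample data but may not hold for the real input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the value that follows `tag` in `text` into
 * `out` (at most outsize-1 chars, always NUL-terminated).
 * Assumption: the value ends at the first '.' or at the end of the string.
 * Returns 1 if the tag was found, 0 otherwise. */
int extract_tag_value(const char *text, const char *tag,
                      char *out, size_t outsize)
{
    const char *pos = strstr(text, tag);
    if (pos == NULL || outsize == 0)
        return 0;
    pos += strlen(tag);              /* skip the tag itself, e.g. "UKPART=" */
    size_t len = strcspn(pos, ".");  /* length up to the first '.' (or end) */
    if (len >= outsize)
        len = outsize - 1;           /* truncate rather than overflow */
    memcpy(out, pos, len);
    out[len] = '\0';
    return 1;
}
```

For the sample string, extract_tag_value(buffer, "UKPART=", buffer2, sizeof buffer2) leaves "12999-0112" in buffer2, including the part after the hyphen.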

Why does iconv read more bytes than I specified?

I use
size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
to convert UTF-16BE to GB2312.
inbytesleft is the number of bytes to convert. After the conversion, inbytesleft is the number of bytes not yet converted.
After one call, I found that inbytesleft was -2. According to the iconv man page, this function should read at most inbytesleft bytes.
Can anyone tell me why this happens and how to fix it?
The text to be converted is
"保单验证"
Thanks
How are you getting the input data into your program?
I've tested the situation using this code and it seems to work:
#include <stdio.h>
#include <iconv.h>
#include <errno.h>

int main(){
    /* UTF-16BE bytes of the four characters, followed by two zero bytes */
    unsigned char data[10] = {0x4f,0xdd,0x53,0x55,0x9a,0x8c,0x8b,0xc1, 0, 0};
    char outdata[20];
    char *dataptr;
    char *outdataptr;
    iconv_t cd;
    size_t result;
    size_t inbytesleft = 8;
    size_t outbytesleft = sizeof outdata;
    size_t i;

    cd = iconv_open("GB2312", "UTF-16BE");
    if(cd == (iconv_t)-1){
        perror("iconv_open");
        return 1;
    }
    dataptr = (char *)data;
    outdataptr = outdata;
    result = iconv(cd, &dataptr, &inbytesleft, &outdataptr, &outbytesleft);
    if(result == (size_t)-1)
        printf("Error: %d\n", errno);
    /* size_t is unsigned, so print it with %zu */
    printf(" result: %zu\n", result);
    printf(" inbytesleft: %zu\n", inbytesleft);
    printf("outbytesleft: %zu\n", outbytesleft);
    /* dump the bytes that iconv wrote */
    for(i = 0; i < sizeof outdata - outbytesleft; i++){
        if(i != 0)
            printf(",");
        printf("0x%02x", (unsigned char)outdata[i]);
    }
    printf("\n");
    iconv_close(cd);
    return 0;
}
It prints
result: 0
inbytesleft: 0
outbytesleft: 12
0xb1,0xa3,0xb5,0xa5,0xd1,0xe9,0xd6,0xa4
Which appears to be correct.
The array of items in the variable data is the UTF-16BE encoding of 保单验证
If this doesn't help, could you post your code for analysis?

How to get the size of a gunzipped file in vim

When viewing (or editing) a .gz file, vim knows to locate gunzip and display the file properly.
In such cases, getfsize(expand("%")) would be the size of the gzipped file.
Is there a way to get the size of the expanded file?
[EDIT]
Another way to solve this might be getting the size of current buffer, but there seems to be no such function in vim. Am I missing something?
There's no easy way to get the uncompressed size of a gzipped file, short of uncompressing it and using the getfsize() function. That might not be what you want. I took a look at RFC 1952 - GZIP File Format Specification, and the only thing that might be useful is the ISIZE field, which contains "...the size of the original (uncompressed) input data modulo 2^32".
EDIT:
I don't know if this helps, but here's some proof-of-concept C code I threw together that retrieves the value of the ISIZE field in a gzip'd file. It works for me using Linux and gcc, but your mileage may vary. If you compile the code and then pass in a gzip'd filename as a parameter, it will tell you the uncompressed size of the original file (modulo 2^32, so it is only exact for files smaller than 4 GiB).
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[])
{
    FILE *fp = NULL;
    int i = 0;

    if ( argc != 2 ) {
        fprintf( stderr, "Must specify file to process.\n" );
        return EXIT_FAILURE;
    }
    // Open the file for reading; binary mode matters on Windows
    if (( fp = fopen( argv[1], "rb" )) == NULL ) {
        fprintf( stderr, "Unable to open %s for reading: %s\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
    // Look at the first two bytes and make sure it's a gzip file
    int c1 = fgetc(fp);
    int c2 = fgetc(fp);
    if ( c1 != 0x1f || c2 != 0x8b ) {
        fprintf( stderr, "File is not a gzipped file.\n" );
        fclose(fp);
        return EXIT_FAILURE;
    }
    // Seek to four bytes from the end of the file
    if ( fseek(fp, -4L, SEEK_END) != 0 ) {
        fprintf( stderr, "Unable to seek in %s: %s\n", argv[1], strerror(errno));
        fclose(fp);
        return EXIT_FAILURE;
    }
    // Array containing the last four bytes
    unsigned char last[4];
    for ( i = 0; i < 4; ++i ) {
        int charRead = fgetc(fp);
        if ( charRead == EOF ) {
            // This shouldn't happen
            fprintf( stderr, "Read end-of-file\n" );
            fclose(fp);
            return EXIT_FAILURE;
        }
        last[i] = (unsigned char)charRead;
    }
    // ISIZE is stored least-significant byte first, so assemble it
    // explicitly instead of relying on the host byte order.
    unsigned long isize = (unsigned long)last[0]
                        | ((unsigned long)last[1] << 8)
                        | ((unsigned long)last[2] << 16)
                        | ((unsigned long)last[3] << 24);
    printf( "The uncompressed filesize was %lu bytes (0x%02lx hex)\n", isize, isize );
    fclose(fp);
    return 0;
}
This appears to work for getting the byte count of a buffer
(line2byte(line("$")+1)-1)
If you're on Unix/linux, try
:%!wc -c
That's in bytes. (It works on windows, if you have e.g. cygwin installed.) Then hit u to get your content back.
HTH
From within vim editor, try this:
<Esc>:!wc -c my_zip_file.gz
That will display the number of bytes in the file.
