Reading binary data through stdin with cat command on linux

I am trying to read binary data through stdin (0) into my program using the cat command. My program's task is to convert the binary data to integers or doubles and write the result to the desired file descriptor.
When I run the command cat data_int.bin | ./myprogram -d, I cannot read anything, and the reported size of the input is 0. But when I try ./myprogram -d -I 0 0<data_int.bin, my program can read the bytes and finishes successfully.
My code:
#libraries
int main(int argc, char* argv[]) {
int c;
extern char *optarg;
extern int optind;
extern int optopt;
char input_file[100] = { 0 };
int nastavljen_input = 0;
char output_file[100] = { 0 };
int nastavljen_output = 0;
int tip = -1; // 0 - char, 1- int, 2 - double
int fd_in = 0;
int fd_out = 1;
while((c = getopt(argc,argv, ":o:i:O:I:cdf")) != -1) {
switch(c) {
case 'o':
strcpy(output_file,optarg);
nastavljen_output = 1;
fd_out = open(output_file,O_WRONLY);
break;
case 'i':
strcpy(input_file,optarg);
nastavljen_input = 1;
fd_in = open(input_file,O_RDONLY);
break;
case 'O':
fd_out = atoi(optarg);
break;
case 'I':
fd_in = atoi(optarg);
break;
case 'c':
tip = 0;
break;
case 'd':
tip = 1;
break;
case 'f':
tip = 2;
break;
}
}
if(tip > -1) {
struct stat st;
fstat(fd_in, &st); //fd_in would be 0 with cat command
int size = st.st_size; // number of bytes in input file
printf("%d\n",size); // this will print out 0 with cat command
unsigned char buffer[size];
read(fd_in,buffer,size);
...code continues...
The -d flag is for reading bytes that represent integers, and -I is for choosing the input file descriptor. Output is stdout (1) in this case.
My question is: is there a problem with my code, or is this just the way the cat command works? I am using Xubuntu.
Thank you for your time and effort,
Domen

Pipes always have st_size 0, because the length of the stream of bytes that will be written into a pipe is not known ahead of time.
There are many programs that behave differently on cat foo | prog and prog < foo. This is the reason. In the second case, prog has a regular file on stdin so stat reveals the size. Also in the second case, lseek/fseek will work, and on the pipe it won't.
If you want to read the contents of stdin into a buffer, and you need that to work when stdin is a pipe, you have to guess a size for it, keep track of how much you have read, and allocate more memory when the buffer fills up. realloc is good for this.
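A minimal sketch of that grow-as-you-read approach, assuming plain POSIX read and a doubling strategy. The helper name read_all and the 4096-byte starting guess are made up for illustration and are not part of the question's code:

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read everything from fd into a malloc'd buffer that grows as needed.
 * Works for pipes as well as regular files because it never asks for the size.
 * On success returns the buffer (caller frees it) and stores the byte count
 * in *len_out; on failure returns NULL. read_all is a hypothetical name. */
unsigned char *read_all(int fd, size_t *len_out)
{
    size_t cap = 4096;                   /* initial guess for the size */
    size_t len = 0;                      /* bytes read so far */
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {                /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)                      /* end of file, or the writer closed the pipe */
            break;
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            free(buf);
            return NULL;
        }
        len += (size_t)n;
    }
    *len_out = len;
    return buf;
}
```

In the question's program this would replace the fstat / fixed-size buffer / single read sequence, e.g. `buffer = read_all(fd_in, &size);`.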

Related

Processing backspace control character (^H) in real time while logging stdout to file

I am working on a script to test new-to-me hard drives in the background (so I can close the terminal window) and log the outputs. My problem is in getting badblocks to print stdout to the log file so I can monitor its multi-day progress and create properly formatted update emails.
I have been able to print stdout to a log file with the following: (flags are r/w, % monitor, verbose)
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tee sdx.log
Normally the output would look like:
Testing with pattern 0xaa: 2.23% done, 7:00 elapsed. (0/0/0 errors)
No new-line character is used, the ^H control command backs up the cursor, and then the new updated status overwrites the previous status.
Unfortunately, the control character is not processed but saved as a character in the file, producing the above output followed by 43 copies of ^H, the new updated stats, 43 copies of ^H, etc.
Since the output is updated at least once per second, this produces a much larger file than necessary, and makes it difficult to retrieve the current status.
While working in the terminal, the solution cat sdx.log && echo"" prints the expected/wanted results by parsing the control characters (and then inserting a carriage return so it is not immediately printed over by the next terminal line), but using cat sdx.log > some.file or cat sdx.log | mail both still include all of the extra characters (though in email they are interpreted as spaces). This solution (or ones like it which decode or remove the control character at the time of access) still produces a huge, unnecessary output file.
I have worked my way through the following similar questions, but none have produced (at least that I can figure out) a solution which works in real time with the output to update the file, instead requiring that the saved log file be processed separately after the task has finished writing, or that the log file not be written until the process is done, both of which defeat the stated goal of monitoring progress.
Bash - process backspace control character when redirecting output to file
How to "apply" backspace characters within a text file (ideally in vim)
Thank you!

The main place I've run into this in real life is trying to process man pages. In the past, I've always used a simple script that post-processes by stripping out the backspaces appropriately. One could probably do this sort of thing in 80 characters of Perl, but here's an approach that handles backspace and CR/NL fairly well. I've not tested it extensively, but it produces good output for simple cases, e.g.:
$ printf 'xxx\rabclx\bo\rhel\nworld\n' | ./a.out output
hello
world
$ cat output
hello
world
$ xxd output
00000000: 6865 6c6c 6f0a 776f 726c 640a hello.world.
If your output starts to have a lot of CSI sequences, this approach just isn't worth the trouble. cat will produce nice human-consumable output for those cases.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
FILE * xfopen(const char *path, const char *mode);
off_t xftello(FILE *stream, const char *name);
void xfseeko(FILE *stream, off_t offset, int whence, const char *name);
int
main(int argc, char **argv)
{
const char *mode = "w";
char *name = strrchr(argv[0], '/'); /* last '/', so that name ends up as the basename */
off_t last = 0, max = 0, curr = 0;
name = name ? name + 1 : argv[0];
if( argc > 1 && ! strcmp(argv[1], "-a")) {
argv += 1;
argc -= 1;
mode = "a";
}
if( argc > 1 && ! strcmp(argv[1], "-h")) {
printf("usage: %s [-a] [-h] file [ file ...]\n", name);
return EXIT_SUCCESS;
}
if( argc < 2 ) {
fprintf(stderr, "Missing output file. -h for usage\n");
return EXIT_FAILURE;
}
assert( argc > 1 );
argc -= 1;
argv += 1;
FILE *ofp[argc];
for( int i = 0; i < argc; i++ ) {
ofp[i] = xfopen(argv[i], mode);
}
int c;
while( ( c = fgetc(stdin) ) != EOF ) {
fputc(c, stdout);
for( int i = 0; i < argc; i++ ) {
if( c == '\b' ) {
xfseeko(ofp[i], -1, SEEK_CUR, argv[i]);
} else if( isprint(c) ) {
fputc(c, ofp[i]);
} else if( c == '\n' ) {
xfseeko(ofp[i], max, SEEK_SET, argv[i]);
fputc(c, ofp[i]);
last = curr + 1;
} else if( c == '\r' ) {
xfseeko(ofp[i], last, SEEK_SET, argv[i]);
}
}
curr = xftello(ofp[0], argv[0]);
if( curr > max ) {
max = curr;
}
}
return 0;
}
off_t
xftello(FILE *stream, const char *name)
{
off_t r = ftello(stream);
if( r == -1 ) {
perror(name);
exit(EXIT_FAILURE);
}
return r;
}
void
xfseeko(FILE *stream, off_t offset, int whence, const char *name)
{
if( fseeko(stream, offset, whence) ) {
perror(name);
exit(EXIT_FAILURE);
}
}
FILE *
xfopen(const char *path, const char *mode)
{
FILE *fp = fopen(path, mode);
if( fp == NULL ) {
perror(path);
exit(EXIT_FAILURE);
}
return fp;
}

You can delete the ^H
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tr -d '\b' | tee sdx.log

I have found col -b and colcrt useful, but neither worked perfectly for me. These will apply control characters, not just drop them:
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | col -b | tee sdx.log
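To make the difference between dropping and applying control characters concrete, here is a rough in-memory sketch in C. It is not what col does internally; apply_controls is a hypothetical helper that only understands backspace, carriage return and newline and stores every other byte as-is:

```c
#include <stddef.h>

/* Rewrite the string `in` into `out` the way a terminal would display it:
 * '\b' moves the cursor back one column, '\r' moves it to the start of the
 * current line, '\n' ends the line, and any other byte overwrites the cell
 * under the cursor. `out` must have room for strlen(in) + 1 bytes.
 * Returns the length of the result. apply_controls is a hypothetical name. */
size_t apply_controls(const char *in, char *out)
{
    size_t line = 0;   /* index in out where the current line starts */
    size_t cur = 0;    /* cursor position within out */
    size_t end = 0;    /* one past the last byte written so far */

    for (; *in != '\0'; in++) {
        char c = *in;
        if (c == '\b') {
            if (cur > line)
                cur--;                 /* never back up past the line start */
        } else if (c == '\r') {
            cur = line;
        } else if (c == '\n') {
            out[end++] = '\n';         /* newline goes after the longest text on the line */
            line = cur = end;
        } else {
            out[cur++] = c;            /* overwrite in place */
            if (cur > end)
                end = cur;
        }
    }
    out[end] = '\0';
    return end;
}
```

Fed a progress line followed by enough backspaces and the updated progress line, this leaves only the updated text, whereas tr -d '\b' would keep both copies back to back.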

more than one command for system call in linux

I am trying to execute a program (say target.c) that has the following
void foo(char * arg)
{
char cmd[16];
char par[16];
char * p;
strcpy(cmd, "ls --color -l ");
strcpy(par, arg);
printf("You can use \"%s %s\" to list the files in dir \"%s\"!\n",cmd, par, par);
p = (char*)malloc(strlen(cmd) + strlen(par) + 2);
strcpy(p, cmd);
strcat(p, " ");
strcat(p, par);
system(p);
}
int main(int argc, char ** argv)
{
int i;
char test[256];
if (argc > 1)
foo(argv[1]);
else
printf("usage: %s dir\n", argv[0]);
return 0;
foo(test);
};
Now I am trying to get a shell by invoking it from another program (it is important to invoke it from another program), shown below:
int main(int argc, char **argv)
{
char * arrv[] = {NULL};
char *payload;
int i; int j;
char * argo[] = {"../targets/target1","sdknsd",NULL};
strcpy(payload,"sd;/bin/sh");
argo[1] = payload;
i=fork();
if(i == 0)
{
execve("../targets/target1" ,argo, arrv );
exit(1);
}
else if(i == -1)
{
perror("fork()");
}
}
My question is: when I execute the target directly and provide the command-line argument something ; /bin/sh, I get the shell, but not when invoking it through execve.
Any help would be really appreciated.
Alright here is the output:
[hvalayap@localhost targets]$ ./target1 ds;/bin/sh
ls: ds: No such file or directory
sh-2.05$
The above program appends the user input string onto ls and passes it to system, hence system("ls ds;/bin/sh") gives me a shell.
But when I try to do the same with execve from another program (the second program), it doesn't work;
it says the "ds" directory was not found.

Look at your code very carefully. char *payload is an uninitialized pointer on the stack, and you then strcpy to whatever address it happens to hold, so you may overwrite local variables or other memory. You didn't allocate memory for this pointer (e.g. with malloc, or by using a local or static buffer). If the string is longer (say 255 characters), you can get a segmentation fault.
BTW: why not use snprintf instead of strcpy? It is the more careful choice with respect to security, I suppose.
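For what it's worth, the snprintf suggestion could look roughly like the sketch below. build_command is a hypothetical helper, not something from the question's code, and it only guards against overflowing the destination buffer; it does nothing to sanitize what ends up being passed to system:

```c
#include <stdio.h>

/* Build "cmd arg" in dst, which has room for dst_size bytes.
 * Returns 0 on success and -1 if the result would not fit (dst then holds a
 * truncated, still NUL-terminated string). build_command is a hypothetical name. */
int build_command(char *dst, size_t dst_size, const char *cmd, const char *arg)
{
    int n = snprintf(dst, dst_size, "%s %s", cmd, arg);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;   /* output error, or the text was truncated */
    return 0;
}
```

The caller supplies real storage, for example `char payload[64];` followed by `build_command(payload, sizeof payload, "ls --color -l", argv[1])`, and checks the return value before using the buffer.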

Faster and precise way to count lines other than wc -l

Usually I use wc -l to count the lines of a file. However for a file with 5*10^7 lines, I get only 10^7 as an answer.
I've tried everything proposed here:
How to count lines in a document?
But everything takes much more time than wc -l.
Is there any other option?

Anyone serious about fast line counting can just create their own implementation:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h> /* read, close */
#define BUFFER_SIZE (1024 * 16)
char BUFFER[BUFFER_SIZE];
int main(int argc, char** argv) {
unsigned int lines = 0;
int fd, r;
if (argc > 1) {
char* file = argv[1];
if ((fd = open(file, O_RDONLY)) == -1) {
fprintf(stderr, "Unable to open file \"%s\".\n", file);
return 1;
}
} else {
fd = fileno(stdin);
}
while ((r = read(fd, BUFFER, BUFFER_SIZE)) > 0) {
char* p = BUFFER;
while ((p = memchr(p, '\n', (BUFFER + r) - p))) {
++p;
++lines;
}
}
close(fd);
if (r == -1) {
fprintf(stderr, "Read error.\n");
return 1;
}
printf("%u\n", lines);
return 0;
}
Usage
a < input
... | a
a file
Example:
# time ./wc temp.txt
10000000
real 0m0.115s
user 0m0.102s
sys 0m0.014s
# time wc -l temp.txt
10000000 temp.txt
real 0m0.120s
user 0m0.103s
sys 0m0.016s
* Code compiled with -O3 natively on a system with AVX and SSE4.2 using GCC 4.8.2.

You could try sed
sed -n '$=' file
The = says to print the line number, and the dollar says to only do it on the last line. The -n says not to do too much else.
Or here's a way in Perl, save this as wc.pl and do chmod +x wc.pl:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = <@ARGV>;
my $lines = 0;
my $buffer;
open(FILE, $filename) or die "ERROR: Can not open file: $!";
while (sysread FILE, $buffer, 65536) {
$lines += ($buffer =~ tr/\n//);
}
close FILE;
print "$lines\n";
Run it like this:
wc.pl yourfile
Basically, it reads your file in chunks of 64 kB at a time and takes advantage of the fact that tr returns the number of characters it matched; with an empty replacement list and no /d modifier it counts the newlines without changing the buffer.

Try with nl and see what happens...

You can get the line count using awk as well like below
awk 'END {print NR}' names.txt
(OR) Using while .. do .. done bash loop construct like
CNT=0; while read -r LINE; do (( CNT++ )); done < names.txt; echo $CNT

It depends on how you open the file, but reading it from STDIN instead would probably fix it:
wc -l < file

How to find matching patterns between two text files and output to another file?

I have two text files with different text organization. Both files contain a few identical patterns (numbers) in the text. I'd like to find which patterns (numbers) are present in both files and write them to the output file.
file1.txt:
blablabla_25947.bkwjcnwelkcnwelckme
blablabla_111.bkwjcnwelkcnwelckme
blablabla_65155.bkwjcnwelkcnwelckme
blablabla_56412.bkwjcnwelkcnwelckme
file2.txt:
blablabla_647728.bkwjcnwelkcnwelck
kjwdhcwkejcwmekcjwhemckwejhcmwekch
blablabla_6387.bkwjcnwelkcnwelckme
wexkwhenqlciwuehnqweiugfnwekfiugew
wedhwnejchwenckhwqecmwequhcnkwjehc
owichjwmelcwqhemclekcelmkjcelkwejc
blablabla_59148.bkwjcnwelkcnwelckme
ecmwequhcnkwjehcowichjwmelcwqhemcle
kcelmkjcelkwejcwecawecwacewwAWWAXEG
blablabla_111.bkwjcnwelkcnwelckm
WESETRBRVSSCQEsfdveradassefwaefawecc
output_file.txt:
111

How about:
$ egrep -o '_[0-9]+\.' file1 | grep -of - file2 | tr -d '_.'
111
# Redirect to new file
$ egrep -o '_[0-9]+\.' file1 | grep -of - file2 | tr -d '_.' > file3
First grep gets all the digit strings (preceded by _ and preceding .) from file1 and this list is used to grep the matches in file2. The _ and . are stripped using tr.
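If you would rather do the extraction step in a program, the pattern from the first grep (an underscore, one or more digits, then a dot) can be matched by hand. A small sketch in C, where extract_id is a hypothetical helper that returns only the first match on a line:

```c
#include <ctype.h>
#include <stddef.h>

/* Find the first "_<digits>." in line and copy just the digits into out,
 * which has room for out_size bytes. Returns 1 if a match was found and
 * fits, 0 otherwise. extract_id is a hypothetical name. */
int extract_id(const char *line, char *out, size_t out_size)
{
    for (const char *p = line; *p != '\0'; p++) {
        if (*p != '_')
            continue;
        const char *d = p + 1;           /* candidate start of the digit run */
        size_t n = 0;
        while (isdigit((unsigned char)d[n]))
            n++;
        if (n > 0 && d[n] == '.' && n < out_size) {
            for (size_t i = 0; i < n; i++)
                out[i] = d[i];
            out[n] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Collecting the numbers from file1 this way and then looking each one up in the numbers from file2 gives the same kind of result as the pipeline above, just with more typing.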

I did in fact try to solve the "hard problem" that I thought you were posing. The following code looks for the longest string found in both file1 and file2. If there are multiple "longest" strings, it only reports the first one found. May be helpful to someone, at some point (although maybe not the solution you are looking for here):
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
/* This routine returns the size of the file it is called with. */
static unsigned
get_file_size (const char * file_name)
{
struct stat sb;
if (stat (file_name, & sb) != 0) {
fprintf (stderr, "'stat' failed for '%s': %s.\n",
file_name, strerror (errno));
exit (EXIT_FAILURE);
}
return sb.st_size;
}
/* This routine reads the entire file into memory. */
static unsigned char *
read_whole_file (const char * file_name)
{
unsigned s;
unsigned char * contents;
FILE * f;
size_t bytes_read;
int status;
s = get_file_size (file_name);
contents = malloc (s + 1);
if (! contents) {
fprintf (stderr, "Not enough memory.\n");
exit (EXIT_FAILURE);
}
f = fopen (file_name, "r");
if (! f) {
fprintf (stderr, "Could not open '%s': %s.\n", file_name,
strerror (errno));
exit (EXIT_FAILURE);
}
bytes_read = fread (contents, sizeof (unsigned char), s, f);
if (bytes_read != s) {
fprintf (stderr, "Short read of '%s': expected %u bytes "
"but got %zu: %s.\n", file_name, s, bytes_read,
strerror (errno));
exit (EXIT_FAILURE);
}
contents[s] = '\0'; /* terminate the buffer so that strlen() in main works */
status = fclose (f);
if (status != 0) {
fprintf (stderr, "Error closing '%s': %s.\n", file_name,
strerror (errno));
exit (EXIT_FAILURE);
}
return contents;
}
int main(int argc, char* argv[]){
int i1, i2, l1, l2, lm;
unsigned char longestString[1000]; // lazy way to make big enough.
unsigned char tempString[1000];
int longestFound=0;
unsigned char *f1, *f2; // buffers with entire file contents
f1 = read_whole_file (argv[1]);
f2 = read_whole_file (argv[2]);
l1 = strlen(f1);
l2 = strlen(f2);
for(i1 = 0; i1 < l1; i1++) {
lm = 0;// length of match
for(i2 = 0; i2<l2; i2++) { // scan all of file 2, not only offsets >= i1
lm = 0;
while (f1[i1+lm] == f2[i2+lm] && (i1+lm<l1) && (i2+lm<l2) && lm < 1000-1) {
tempString[lm] = f1[i1+lm];
lm++;
}
if (lm > longestFound) {
tempString[lm]=0; // terminate string
strcpy(longestString, tempString);
longestFound = lm;
}
}
}
printf("longest string found is %d characters:\n", longestFound);
printf("%s\n", longestString);
free(f1);
free(f2);
return 0;
}
The code for reading entire file contents was found at http://www.lemoda.net/c/read-whole-file/index.html

Reading with cat: Stop when not receiving data

Is there any way to tell the cat command to stop reading when not receiving any data? maybe with some "timeout" that specifies for how long no data is incoming.
Any ideas?

There is a timeout(1) command. Example:
timeout 5s cat /dev/random
Depending on your circumstances, e.g. if you run bash with -e and normally care about the exit code, you may want to discard timeout's non-zero exit status:
timeout 5s cat /dev/random || true

cat itself, no. It reads the input stream until told it's the end of the file, blocking for input if necessary.
There's nothing to stop you writing your own cat equivalent which will use select on standard input to timeout if nothing is forthcoming fast enough, and exit under those conditions.
In fact, I once wrote a snail program (because a snail is slower than a cat) which took an extra argument of characters per second to slowly output a file (a).
So snail 10 myprog.c would output myprog.c at ten characters per second. For the life of me, I can't remember why I did this - I suspect I was just mucking about, waiting for some real work to show up.
Since you're having troubles with it, here's a version of dog.c (based on my afore-mentioned snail program) that will do what you want:
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/select.h>
static int dofile (FILE *fin) {
int ch = ~EOF, rc;
fd_set fds;
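struct timeval tv;
// Unbuffered, so no input that select() cannot see sits hidden in the stdio buffer.
setvbuf (fin, NULL, _IONBF, 0);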
while (ch != EOF) {
// Set up for fin file, 5 second timeout.
FD_ZERO (&fds); FD_SET (fileno (fin), &fds);
tv.tv_sec = 5; tv.tv_usec = 0;
rc = select (fileno(fin)+1, &fds, NULL, NULL, &tv);
if (rc < 0) {
fprintf (stderr, "*** Error on select (%d)\n", errno);
return 1;
}
if (rc == 0) {
fprintf (stderr, "*** Timeout on select\n");
break;
}
// Data available, so it will not block.
if ((ch = fgetc (fin)) != EOF) putchar (ch);
}
return 0;
}
int main (int argc, char *argv[]) {
int argp, rc;
FILE *fin;
if (argc == 1)
rc = dofile (stdin);
else {
argp = 1;
while (argp < argc) {
if ((fin = fopen (argv[argp], "rb")) == NULL) {
fprintf (stderr, "*** Cannot open input file [%s] (%d)\n",
argv[argp], errno);
return 1;
}
rc = dofile (fin);
fclose (fin);
if (rc != 0)
break;
argp++;
}
}
return rc;
}
Then, you can simply run dog without arguments (so it will use standard input) and, after five seconds with no activity, it will output:
*** Timeout on select
(a) Actually, it was called slowcat but snail is much nicer and I'm not above a bit of minor revisionism if it makes the story sound better :-)

mbuffer, with its -W option, works for me.
I needed to sink stdin to a file, but with an idle timeout:
I did not need to actually concatenate multiple sources (but perhaps there are ways to use mbuffer for this.)
I did not need any of cat's possible output-formatting options.
I did not mind the progress bar that mbuffer brings to the table.
I did need to add -A /bin/false to suppress a warning, based on a suggestion in the linked man page. My invocation for copying stdin to a file with 10 second idle timeout ended up looking like
mbuffer -A /bin/false -W 10 -o ./the-output-file

Here is the code for timeout-cat:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
void timeout(int sig) {
exit(EXIT_FAILURE);
}
int main(int argc, char* argv[]) {
int sec = 0; /* seconds to timeout (0 = no timeout) */
int c;
if (argc > 1) {
sec = atoi(argv[1]);
signal(SIGALRM, timeout);
alarm(sec);
}
while((c = getchar()) != EOF) {
alarm(0);
putchar(c);
alarm(sec);
}
return EXIT_SUCCESS;
}
It does basically the same as paxdiablo's dog.
It works as a cat without an argument - catting the stdin. As a first argument provide timeout seconds.
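One limitation (it applies to dog as well): when input comes from a terminal it arrives a line at a time, so you have n seconds to provide a whole line (not just any character) to reset the timeout alarm. This is because the terminal buffers input by line, not because of the program itself.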
Usage:
Instead of a potentially endless
cat < some_input > some_output
you can compile the code above to timeout_cat and run:
./timeout_cat 5 < some_input > some_output
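A variation on the same idea that sidesteps stdio buffering, sketched with poll and read directly on the file descriptors. copy_until_idle is a hypothetical helper; the timeout is per read, in milliseconds. On a terminal in its default mode input will still only arrive once a full line has been typed:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Copy bytes from in_fd to out_fd until end of input, or until no data has
 * arrived for timeout_ms milliseconds.
 * Returns 0 on end of input, 1 on idle timeout, -1 on error.
 * copy_until_idle is a hypothetical name. */
int copy_until_idle(int in_fd, int out_fd, int timeout_ms)
{
    char buf[4096];
    struct pollfd pfd = { .fd = in_fd, .events = POLLIN };

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0)
            return 1;                    /* nothing arrived in time */
        if (rc < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: wait again */
            return -1;
        }
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                    /* end of file, or the writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {   /* write() may be partial: loop until done */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```

A main that calls `copy_until_idle(STDIN_FILENO, STDOUT_FILENO, 5000)` behaves like cat with a five second idle timeout.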

Try to consider tail -f --pid
I am assuming that you are reading some file and when the producer is finished (gone?) you stop.
Example that will process /var/log/messages until watcher.sh finishes.
./watcher.sh&
tail -f /var/log/messages --pid $! | ... do something with the output

I faced the same issue of the cat command blocking while reading from a tty port via adb shell, but did not find any solution (the timeout command was also not working). Below is the final command I used in my Python script (running on Ubuntu) to make it non-blocking. Hope this will help someone.
bash_command = "adb shell \"echo -en 'ATI0\\r\\n' > /dev/ttyUSB0 && cat /dev/ttyUSB0\" & sleep 1; kill $!"
response = subprocess.check_output(['bash', '-c', bash_command])

Simply cat then kill the cat after 5 sec.
cat xyz & sleep 5; kill $!
Get the cat output as a reply after 5 seconds
reply="`cat xyz & sleep 5; kill $!`"
echo "reply=$reply"
