Usually I use wc -l to count the lines of a file. However, for a file with 5*10^7 lines, I get only 10^7 as the answer.
I've tried everything proposed here:
How to count lines in a document?
But they all take much more time than wc -l.
Is there any other option?
Anyone serious about fast line counting can just create their own implementation:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), close() */

#define BUFFER_SIZE (1024 * 16)
char BUFFER[BUFFER_SIZE];

int main(int argc, char** argv) {
    unsigned int lines = 0;
    int fd, r;

    if (argc > 1) {
        char* file = argv[1];
        if ((fd = open(file, O_RDONLY)) == -1) {
            fprintf(stderr, "Unable to open file \"%s\".\n", file);
            return 1;
        }
    } else {
        fd = fileno(stdin);
    }

    while ((r = read(fd, BUFFER, BUFFER_SIZE)) > 0) {
        char* p = BUFFER;
        while ((p = memchr(p, '\n', (BUFFER + r) - p))) {
            ++p;
            ++lines;
        }
    }
    close(fd);

    if (r == -1) {
        fprintf(stderr, "Read error.\n");
        return 1;
    }

    printf("%u\n", lines);
    return 0;
}
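To build it (a sketch; count.c is a made-up name for the source above, and the flags match the compilation note at the end of this answer):
$ gcc -O3 -march=native count.c -o a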
Usage
a < input
... | a
a file
Example:
# time ./wc temp.txt
10000000
real 0m0.115s
user 0m0.102s
sys 0m0.014s
# time wc -l temp.txt
10000000 temp.txt
real 0m0.120s
user 0m0.103s
sys 0m0.016s
* Code compiled with -O3 natively on a system with AVX and SSE4.2 using GCC 4.8.2.
You could try sed
sed -n '$=' file
The = tells sed to print the current line number, and the $ address applies it only to the last line. The -n suppresses the default printing of every line, so only the number is output.
Or here's a way in Perl, save this as wc.pl and do chmod +x wc.pl:
#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift @ARGV;
my $lines = 0;
my $buffer;
open(FILE, '<', $filename) or die "ERROR: Can not open file: $!";
while (sysread FILE, $buffer, 65536) {
    $lines += ($buffer =~ tr/\n//);
}
close FILE;
print "$lines\n";
Run it like this:
wc.pl yourfile
Basically it reads your file in chunks of 64 kB at a time, and then takes advantage of the fact that tr returns the number of characters it matched: with an empty replacement list, tr/\n// simply counts the newlines without changing the buffer.
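You can check the tr counting trick in isolation with a one-liner (just a sanity check, separate from the script above):
$ printf 'a\nb\nc\n' | perl -ne '$n += tr/\n//; END { print "$n\n" }'
3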
Try with nl and see what happens...
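For example, to get just the count (note that nl numbers only non-empty lines unless you pass -ba):
$ nl -ba file | tail -n 1 | awk '{print $1}'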
You can also get the line count using awk:
awk 'END {print NR}' names.txt
Or using a bash while .. do .. done loop:
CNT=0; while read -r LINE; do (( CNT++ )); done < names.txt; echo $CNT
It depends on how you open the file, but reading it from standard input instead may fix it:
wc -l < file
I am working on a script to test new-to-me hard drives in the background (so I can close the terminal window) and log the outputs. My problem is in getting badblocks to print stdout to the log file so I can monitor its multi-day progress and create properly formatted update emails.
I have been able to print stdout to a log file with the following: (flags are r/w, % monitor, verbose)
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tee sdx.log
Normally the output would look like:
Testing with pattern 0xaa: 2.23% done, 7:00 elapsed. (0/0/0 errors)
No newline character is used; the ^H control character backs the cursor up, and then the new updated status overwrites the previous one.
Unfortunately, the control character is not processed but saved as a character in the file, producing the above output followed by 43 copies of ^H, the new updated stats, 43 copies of ^H, etc.
Since the output is updated at least once per second, this produces a much larger file than necessary, and makes it difficult to retrieve the current status.
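You can reproduce the effect with printf (a quick demo, separate from badblocks itself):
$ printf 'old\b\b\bnew\n'            # a terminal renders this as: new
$ printf 'old\b\b\bnew\n' | xxd      # a file keeps all ten bytes, ^H included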
While working in a terminal, the solution cat sdx.log && echo "" prints the expected/wanted results by letting the terminal process the control characters (and then inserting a newline so the last status line is not immediately overwritten by the next terminal line), but using cat sdx.log > some.file or cat sdx.log | mail both still include all of the extra characters (though in email they are interpreted as spaces). Solutions like these, which decode or remove the control characters at the time of access, still produce a huge, unnecessary output file.
I have worked my way through the following similar questions, but none has produced (at least as far as I can figure out) a solution which works in real time with the output to update the file. They instead require that the saved log file be processed separately after the task has finished writing, or that the log file not be written until the process is done; both defeat the stated goal of monitoring progress.
Bash - process backspace control character when redirecting output to file
How to "apply" backspace characters within a text file (ideally in vim)
Thank you!
The main place I've run into this in real life is trying to process man pages. In the past, I've always used a simple script that post-processes the text by stripping out the backspaces appropriately. One could probably do this sort of thing in 80 characters of Perl, but here's an approach that handles backspace and CR/NL fairly well. I've not tested it extensively, but it produces good output for simple cases, e.g.:
$ printf 'xxx\rabclx\bo\rhel\nworld\n' | ./a.out output
hello
world
$ cat output
hello
world
$ xxd output
00000000: 6865 6c6c 6f0a 776f 726c 640a hello.world.
If your output starts to contain a lot of CSI escape sequences, this approach just isn't worth the trouble; cat will produce nice human-consumable output for those cases.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE * xfopen(const char *path, const char *mode);
off_t xftello(FILE *stream, const char *name);
void xfseeko(FILE *stream, off_t offset, int whence, const char *name);

int
main(int argc, char **argv)
{
    const char *mode = "w";
    char *name = strrchr(argv[0], '/'); /* strrchr, not strchr, to get the basename */
    off_t last = 0, max = 0, curr = 0;
    name = name ? name + 1 : argv[0];
    if( argc > 1 && ! strcmp(argv[1], "-a")) {
        argv += 1;
        argc -= 1;
        mode = "a";
    }
    if( argc > 1 && ! strcmp(argv[1], "-h")) {
        printf("usage: %s [-a] [-h] file [ file ...]\n", name);
        return EXIT_SUCCESS;
    }
    if( argc < 2 ) {
        fprintf(stderr, "Missing output file. -h for usage\n");
        return EXIT_FAILURE;
    }
    assert( argc > 1 );
    argc -= 1;
    argv += 1;
    FILE *ofp[argc];
    for( int i = 0; i < argc; i++ ) {
        ofp[i] = xfopen(argv[i], mode);
    }
    int c;
    while( ( c = fgetc(stdin) ) != EOF ) {
        fputc(c, stdout);
        for( int i = 0; i < argc; i++ ) {
            if( c == '\b' ) {
                xfseeko(ofp[i], -1, SEEK_CUR, argv[i]);
            } else if( isprint(c) ) {
                fputc(c, ofp[i]);
            } else if( c == '\n' ) {
                xfseeko(ofp[i], max, SEEK_SET, argv[i]);
                fputc(c, ofp[i]);
                last = curr + 1;
            } else if( c == '\r' ) {
                xfseeko(ofp[i], last, SEEK_SET, argv[i]);
            }
        }
        curr = xftello(ofp[0], argv[0]);
        if( curr > max ) {
            max = curr;
        }
    }
    return 0;
}

off_t
xftello(FILE *stream, const char *name)
{
    off_t r = ftello(stream);
    if( r == -1 ) {
        perror(name);
        exit(EXIT_FAILURE);
    }
    return r;
}

void
xfseeko(FILE *stream, off_t offset, int whence, const char *name)
{
    if( fseeko(stream, offset, whence) ) {
        perror(name);
        exit(EXIT_FAILURE);
    }
}

FILE *
xfopen(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if( fp == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return fp;
}
You can delete the ^H characters as they pass through:
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tr -d '\b' | tee sdx.log
I have found col -b and colcrt useful, but neither worked perfectly for me. These apply the control characters rather than just dropping them:
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | col -b | tee sdx.log
I have a text file that contains a folder name. I want to read that text file's contents via the Ubuntu terminal and make a folder with that name (the one written in the txt file). I don't know what to do. I have made a file named "in.txt" whose contents are "name", and I have tried the C code below:
#include <stdio.h>
int main(){
FILE *fopen (const char in.txt, const char r+);
}
What should I write in the terminal?
Let the text file in.txt be:
foldername1
foldername2
Then you can create a script file script.sh:
#!/usr/bin/env bash
# $1 will be replaced with the first argument passed to the script:
mkdir -p $(cat "$1")
Run the script in the shell as ./script.sh in.txt. The folders named in in.txt should now exist. This assumes one folder name per line in in.txt, without spaces in the names; see the sketch below for a space-safe variant.
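If the names may contain spaces, a read loop avoids the word splitting that $(cat ...) performs (a sketch under the same one-name-per-line assumption):
#!/usr/bin/env bash
while IFS= read -r name; do
    mkdir -p -- "$name"
done < "$1"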
Let's say you have a text file called in.txt which contains:
Rambo
Johnny
Jacky
Hulk
Next you need to read the file word by word; for that you can use fscanf(). Once a word is read, you need to create a folder with that name. The command to create a folder is mkdir. To execute mkdir from your program you can use either system() or execlp().
Here is the simple C program
#include <stdio.h>
#include <unistd.h>  /* fork(), execlp(), _exit() */

int main() {
    FILE *fp = fopen("in.txt", "r");
    if (fp == NULL) {
        /* write some message that the file is not present */
        return 0;
    }
    char buf[100];
    while (fscanf(fp, "%99s", buf) > 0) { /* %99s keeps buf from overflowing */
        /* buf contains each word of the file */
        /* now create that folder using execlp */
        if (fork() == 0) {
            execlp("/bin/mkdir", "mkdir", buf, (char *)NULL); /* creates a folder with the name read from the file */
            _exit(127); /* only reached if execlp fails */
        }
    }
    fclose(fp);
    return 0;
}
Note that mkdir fails if the directory already exists; passing -p (i.e. execlp("/bin/mkdir", "mkdir", "-p", buf, (char *)NULL)) suppresses that error.
Compile it with gcc -Wall test.c and execute it as ./a.out; it will create the folders.
EDIT: The same thing using the open() system call. Here there is no library call that reads word by word, so you need to do the string manipulation yourself.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    int inputfd = open("in.txt", O_RDONLY);
    if (inputfd == -1) {
        perror("open");       /* report the error only if open actually failed */
        exit(EXIT_FAILURE);
    }
    /* first find the size of the file */
    int pos = lseek(inputfd, 0, SEEK_END);
    printf("pos = %d \n", pos);
    /* then make inputfd point back to the beginning */
    lseek(inputfd, 0, SEEK_SET);
    /* allocate memory equal to the size of the file, not a random 1024 bytes */
    char *buf = malloc(pos + 1);
    read(inputfd, buf, pos);  /* read the whole file's data at once */
    buf[pos] = '\0';          /* terminate it so the scan loop below can stop */
    /* you need to split words out of buf, because buf contains the whole file, not one word */
    char cmd[50];             /* buffer to store one folder name */
    for (int row = 0, index = 0; buf[row]; row++) {
        if (buf[row] != ' ' && buf[row] != '\n') {
            cmd[index++] = buf[row];
        } else {
            cmd[index] = '\0';
            index = 0;        /* the next word starts again at cmd[0] */
            if (fork() == 0) {
                execlp("/bin/mkdir", "mkdir", cmd, (char *)NULL);
                _exit(127);   /* only reached if execlp fails */
            }
        }
    }
    close(inputfd);
    return 0;
}
Another option if you want to perform other commands besides mkdir:
while read -r line; do mkdir -p "$line" && chown www-data:www-data "$line"; done < file.txt
You can use a for loop from the Linux terminal like:
for line in $(cat in.txt); do mkdir "$line"; done
That should work fine to create multiple folders (note that $(cat ...) splits on whitespace, so this also assumes names without spaces).
It is clear that one can use the
#!/usr/bin/perl
shebang notation in the very first line of a script to define the interpreter. However, this presupposes an interpreter that ignores hashmark-starting lines as comments. How can one use an interpreter that does not have this feature?
With a wrapper that removes the first line and calls the real interpreter with the remainder of the file. It could look like this:
#!/bin/sh
# set your "real" interpreter here, or use cat for debugging
REALINTERP="cat"
tail -n +2 "$1" | $REALINTERP
Other than that: In some cases ignoring the error message about that first line could be an option.
Last resort: code support for the comment char of your interpreter into the kernel.
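To illustrate the wrapper approach, suppose the script above is saved as /usr/local/bin/mywrap (a made-up path) with the default REALINTERP="cat"; a two-line script then echoes back everything except its shebang line:
$ printf '#!/usr/local/bin/mywrap\nhello interpreter\n' > demo
$ chmod +x demo
$ ./demo
hello interpreter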
I think the first line is interpreted by the operating system.
The interpreter will be started, and the name of the script is handed to it as its first parameter.
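You can see this mechanism with a trivial experiment, using echo as the "interpreter": the kernel runs /bin/echo with the script's path as its argument, so the script prints its own name:
$ printf '#!/bin/echo\n' > demo && chmod +x demo
$ ./demo
./demo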
The following script 'first.myint' calls the interpreter 'myinterpreter' which is the executable from the C program below.
#!/usr/local/bin/myinterpreter
% 1 #########
2 xxxxxxxxxxx
333
444
% the last comment
A sketch of the personal interpreter:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFERSIZE 256 /* input buffer size */

int
main ( int argc, char *argv[] )
{
    char comment_leader = '%'; /* define the comment leader */
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    // char buffer[BUFFERSIZE];

    // argv[0] : the name of this executable
    // argv[1] : the name of the script calling this executable via shebang
    FILE *input; /* input-file pointer */
    char *input_file_name = argv[1]; /* the script name */

    input = fopen( input_file_name, "r" );
    if ( input == NULL ) {
        fprintf ( stderr, "couldn't open file '%s'; %s\n",
                input_file_name, strerror(errno) );
        exit (EXIT_FAILURE);
    }

    while ((read = getline(&line, &len, input)) != -1) {
        if ( line[0] != comment_leader ) {
            printf( "%s", line ); /* print line as a test */
        }
        else {
            printf ( "Skipped a comment!\n" );
        }
    }
    free(line);

    if( fclose(input) == EOF ) { /* close input file */
        fprintf ( stderr, "couldn't close file '%s'; %s\n",
                input_file_name, strerror(errno) );
        exit (EXIT_FAILURE);
    }
    return EXIT_SUCCESS;
} /* ---------- end of function main ---------- */
Now call the script (made executable before) and see the output:
...~> ./first.myint
#!/usr/local/bin/myinterpreter
Skipped a comment!
2 xxxxxxxxxxx
333
444
Skipped a comment!
I made it work. I especially thank holgero for his tail option trick:
tail -n +2 "$1" | $REALINTERP
That, and finding this answer on Stack Overflow, made it possible:
How to compile a linux shell script to be a standalone executable *binary* (i.e. not just e.g. chmod 755)?
"The solution that fully meets my needs would be SHC - a free tool"
SHC is a shell to C translator, see here:
http://www.datsi.fi.upm.es/~frosal/
So I wrote polyscript.sh:
$ cat polyscript.sh
#!/bin/bash
tail -n +2 $1 | poly
I compiled this with shc and in turn with gcc:
$ shc-3.8.9/shc -f polyscript.sh
$ gcc -Wall polyscript.sh.x.c -o polyscript
Now, I was able to create a first script written in ML:
$ cat smlscript
#!/home/gergoe/projects/shebang/polyscript $0
print "Hello World!"
and, I was able to run it:
$ chmod u+x smlscript
$ ./smlscript
Poly/ML 5.4.1 Release
> > # Hello World!val it = (): unit
Poly does not have an option to suppress compiler output, but that's not an issue here. It might be interesting to write polyscript directly in C as fgm suggested, but probably that wouldn't make it faster.
So, this is how simple it is. I welcome any comments.
I have two text files with different text organization. Both files contain a few identical patterns (numbers) in the text. I'd like to find which patterns (numbers) are present in both files and write them to the output file.
file1.txt:
blablabla_25947.bkwjcnwelkcnwelckme
blablabla_111.bkwjcnwelkcnwelckme
blablabla_65155.bkwjcnwelkcnwelckme
blablabla_56412.bkwjcnwelkcnwelckme
file2.txt:
blablabla_647728.bkwjcnwelkcnwelck
kjwdhcwkejcwmekcjwhemckwejhcmwekch
blablabla_6387.bkwjcnwelkcnwelckme
wexkwhenqlciwuehnqweiugfnwekfiugew
wedhwnejchwenckhwqecmwequhcnkwjehc
owichjwmelcwqhemclekcelmkjcelkwejc
blablabla_59148.bkwjcnwelkcnwelckme
ecmwequhcnkwjehcowichjwmelcwqhemcle
kcelmkjcelkwejcwecawecwacewwAWWAXEG
blablabla_111.bkwjcnwelkcnwelckm
WESETRBRVSSCQEsfdveradassefwaefawecc
output_file.txt:
111
How about:
$ egrep -o '_[0-9]+\.' file1 | grep -of - file2 | tr -d '_.'
111
# Redirect to new file
$ egrep -o '_[0-9]+\.' file1 | grep -of - file2 | tr -d '_.' > file3
The first grep extracts all the digit strings (preceded by _ and followed by .) from file1, and this list is used to grep for matches in file2. The _ and . are then stripped by tr.
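If the files get large, a sort/comm variant of the same idea (a sketch, not tested on your exact data) avoids grep's pattern-list matching:
$ egrep -o '_[0-9]+\.' file1.txt | sort -u > ids1
$ egrep -o '_[0-9]+\.' file2.txt | sort -u > ids2
$ comm -12 ids1 ids2 | tr -d '_.'
111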
I did in fact try to solve the "hard problem" that I thought you were posing. The following code looks for the longest string found in both file1 and file2. If there are multiple "longest" strings, it only reports the first one found. It may be helpful to someone at some point (although maybe not the solution you are looking for here):
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* This routine returns the size of the file it is called with. */
static unsigned
get_file_size (const char * file_name)
{
    struct stat sb;
    if (stat (file_name, & sb) != 0) {
        fprintf (stderr, "'stat' failed for '%s': %s.\n",
                 file_name, strerror (errno));
        exit (EXIT_FAILURE);
    }
    return sb.st_size;
}

/* This routine reads the entire file into memory. */
static unsigned char *
read_whole_file (const char * file_name)
{
    unsigned s;
    unsigned char * contents;
    FILE * f;
    size_t bytes_read;
    int status;

    s = get_file_size (file_name);
    contents = malloc (s + 1);
    if (! contents) {
        fprintf (stderr, "Not enough memory.\n");
        exit (EXIT_FAILURE);
    }

    f = fopen (file_name, "r");
    if (! f) {
        fprintf (stderr, "Could not open '%s': %s.\n", file_name,
                 strerror (errno));
        exit (EXIT_FAILURE);
    }
    bytes_read = fread (contents, sizeof (unsigned char), s, f);
    if (bytes_read != s) {
        fprintf (stderr, "Short read of '%s': expected %u bytes "
                 "but got %zu: %s.\n", file_name, s, bytes_read,
                 strerror (errno));
        exit (EXIT_FAILURE);
    }
    contents[s] = '\0'; /* terminate so strlen() below is safe */
    status = fclose (f);
    if (status != 0) {
        fprintf (stderr, "Error closing '%s': %s.\n", file_name,
                 strerror (errno));
        exit (EXIT_FAILURE);
    }
    return contents;
}

int main(int argc, char* argv[]) {
    int i1, i2, l1, l2, lm;
    unsigned char longestString[1000] = ""; // lazy way to make it big enough
    unsigned char tempString[1000];
    int longestFound = 0;
    unsigned char *f1, *f2; // buffers with entire file contents

    if (argc < 3) {
        fprintf (stderr, "usage: %s file1 file2\n", argv[0]);
        exit (EXIT_FAILURE);
    }
    f1 = read_whole_file (argv[1]);
    f2 = read_whole_file (argv[2]);
    l1 = strlen((char *)f1);
    l2 = strlen((char *)f2);

    for (i1 = 0; i1 < l1; i1++) {
        for (i2 = 0; i2 < l2; i2++) {
            lm = 0; // length of match; bounds are checked before each access
            while ((i1+lm < l1) && (i2+lm < l2) && lm < 1000-1
                   && f1[i1+lm] == f2[i2+lm]) {
                tempString[lm] = f1[i1+lm];
                lm++;
            }
            if (lm > longestFound) {
                tempString[lm] = 0; // terminate string
                strcpy((char *)longestString, (char *)tempString);
                longestFound = lm;
            }
        }
    }
    printf("longest string found is %d characters:\n", longestFound);
    printf("%s\n", longestString);
    free(f1);
    free(f2);
    return 0;
}
The code for reading entire file contents was found at http://www.lemoda.net/c/read-whole-file/index.html
Is there any way to tell the cat command to stop reading when no data is arriving, maybe with some "timeout" that specifies for how long no data has been incoming?
Any ideas?
There is a timeout(1) command. Example:
timeout 5s cat /dev/random
Depending on your circumstances, e.g. if you run bash with -e and normally care about the exit code:
timeout 5s cat /dev/random || true
cat itself, no. It reads the input stream until told it's the end of the file, blocking for input if necessary.
There's nothing to stop you writing your own cat equivalent which uses select on standard input to time out if nothing is forthcoming fast enough, and exits under those conditions.
In fact, I once wrote a snail program (because a snail is slower than a cat) which took an extra argument of characters per second to slowly output a file (a).
So snail 10 myprog.c would output myprog.c at ten characters per second. For the life of me, I can't remember why I did this - I suspect I was just mucking about, waiting for some real work to show up.
Since you're having trouble with it, here's a version of dog.c (based on my aforementioned snail program) that will do what you want:
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/select.h>

static int dofile (FILE *fin) {
    int ch = ~EOF, rc;
    fd_set fds;
    struct timeval tv;

    while (ch != EOF) {
        // Set up for fin file, 5 second timeout.
        FD_ZERO (&fds); FD_SET (fileno (fin), &fds);
        tv.tv_sec = 5; tv.tv_usec = 0;
        rc = select (fileno(fin)+1, &fds, NULL, NULL, &tv);
        if (rc < 0) {
            fprintf (stderr, "*** Error on select (%d)\n", errno);
            return 1;
        }
        if (rc == 0) {
            fprintf (stderr, "*** Timeout on select\n");
            break;
        }
        // Data available, so it will not block.
        if ((ch = fgetc (fin)) != EOF) putchar (ch);
    }
    return 0;
}

int main (int argc, char *argv[]) {
    int argp, rc;
    FILE *fin;

    if (argc == 1)
        rc = dofile (stdin);
    else {
        argp = 1;
        while (argp < argc) {
            if ((fin = fopen (argv[argp], "rb")) == NULL) {
                fprintf (stderr, "*** Cannot open input file [%s] (%d)\n",
                         argv[argp], errno);
                return 1;
            }
            rc = dofile (fin);
            fclose (fin);
            if (rc != 0)
                break;
            argp++;
        }
    }
    return rc;
}
Then, you can simply run dog without arguments (so it will use standard input) and, after five seconds with no activity, it will output:
*** Timeout on select
(a) Actually, it was called slowcat but snail is much nicer and I'm not above a bit of minor revisionism if it makes the story sound better :-)
mbuffer, with its -W option, works for me.
I needed to sink stdin to a file, but with an idle timeout:
I did not need to actually concatenate multiple sources (but perhaps there are ways to use mbuffer for this.)
I did not need any of cat's possible output-formatting options.
I did not mind the progress bar that mbuffer brings to the table.
I did need to add -A /bin/false to suppress a warning, based on a suggestion in the linked man page. My invocation for copying stdin to a file with 10 second idle timeout ended up looking like
mbuffer -A /bin/false -W 10 -o ./the-output-file
Here is the code for timeout-cat:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void timeout(int sig) {
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[]) {
    int sec = 0; /* seconds to timeout (0 = no timeout) */
    int c;

    if (argc > 1) {
        sec = atoi(argv[1]);
        signal(SIGALRM, timeout);
        alarm(sec);
    }
    while((c = getchar()) != EOF) {
        alarm(0);
        putchar(c);
        alarm(sec);
    }
    return EXIT_SUCCESS;
}
It does basically the same as paxdiablo's dog.
It works like cat without an argument, catting stdin. As the first argument, provide the timeout in seconds.
One limitation (which applies to dog as well): input arrives line-buffered, so you have n seconds to provide a whole line (not just any character) to reset the timeout alarm. This is because terminal input is delivered line by line (canonical mode), not because of anything in the program itself; see the stty sketch below.
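If stdin is a terminal, you can work around this by switching the terminal to non-canonical mode before running it, so each keystroke is delivered immediately (a sketch; remember to restore the settings):
saved=$(stty -g)    # save current terminal settings
stty -icanon        # deliver input per character, not per line
./timeout_cat 5
stty "$saved"       # restore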
usage:
instead of potentially endless:
cat < some_input > some_output
you can do compile code above to timeout_cat and:
./timeout_cat 5 < some_input > some_output
Consider tail -f --pid.
I am assuming that you are reading some file, and that when the producer finishes (goes away?) you want to stop.
Example that will process /var/log/messages until watcher.sh finishes.
./watcher.sh&
tail -f /var/log/messages --pid $! | ... do something with the output
I faced the same issue of cat blocking while reading from a tty port via adb shell, and did not find any solution (the timeout command was not working either). Below is the final command I used in my Python script (running on Ubuntu) to make it non-blocking. I hope this helps someone.
bash_command = "adb shell \"echo -en 'ATI0\\r\\n' > /dev/ttyUSB0 && cat /dev/ttyUSB0\" & sleep 1; kill $!"
response = subprocess.check_output(['bash', '-c', bash_command])
Simply run cat, then kill it after 5 seconds:
cat xyz & sleep 5; kill $!
To capture the cat output as a reply after 5 seconds:
reply="`cat xyz & sleep 5; kill $!`"
echo "reply=$reply"