As shocked as I am, I can't find this anywhere, and my bash skills are still sub-par.
I have a text file of prime numbers:
2\n
3\n
5\n
7\n
11\n
etc...
I want to pull all primes under 2^32 (4294967296) plus one additional prime number, and save these primes to their own text file formatted the same way. Also, my file has just over 1.3 billion lines so far, so stopping after the limit would be ideal.
Update: Problem.
The bash script has been looping through these 11 numbers for quite some time without me noticing:
4232004449
4232004479
4232004493
4232004509
4232004527
4232004533
4232004559
4232004589
4232004593
4232004613
004437
What's even weirder is I grepped primes.txt (the original) and "^004437" was nowhere to be found. Is this some kind of limitation of bash?
Update: Solution
It appears to be some kind of limitation of something, I really don't know what. I'm re-choosing the Perl script as my answer because not only did it work, but it created the ~2GB file from nothing in ~80 seconds and included the additional prime. Go here for a solution to the bash error.
$ perl -lne 'print; last if $_ > 2**32' < myprimes.txt > myprimes2.txt
Gives you the input series of primes up to one prime past 2**32, then stops. It does not read the source file into memory.
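If you'd rather avoid Perl, a roughly equivalent awk one-liner (a sketch under the same assumption of one ascending prime per line) is:
$ awk '{ print } $1 > 4294967296 { exit }' myprimes.txt > myprimes2.txt
Like the Perl version, it streams the file, prints every line, and quits right after printing the first prime above 2**32.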
In shell, without loading the whole 1.3 billion numbers into memory, you can use:
n=4294967296
last=0
while read -r number
do
    if [ "$last" -gt "$n" ]
    then break
    fi
    echo "$number"
    last=$number
done < primes.txt > primes2.txt
You could lose the last variable too:
n=4294967296
while read -r number
do
    echo "$number"
    if [ "$number" -gt "$n" ]
    then break
    fi
done < primes.txt > primes2.txt
This is very easy to do in Bash! Just cat the file primes.txt to read it, go through each number, check that the number is less than 2^32, and if it is, append it to primes2.txt.
The exact code is below.
#!/bin/bash
n=4294967296 # 2^32
for i in $(cat primes.txt)
do
    if [ "$i" -le "$n" ]
    then
        echo "$i" >> primes2.txt
    fi
done
Or you can use this simple Python solution, which does not require loading the entire file into memory.
n = 2**32
with open('primes.txt') as old_primes, open('primes2.txt', 'a') as new_primes:
    for p in old_primes:
        new_primes.write(p)
        if int(p) > n:
            break  # stop after writing one prime past 2**32
I would recommend doing something like this in Perl:
EDIT: Hm, it was probably the array that used up all your RAM - this should be more friendly to your resources.
#!/usr/bin/env perl
use warnings;
use strict;
my $max_value = ( 2 ** 32 );
my $input_file = 'primes.txt';
my $output_file = 'primes2.txt';
open( my $INPUT_FH, '<', $input_file )
    or die "could not open $input_file: $!";
open( my $OUTPUT_FH, '>', $output_file )
    or die "could not open $output_file: $!";
# Read line by line instead of slurping the whole file into a list,
# and stop at the limit -- the input is sorted ascending.
while ( my $prime = <$INPUT_FH> ) {
    chomp($prime);
    last if $prime >= $max_value;
    print $OUTPUT_FH "$prime\n";
}
Related
This is a shortened version of a script for reading 8mm tapes from an EXB-8500 with an autoloader (only 10 tapes at a time maximum) attached. It dd's in tape data (straight binary) and saves it to files that are named after the tape's 4-digit number (example: D1002.dat) in both our main storage and our backup. During this time it's logging info and displaying its status in the terminal so we can see how far along it is.
#!/bin/bash
echo "Please enter number of tapes: [int]"
read i
j=1
until [ $i -lt $j ]
do
echo "What is the number of tape $j ?"
read Tape_$j
(( j += 1 ))
done
echo "Load tapes into the tower and press return when the drive is ready"
read a
j=1
until [ $i -lt $j ]
do
k="Tape_$j"
echo "tower1 $j D$(($k)) `date` Begin"
BEG=$j" "D$(($k))" "`date`" ""Begin"
echo "tower1 $j D$(($k)) `date` End"
END=$j" "D$(($k))" "`date`" ""End"
echo "$BEG $END"
echo "$BEG $END"
sleep 2
(( j += 1 ))
done
echo "tower1 done"
Everything was hunky-dory until we got under 1000 (starting at 0999). The error was ./tower1: 0999: Value too great for base (error token is "0999"). Now I already realize that this is because the script is forcing octal values when I type in the leading 0, and I know I should insert a 10# somewhere in the script, but the question is: where?
Also, is there a way for me to just define Tape_$j as a string? I feel like that would clear up a lot of these problems.
To get the error, run the script, define however many tapes you want (at least one, lol), and insert a leading 0 into the name of the tape.
EXAMPLE:
./test
Please enter number of tapes: [int]
1
What is the number of tape 1?
0999
./test: 0999: Value too great for base (error token is "0999")
You don't want to use $k as a number, but as a string. You used the numeric expression to evaluate a variable value as a variable name. That's very bad practice.
Fortunately, you can use variable indirection in bash to achieve your goal. No numbers involved, no error thrown.
echo "tower1 $j ${!k} `date` Begin"
BEG=$j" "D${!k}" "`date`" ""Begin"
And similarly in other places.
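To see what indirection does, here is a minimal self-contained sketch (variable values invented for illustration):
#!/bin/bash
Tape_1=0999      # tape number kept as a string; the leading zero survives
j=1
k="Tape_$j"      # k holds the NAME of the variable, i.e. "Tape_1"
echo "${!k}"     # indirection: expands to the VALUE of Tape_1, printing 0999
Because no arithmetic evaluation ever happens, bash never tries to parse 0999 as an octal number, so no error is thrown.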
I want to write a Perl program that opens a file, reads its contents, and prints the number of lines, words, and characters it contains. I also want to print the number of times a specific word appears in the file. Here is what I have done:
#! /usr/bin/perl
open( FILE, "test1.txt" ) or die "could not open file: $!";
my ( $line, $words, $chars ) = ( 0, 0, 0 );
while (<FILE>) {
    $line++;
    $words += scalar( split( /\s+/, $_ ) );
    $chars += length($_);
    print $_;
}
$chars -= $words;
print(
    "Total number of lines in the file:= $line \nTotal number of words in the file:= $words \nTotal number of chars in the file:= $chars\n"
);
As you can see, I don't have any provision for taking user input of the word whose occurrences are to be counted, because I don't know how to do it. Please help with the occurrence-counting part. Thank you.
I guess you're doing this for learning purposes, so here is a good readable version of your problem (there might be a thousand others, because it's Perl). If not, there's wc on the Linux command line.
Note that I'm using the three-argument open; it's generally better to do that.
For counting single words you'll most probably need a hash. And I used <<HERE docs, because they are nicer for formatting. If you have any doubts, just look in the perldoc and ask your questions.
#!/usr/bin/env perl
use warnings; # Always use this
use strict; # ditto
my ($chars, $word_count, %words);
{
open my $file, '<', 'test.txt'
or die "couldn't open `test.txt':\n$!";
while (<$file>){
foreach (split){
$word_count++;
$words{$_}++;
$chars += length;
}
}
} # $file is now closed
print <<THAT;
Total number of lines: $.
Total number of words: $word_count
Total number of chars: $chars
THAT
# Now to your questioning part:
my $prompt= <<PROMPT.'>>';
Please enter the words you want the occurrences for. (CTRL+D ends the program)
PROMPT
print $prompt;
while(<STDIN>){
chomp; # get rid of the newline
print "$_ ".(exists $words{$_}?"occurs $words{$_} times":"doesn't occur")
." in the file\n",$prompt;
}
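As a quick cross-check of the totals, the wc mentioned above gives line, word, and character counts directly (note that wc -m counts whitespace too, while the script above sums only the lengths of the words):
$ wc -l -w -m test.txt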
I have a bunch of files in a directory with no pattern in their names at all. All I know is that they are all JPG files. How do I rename them so that they have some sort of sequence in their names?
I know that in Windows all you do is select all the files and rename them all to the same name, and Windows automatically adds sequence numbers to compensate for the duplicate file names.
I want to be able to do that in Linux (Fedora), but it seems you can only do that in the terminal. Please help, I am lost.
What is the command for doing this?
The best way to do this is to run a loop in the terminal that goes from picture to picture and renames each one with a number that increases by one on every iteration.
You can do this with:
n=1
for i in *.jpg; do
    p=$(printf "%04d.jpg" "${n}")
    mv "${i}" "${p}"
    n=$((n+1))
done
Just enter it into the terminal line by line.
If you want to put a custom name in front of the numbers, you can put it before the percent sign in the third line.
If you want to change the number of digits in the names' number, just replace the '4' in the third line (don't change the '0', though).
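For example, with a hypothetical prefix of holiday-, changing the third line as follows produces holiday-0001.jpg, holiday-0002.jpg, and so on:
p=$(printf "holiday-%04d.jpg" "${n}")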
I will assume that:
There are no spaces or other weird control characters in the file names
All of the files in a given directory are jpeg files
That in mind, to rename all of the files to 1.jpg, 2.jpg, and so on:
N=1
for a in ./* ; do
mv $a ${N}.jpg
N=$(( $N + 1 ))
done
If there are spaces in the file names:
find . -type f | awk 'BEGIN{N=1}
{print "mv \"" $0 "\" " N ".jpg"
N++}' | sh
Should be able to rename them.
The point being: Linux/UNIX has a lot of tools that can automate a task like this, but they have a bit of a learning curve to them.
Create a script containing:
#!/bin/sh
filePrefix="$1"
sequence=1
for file in $(ls -tr *.jpg) ; do
    renamedFile="$filePrefix$sequence.jpg"
    echo "renaming \"$file\" to $renamedFile"
    mv "$file" "$renamedFile"
    sequence=$(($sequence+1))
done
exit 0
If you named the script, say, RenameSequentially then you could issue the command:
./RenameSequentially Images-
This would rename all *.jpg files in the directory to Images-1.jpg, Images-2.jpg, etc., in order of oldest to newest... tested in the OS X command shell.
I wrote a perl script a long time ago to do pretty much what you want:
#
# reseq.pl renames files to a new named sequence of filenames
#
# Usage: reseq.pl newname [-n seq] [-p pad] fileglob
#
use strict;
my $newname = $ARGV[0];
my $seqstr = "01";
my $seq = 1;
my $pad = 2;
shift @ARGV;
if ($ARGV[0] eq "-n") {
    $seqstr = $ARGV[1];
    $seq = int $seqstr;
    shift @ARGV;
    shift @ARGV;
}
if ($ARGV[0] eq "-p") {
    $pad = $ARGV[1];
    shift @ARGV;
    shift @ARGV;
}
my $filename;
my $suffix;
for (@ARGV) {
    $filename = sprintf("${newname}_%0${pad}d", $seq);
    if (($suffix) = m/.*\.(.*)/) {
        $filename = "$filename.$suffix";
    }
    print "$_ -> $filename\n";
    rename ($_, $filename);
    $seq++;
}
You specify a common prefix for the files, a beginning sequence number and a padding factor.
For example:
# reseq.pl abc -n 1 -p 2 *.jpg
Will rename all matching files to abc_01.jpg, abc_02.jpg, abc_03.jpg...
I have a text file mapping of two integers, separated by commas:
123,456
789,555
...
It's 120Megs... so it's a very long file.
I keep having to search for the first column and return the second, e.g., look up 789 --returns--> 555, and I need to do it FAST, using regular Linux built-ins.
I'm doing this right now and it takes several seconds per look-up.
If I had a database I could index it. I guess I need an indexed text file!
Here is what I'm doing now:
my $lineFound=`awk -F, '/$COLUMN1/ { print $2 }' ../MyBigMappingFile.csv`;
Is there any easy way to pull this off with a performance improvement?
The hash suggestions are the natural way an experienced Perler would do this, but they may be suboptimal in this case. A hash scans the entire file and builds a large, flat data structure in linear time. Cruder methods can short-circuit with a worst case of linear time, and usually less in practice.
I first made a big mapping file:
my $LEN = shift;
for (1 .. $LEN) {
my $rnd = int rand( 999 );
print "$_,$rnd\n";
}
With $LEN passed on the command line as 10000000, the file came out to 113MB. Then I benchmarked three implementations. The first is the hash lookup method. The second slurps the file and scans it with a regex. The third reads line-by-line and stops when it matches. Complete implementation:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw{timethese};
my $FILE = shift;
my $COUNT = 100;
my $ENTRY = 40;
slurp(); # Initial file slurp, to get it into the hard drive cache
timethese( $COUNT, {
'hash' => sub { hash_lookup( $ENTRY ) },
'scalar' => sub { scalar_lookup( $ENTRY ) },
'linebyline' => sub { line_lookup( $ENTRY ) },
});
sub slurp
{
open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
local $/; # slurp mode, scoped to this sub so other reads stay line-based
my $s = <$fh>;
close $fh;
return $s;
}
sub hash_lookup
{
my ($entry) = @_;
my %data;
open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
while( <$fh> ) {
my ($name, $val) = split /,/;
$data{$name} = $val;
}
close $fh;
return $data{$entry};
}
sub scalar_lookup
{
my ($entry) = @_;
my $data = slurp();
my ($val) = $data =~ /^ $entry , (\d+) $/xm;
return $val;
}
sub line_lookup
{
my ($entry) = #_;
my $found;
open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
while( <$fh> ) {
my ($name, $val) = split /,/;
if( $name == $entry ) {
$found = $val;
last;
}
}
close $fh;
return $found;
}
Results on my system:
Benchmark: timing 100 iterations of hash, linebyline, scalar...
hash: 47 wallclock secs (18.86 usr + 27.88 sys = 46.74 CPU) @ 2.14/s (n=100)
linebyline: 47 wallclock secs (18.86 usr + 27.80 sys = 46.66 CPU) @ 2.14/s (n=100)
scalar: 42 wallclock secs (16.80 usr + 24.37 sys = 41.17 CPU) @ 2.43/s (n=100)
(Note I'm running this off an SSD, so I/O is very fast, and perhaps makes that initial slurp() unnecessary. YMMV.)
Interestingly, the hash implementation is just as fast as linebyline, which isn't what I expected. By using slurping, scalar may end up being faster on a traditional hard drive.
However, by far the fastest is a simple call to grep:
$ time grep '^40,' int_map.txt
40,795
real 0m0.508s
user 0m0.374s
sys 0m0.046s
Perl could easily read that output and split apart the comma in hardly any time at all.
Edit: Never mind about grep. I misread the numbers.
120 meg isn't that big. Assuming you've got at least 512MB of ram, you could easily read the whole file into a hash and then do all of your lookups against that.
use:
sed -n "/^$COLUMN1/{s/.*,//p;q}" file
This optimizes your code in three ways:
1) No needless splitting of each line in two on ",".
2) You stop processing the file after the first hit.
3) sed is faster than awk.
This should more than halve your search time.
HTH Chris
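As an illustration, here is a hypothetical session using the 789 -> 555 row from the question; anchoring the pattern with a trailing comma is a small extra safeguard against prefix matches (789 matching 7890, say):
$ COLUMN1=789
$ sed -n "/^$COLUMN1,/{s/.*,//p;q}" MyBigMappingFile.csv
555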
It all depends on how often the data change and how often in the course of a single script invocation you need to look up.
If there are many lookups during each script invocation, I would recommend parsing the file into a hash (or array if the range of keys is narrow enough).
If the file changes every day, creating a new SQLite database might or might not be worth your time.
If each script invocation needs to look up just one key, and if the data file changes often, you might get an improvement by slurping the entire file into a scalar (minimizing memory overhead) and doing a pattern match on that (instead of parsing each line).
#!/usr/bin/env perl
use warnings; use strict;
die "Need key\n" unless #ARGV;
my $lookup_file = 'lookup.txt';
my ($key) = @ARGV;
my $re = qr/^$key,([0-9]+)$/m;
open my $input, '<', $lookup_file
or die "Cannot open '$lookup_file': $!";
my $buffer = do { local $/; <$input> };
close $input;
if (my ($val) = ($buffer =~ $re)) {
print "$key => $val\n";
}
else {
print "$key not found\n";
}
On my old slow laptop, with a key towards the end of the file:
C:\Temp> dir lookup.txt
...
2011/10/14 10:05 AM 135,436,073 lookup.txt
C:\Temp> tail lookup.txt
4522701,5840
5439981,16075
7367284,649
8417130,14090
438297,20820
3567548,23410
2014461,10795
9640262,21171
5345399,31041
C:\Temp> timethis lookup.pl 5345399
5345399 => 31041
TimeThis : Elapsed Time : 00:00:03.343
This example loads the file into a hash (which takes about 20s for 120M on my system). Subsequent lookups are then nearly instantaneous. This assumes that each number in the left column is unique. If that's not the case then you would need to push numbers on the right with the same number on the left onto an array or something.
use strict;
use warnings;
my ($csv) = @ARGV;
my $start=time;
open(my $fh, $csv) or die("$csv: $!");
$|=1;
print("loading $csv... ");
my %numHash;
my $p=0;
while(<$fh>) { $p+=length; my($k,$v)=split(/,/); $numHash{$k}=$v }
print("\nprocessed $p bytes in ",time()-$start, " seconds\n");
while(1) { print("\nEnter number: "); chomp(my $i=<STDIN>); print($numHash{$i}) }
Example usage and output:
$ ./lookup.pl MyBigMappingFile.csv
loading MyBigMappingFile.csv...
processed 125829128 bytes in 19 seconds
Enter number: 123
322
Enter number: 456
93
Enter number:
Does it help if you cp the file to /dev/shm and query the mapping with awk/sed/perl/grep/ack/whatever?
Don't tell me you are working on a machine with 128MB of RAM. :)
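A minimal sketch of that idea with the question's 789 -> 555 example (the exact-match-and-exit awk is just one of the tools that would work):
$ cp MyBigMappingFile.csv /dev/shm/
$ awk -F, '$1 == 789 { print $2; exit }' /dev/shm/MyBigMappingFile.csv
555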
I have created a little password generation script. I'm curious to what improvements can be made for it except input error handling, usage information etc. It's the core functionality I'm interested in seeing improvements upon.
This is what it does (and what I like it to do):
Keep it easy to change which lowercase characters (L), uppercase characters (U), numbers (N) and symbols (S) are used in passwords.
I'd like it to find a new password of length 10 for me in at most two seconds.
It should take a variable length of the password string as an argument.
Only a password containing at least one L, U, N and S should be accepted.
Here is the code:
#!/bin/bash
PASSWORDLENGTH=$1
RNDSOURCE=/dev/urandom
L="acdefghjkmnpqrtuvwxy"
U="ABDEFGHJLQRTY"
N="012345679"
S="\-/\\)?=+.%#"
until [ $(echo $password | grep [$L] | grep [$U] | grep [$N] | grep -c [$S] ) == 1 ]; do
password=$(cat $RNDSOURCE | tr -cd "$L$U$N$S" | head -c $PASSWORDLENGTH)
echo In progress: $password # It's simply for debug purposes, ignore it
done
echo Final password: $password
My questions are:
Is there a nicer way of checking if the password is acceptable than the way I'm doing it?
What about the actual password generation?
Any coding style improvements? (The short variable names are temporary. Though I'm using uppercase names for "constants" [I know there formally are none] and lowercase for variables. Do you like it?)
Let's vote on the most improved version. :-)
For me it was just an exercise, mostly for fun and as a learning experience, although I will start using it instead of the generator from KeepassX which I'm using now. It will be interesting to see which improvements and suggestions will come from more experienced Bashistas (I made that word up).
I created a little basic script to measure performance: (In case someone thinks it's fun)
#!/bin/bash
SAMPLES=100
SCALE=3
echo -e "PL\tMax\tMin\tAvg"
for p in $(seq 4 50); do
bcstr=""; max=-98765; min=98765
for s in $(seq 1 $SAMPLES); do
gt=$(\time -f %e ./genpassw.sh $p 2>&1 1>/dev/null)
bcstr="$gt + $bcstr"
max=$(echo "if($max < $gt ) $gt else $max" | bc)
min=$(echo "if($min > $gt ) $gt else $min" | bc)
done
bcstr="scale=$SCALE;($bcstr 0)/$SAMPLES"
avg=$(echo $bcstr | bc)
echo -e "$p\t$max\t$min\t$avg"
done
You're throwing away a bunch of randomness in your input stream. Keep those bytes around and translate them into your character set. Replace the password=... statement in your loop with the following:
ALL="$L$U$N$S"
password=$(tr "\000-\377" "$ALL$ALL$ALL$ALL$ALL" < $RNDSOURCE | head -c $PASSWORDLENGTH)
The repetition of $ALL is to ensure that there are >=255 characters in the "map to" set.
I also removed the gratuitous use of cat.
(Edited to clarify that what appears above is not intended to replace the full script, just the inner loop.)
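If you want to sanity-check that claim, a quick snippet (assuming the L/U/N/S definitions from the script above) prints the length of the repeated map-to set:
ALL="$L$U$N$S"
MAP="$ALL$ALL$ALL$ALL$ALL"
echo "${#MAP}"   # needs to be >= 255 so every input byte has a mapping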
Edit: Here's a much faster strategy that doesn't call out to external programs:
#!/bin/bash
PASSWORDLENGTH=$1
RNDSOURCE=/dev/urandom
L="acdefghjkmnpqrtuvwxy"
U="ABDEFGHJLQRTY"
N="012345679"
# (Use this with tr.)
#S='\-/\\)?=+.%#'
# (Use this for bash.)
S='-/\)?=+.%#'
ALL="$L$U$N$S"
# This function echoes a random index into its argument.
function rndindex() { echo $(($RANDOM % ${#1})); }
# Make sure the password contains at least one of each class.
password="${L:$(rndindex $L):1}${U:$(rndindex $U):1}${N:$(rndindex $N):1}${S:$(rndindex $S):1}"
# Add random other characters to the password until it is the desired length.
while [[ ${#password} -lt $PASSWORDLENGTH ]]
do
password=$password${ALL:$(rndindex $ALL):1}
done
# Now shuffle it.
chars=$password
password=""
while [[ ${#password} -lt $PASSWORDLENGTH ]]
do
n=$(rndindex $chars)
ch=${chars:$n:1}
password="$password$ch"
if [[ $n == $(( ${#chars} - 1 )) ]]; then
chars="${chars:0:$n}"
elif [[ $n == 0 ]]; then
chars="${chars:1}"
else
chars="${chars:0:$n}${chars:$((n+1))}"
fi
done
echo $password
Timing tests show this runs 5-20x faster than the original script, and the time is more predictable from one run to the next.
You could just use uuidgen or pwgen to generate your random passwords, maybe shuffling some letters around afterwards or something of the sort.
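For instance, assuming pwgen is installed, this prints one 10-character password, with -s asking for a fully random "secure" password and -y requiring at least one symbol:
$ pwgen -sy 10 1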
secpwgen is very good (it can also generate easier-to-remember Diceware passwords) - but it has almost disappeared from the net. I managed to track down a copy of the 1.3 source and put it on GitHub.
It is also now part of Alpine Linux.