perl encryption script IDEA - linux

Hi everyone, I'm making a Perl script to encrypt and decrypt text. I've just started and I have this:
#!/usr/bin/perl
use Crypt::IDEA;
my $key = pack("H32", "0123456789ABCDEF0123456789ABCDEF");
my $cipher = new IDEA $key;
my $palabra= "plaintex";
my $ciphertext = $cipher->encrypt($palabra); # NB - 8 bytes
print unpack("H16", $ciphertext), "\n";
my $plaintext = $cipher->decrypt($ciphertext);
print $plaintext , "\n";
The trouble is that the text to encrypt must be 8 bytes long. Why? If I put "plaintext" instead of "plaintex" it gives me this error:
input must be 8 bytes long at /usr/lib/perl5/site_perl/Crypt/IDEA.pm line 62.

Wrap Crypt::IDEA with Crypt::CBC - it will let you use data of non-aligned length. See the documentation for Crypt::CBC.
This is because IDEA, like many other encryption algorithms, is a block cipher. That means it operates on blocks of data of a fixed size, so the data you are encrypting must be prepared (padded with zeros or whatever) to fill a whole number of blocks.
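For example, a minimal sketch of that approach (untested; the key and plaintext are placeholders, and it assumes Crypt::CBC and Crypt::IDEA are installed):
#!/usr/bin/perl
use strict;
use warnings;
use Crypt::CBC;
# Crypt::CBC pads the data for you, so the plaintext no longer has to be
# a multiple of IDEA's 8-byte block size.
my $cbc = Crypt::CBC->new(
    -key    => 'my secret key',   # placeholder key
    -cipher => 'IDEA',            # uses Crypt::IDEA underneath
);
my $ciphertext = $cbc->encrypt('plaintext of any length');
my $plaintext  = $cbc->decrypt($ciphertext);
print $plaintext, "\n";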

Try Crypt::CBCeasy
#!/usr/bin/perl --
use strict; use warnings;
use Crypt::CBCeasy qw/ IDEA /;
my $key = 'shabba';
my $text = "plaintex"; ## not a file, not -f -r $text
my $crypted = IDEA::encipher( $key, $text );
my $detext = IDEA::decipher( $key, $crypted );
print join "\n", $key, $text, unpack( 'H*', $crypted ), $detext, '';
__END__
shabba
plaintex
53616c7465645f5fb5ec01275eb466c4b9b69f3edb7568b42c1713416d33b7aa
plaintex

Related

How can I split my data in small enough chunks to feed to Seq?

I am working on a bioinformatics project where I am looking at very large genomes. Seg only reads 135 lines at a time, so when we feed the genomes in it gets overloaded. I am trying to create a Perl command that will split the data into 135-line sections. The character limit would be 10,800 since there are 80 columns. This is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
my $str =
'>AATTCCGG
TTCCGGAA
CCGGTTAA
AAGGTTCC
>AATTCCGG';
substr($str,17) = "";
print "$str";
It splits at the 17th character but only prints that section; I want it to continue printing the rest of the data. How do I add a command that allows the rest of the data to be shown? It should keep splitting at every 17th character. (Then of course I can go back in and scale it up to the size I actually need.)
I assume that the "very large genome" is stored in a very large file, and that it is fine to collect data by number of lines (and not by number of characters) since this is the first mentioned criterion.
Then you can read the file line by line and assemble lines until there are 135 of them. Then hand them off to a program or routine that processes them, empty your buffer, and keep going.
use warnings;
use strict;
use feature 'say';

my $file = shift || 'default_filename.txt';
my $num_lines_to_process = 135;

open my $fh, '<', $file or die "Can't open $file: $!";

my $line_counter = 0;
my @buffer;

while (<$fh>) {
    chomp;
    if ($line_counter == $num_lines_to_process) {
        process_data(\@buffer);
        @buffer = ();
        $line_counter = 0;
    }
    push @buffer, $_;
    ++$line_counter;
}
process_data(\@buffer) if @buffer;  # last batch

sub process_data {
    my ($rdata) = @_;
    say for @$rdata; say '---';  # print data for a test
}
If your processing application/routine wants a string, you can append to a string each time instead of adding to an array ($buffer .= $_;) and clear it with $buffer = ''; as needed.
If you need to pass a string but there is also some use for an array while collecting data (intermediate checks/pruning/processing?), then collect lines into an array and join it into a string before handing it off: my $data = join '', @buffer;
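A minimal sketch of that string-buffer variant, using the same $fh and $num_lines_to_process as above (here process_data is assumed to take a string rather than an array reference):
my $buffer = '';
my $line_counter = 0;
while (<$fh>) {
    $buffer .= $_;    # keep the newline; chomp first if you don't want it
    if (++$line_counter == $num_lines_to_process) {
        process_data($buffer);    # hand off one 135-line chunk as a string
        $buffer = '';
        $line_counter = 0;
    }
}
process_data($buffer) if length $buffer;    # last (partial) batch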
You can also make use of the $. variable and the modulo operator (%)
while (<$fh>) {
    chomp;
    push @buffer, $_;
    if ($. % $num_lines_to_process == 0) {  # every $num_lines_to_process lines
        process_data(\@buffer);
        @buffer = ();
    }
}
process_data(\@buffer) if @buffer;  # last batch
In this case we need to first store a line and then check its number, since $. (the number of the last line read from a filehandle; see perlvar) starts at 1, not 0.
substr returns the removed part of a string; you can just run it in a loop:
while (length $str) {
    my $substr = substr $str, 0, 17, "";
    print $substr, "\n";
}

perl: using commas in hash values

I have key-value pairs like "statement:test,data", where 'test,data' is the value for the hash. While trying to create a hash with such values, Perl splits the values on the comma. Is there a way around this so that strings with commas can be used as values?
There is nothing in Perl that stops you from using 'test,data' as hash value.
If your incoming string is literally "statement:test,data", you can use this code to add it to a hash:
my ($key, $value) = ($string =~ /(\w+):(.*)/);
next unless $key and $value; # skip bad stuff - up to you
$hash{$key} = $value;
Perl won't split a string on a comma unless you tell it to.
#!/usr/bin/perl
use v5.16;
use warnings;
use Data::Dump 'ddx';
my $data = "statement:test,data";
my %hash;
my ($key, $value) = split(":", $data);
$hash{$key} = $value;
ddx \%hash;
gives:
# split.pl:14: { statement => "test,data" }

Using Perl or Linux built-in command-line tools, how can I quickly map one integer to another?

I have a text file mapping of two integers, separated by commas:
123,456
789,555
...
It's 120Megs... so it's a very long file.
I keep having to search for the first column and return the second, e.g., look up 789 --returns--> 555, and I need to do it FAST, using regular Linux built-ins.
I'm doing this right now and it takes several seconds per look-up.
If I had a database I could index it. I guess I need an indexed text file!
Here is what I'm doing now:
my $lineFound=`awk -F, '/$COLUMN1/ { print $2 }' ../MyBigMappingFile.csv`;
Is there any easy way to pull this off with a performance improvement?
The hash suggestions are the natural way an experienced Perler would do this, but they may be suboptimal in this case. They scan the entire file and build a large, flat data structure in linear time. Cruder methods can short-circuit with a worst case of linear time, and usually less in practice.
I first made a big mapping file:
my $LEN = shift;
for (1 .. $LEN) {
    my $rnd = int rand( 999 );
    print "$_,$rnd\n";
}
With $LEN passed on the command line as 10000000, the file came out to 113MB. Then I benchmarked three implementations. The first is the hash lookup method. The second slurps the file and scans it with a regex. The third reads line-by-line and stops when it matches. Complete implementation:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw{timethese};
my $FILE = shift;
my $COUNT = 100;
my $ENTRY = 40;
slurp(); # Initial file slurp, to get it into the hard drive cache
timethese( $COUNT, {
    'hash'       => sub { hash_lookup( $ENTRY ) },
    'scalar'     => sub { scalar_lookup( $ENTRY ) },
    'linebyline' => sub { line_lookup( $ENTRY ) },
});
sub slurp
{
    open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
    local $/;    # slurp mode, restored when the sub exits
    my $s = <$fh>;
    close $fh;
    return $s;
}
sub hash_lookup
{
    my ($entry) = @_;
    my %data;
    open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
    while( <$fh> ) {
        my ($name, $val) = split /,/;
        $data{$name} = $val;
    }
    close $fh;
    return $data{$entry};
}
sub scalar_lookup
{
    my ($entry) = @_;
    my $data = slurp();
    my ($val) = $data =~ /^ $entry , (\d+) $/xm;
    return $val;
}
sub line_lookup
{
    my ($entry) = @_;
    my $found;
    open( my $fh, '<', $FILE ) or die "Can't open $FILE: $!\n";
    while( <$fh> ) {
        my ($name, $val) = split /,/;
        if( $name == $entry ) {
            $found = $val;
            last;
        }
    }
    close $fh;
    return $found;
}
Results on my system:
Benchmark: timing 100 iterations of hash, linebyline, scalar...
hash: 47 wallclock secs (18.86 usr + 27.88 sys = 46.74 CPU) @ 2.14/s (n=100)
linebyline: 47 wallclock secs (18.86 usr + 27.80 sys = 46.66 CPU) @ 2.14/s (n=100)
scalar: 42 wallclock secs (16.80 usr + 24.37 sys = 41.17 CPU) @ 2.43/s (n=100)
(Note I'm running this off an SSD, so I/O is very fast, and perhaps makes that initial slurp() unnecessary. YMMV.)
Interestingly, the hash implementation is just as fast as linebyline, which isn't what I expected. By using slurping, scalar may end up being faster on a traditional hard drive.
However, by far the fastest is a simple call to grep:
$ time grep '^40,' int_map.txt
40,795
real 0m0.508s
user 0m0.374s
sys 0m0.046
Perl could easily read that output and split apart the comma in hardly any time at all.
Edit: Never mind about grep. I misread the numbers.
120 meg isn't that big. Assuming you've got at least 512MB of ram, you could easily read the whole file into a hash and then do all of your lookups against that.
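For instance, a minimal sketch along those lines (the file name is taken from the question; 789 stands in for whatever key you need to look up):
use strict;
use warnings;

my %map;
open my $fh, '<', '../MyBigMappingFile.csv' or die "Can't open mapping file: $!";
while (<$fh>) {
    chomp;
    my ($from, $to) = split /,/;
    $map{$from} = $to;    # assumes the left-hand numbers are unique
}
close $fh;

# after the one-time load, every lookup is a plain hash access
print $map{789} // 'not found', "\n";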
use:
sed -n "/^$COLUMN1/{s/.*,//p;q}" file
This optimizes your code in three ways:
1) No needless splitting each line in two on ",".
2) You stop processing the file after the first hit.
3) sed is faster than awk.
This should more than halve your search time.
HTH Chris
It all depends on how often the data change and how often in the course of a single script invocation you need to look up.
If there are many lookups during each script invocation, I would recommend parsing the file into a hash (or array if the range of keys is narrow enough).
If the file changes every day, creating a new SQLite database might or might not be worth your time.
If each script invocation needs to look up just one key, and if the data file changes often, you might get an improvement by slurping the entire file into a scalar and doing a single pattern match on that, instead of parsing each line.
#!/usr/bin/env perl
use warnings; use strict;

die "Need key\n" unless @ARGV;

my $lookup_file = 'lookup.txt';
my ($key) = @ARGV;
my $re = qr/^$key,([0-9]+)$/m;

open my $input, '<', $lookup_file
    or die "Cannot open '$lookup_file': $!";
my $buffer = do { local $/; <$input> };
close $input;

if (my ($val) = ($buffer =~ $re)) {
    print "$key => $val\n";
}
else {
    print "$key not found\n";
}
On my old slow laptop, with a key towards the end of the file:
C:\Temp> dir lookup.txt
...
2011/10/14 10:05 AM 135,436,073 lookup.txt
C:\Temp> tail lookup.txt
4522701,5840
5439981,16075
7367284,649
8417130,14090
438297,20820
3567548,23410
2014461,10795
9640262,21171
5345399,31041
C:\Temp> timethis lookup.pl 5345399
5345399 => 31041
TimeThis : Elapsed Time : 00:00:03.343
This example loads the file into a hash (which takes about 20s for 120M on my system). Subsequent lookups are then nearly instantaneous. This assumes that each number in the left column is unique. If that's not the case then you would need to push numbers on the right with the same number on the left onto an array or something.
use strict;
use warnings;
my ($csv) = @ARGV;
my $start=time;
open(my $fh, '<', $csv) or die("$csv: $!");
$|=1;
print("loading $csv... ");
my %numHash;
my $p=0;
while(<$fh>) { $p+=length; my($k,$v)=split(/,/); $numHash{$k}=$v }
print("\nprocessed $p bytes in ",time()-$start, " seconds\n");
while(1) { print("\nEnter number: "); chomp(my $i=<STDIN>); print($numHash{$i}) }
Example usage and output:
$ ./lookup.pl MyBigMappingFile.csv
loading MyBigMappingFile.csv...
processed 125829128 bytes in 19 seconds
Enter number: 123
322
Enter number: 456
93
Enter number:
Does it help if you cp the file to /dev/shm and query the mapping using awk/sed/perl/grep/ack/whatever?
Don't tell me you are working on a 128MB RAM machine. :)

Perl: Removing characters up to a certain point

I've tried searching through questions already asked, but can't seem to find anything. I'm sure it's incredibly simple to do, but I am completely new to Perl.
What I am trying to do is remove characters in a string up to a certain point. For example, I have:
Parameter1 : 0xFFFF
and what I would like to do is remove the "Parameter1:" and be left with just the "0xFFFF". If anyone can help and give a simple explanation of the operators used, that'd be great.
Sounds like you need the substr function.
#!/usr/bin/perl
use strict;
use warnings;
my $string = 'Parameter1 : 0xFFFF';
my $fragment = substr $string, 13;
print " string: <$string>\n";
print "fragment: <$fragment>\n";
s/.*:\s*//;
or
$s =~ s/.*:\s*//;
This deletes everything up to and including the first occurrence of : followed by zero or more whitespace characters. With $s =~ it's applied to $s; without it, it's applied to $_.
Have you considered using something like Config::Std?
Here is how to parse a configuration file like that by hand:
#!/usr/bin/perl
use strict; use warnings;
my %params;
while ( my $line = <DATA> ) {
    if ($line =~ m{
            ^
            (?<param> Parameter[0-9]+)
            \s*? : \s*?
            (?<value> 0x[[:xdigit:]]+)
        }x ) {
        $params{ $+{param} } = $+{value};
    }
}
use YAML;
print Dump \%params;
__DATA__
Parameter1 : 0xFFFF
Parameter3 : 0xFAFF
Parameter4 : 0xCAFE
With Config::Std:
#!/usr/bin/perl
use strict; use warnings;
use Config::Std;
my $config = do { local $/; <DATA> };
read_config \$config, my %params;
use YAML;
print Dump \%params;
__DATA__
Parameter1 : 0xFFFF
Parameter3 : 0xFAFF
Parameter4 : 0xCAFE
Of course, in real life, you'd pass a file name to read_config instead of slurping it.
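For example, a quick sketch of that (the file name here is made up):
use Config::Std;
read_config 'params.conf' => my %params;    # 'params.conf' is a hypothetical config file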
I like split for these parameter/value pairs.
my $str = "Parameter1 : 0xFFFF";
my ($param, $value) = split /\s*:\s*/, $str, 2;
Note the use of LIMIT in the split, which limits the split to two fields (in case of additional colons in the value).

Convert Memory Size (Human readable) into Actual Number (bytes) in Perl

Is there an actual package on CPAN to convert strings such as:
my $string = "54.4M";
my $string2 = "3.2G";
into the actual number in bytes:
54,400,000
3,200,000,000
And vice versa.
In principle, what I want to do at the end is sum up all the memory sizes.
To get the exact output you asked for, use Number::FormatEng and Number::Format:
use strict;
use warnings;
use Number::FormatEng qw(:all);
use Number::Format qw(:subs);
my $string = "54.4M" ;
my $string2 = "3.2G" ;
print format_number(unformat_pref($string)) , "\n";
print format_number(unformat_pref($string2)) , "\n";
__END__
54,400,000
3,200,000,000
By the way, only unformat_pref is needed if you are going to perform calculations with the result.
Since Number::FormatEng was intended for engineering notation conversion (not for bytes), its prefix is case-sensitive. If you want to use it for kilobytes, you must use lower case k.
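For instance, a minimal sketch of that summing step with unformat_pref (the size strings are just the two from the question):
use strict;
use warnings;
use Number::FormatEng qw(:all);

my @sizes = ('54.4M', '3.2G');
my $total = 0;
$total += unformat_pref($_) for @sizes;
print "$total\n";    # 3254400000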
Number::Format will convert these strings into actual bytes (kinda, almost).
use Number::Format qw(:subs);
my $string = "54.4M" ;
my $string2 = "3.2G" ;
print round(unformat_number($string) , 0), "\n";
print round(unformat_number($string2), 0), "\n";
__END__
57042534
3435973837
The reason I said "kinda, almost" is that Number::Format treats 1K as being equal to 1024 bytes, not 1000 bytes. That's probably why it gives a weird-looking result (with fractional bytes), unless it is rounded.
For your first problem, I did not find a CPAN package, but this code snippet might do:
sub convert_human_size {
    my $size = shift;
    my @suffixes = ('', qw(k m g));
    for my $index (0..$#suffixes) {
        my $suffix = $suffixes[$index];
        if ( $size =~ /^([\d.]+)$suffix\z/i ) {
            return int($1 * (1024 ** $index));
        }
    }
    # No match
    die "Didn't understand human-readable file size '$size'"; # or croak
}
Run the number through Number::Format's format_number function if you'd like pretty comma separators (e.g. "5,124" instead of "5124").
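For example, a quick sketch combining the two (convert_human_size is the sub above):
use Number::Format qw(:subs);

my $bytes = convert_human_size('5k');    # 5120
print format_number($bytes), "\n";       # prints "5,120"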
CPAN solves the second part of your problem:
Number::Bytes::Human
For example:
use Number::Bytes::Human qw(format_bytes);
$size = format_bytes(54_400_000);
You may provide an optional bs => 1000 parameter to change the base of the conversion to 1000 instead of 1024.
This should get you started. You could add other factors, like kilobytes ("K") on your own, as well as formatting of output (comma separators, for example):
#!/usr/bin/perl -w
use strict;
use POSIX qw(floor);
my $string = "54.4M";
if ( $string =~ m/(\d+)?\.?(\d+)([MG])/ ) {
    my $mantissa = "$1.$2";
    if ( $3 eq "M" ) {
        $mantissa *= (2 ** 20);
    }
    elsif ( $3 eq "G" ) {
        $mantissa *= (2 ** 30);
    }
    print "$string = " . floor($mantissa) . " bytes\n";
}
Output:
54.4M = 57042534 bytes
Basically, to go from strings to numbers, all you need is a hash mapping units to multipliers:
#!/usr/bin/perl
use strict; use warnings;

my $base = 1000;
my %units = (
    K => $base,
    M => $base ** 2,
    G => $base ** 3,
    # etc
);

my @strings = qw( 54.4M 3.2G 1K 0.1M );
my $pattern = join('|', sort keys %units);

my $total;
for my $string ( @strings ) {
    while ( $string =~ /(([0-9]*(?:\.[0-9]+)?)($pattern))/g ) {
        my $number = $2 * $units{$3};
        $total += $number;
        printf "%12s = %12.0f\n", $1, $number;
    }
}
printf "Total %.0f bytes\n", $total;
Output:
54.4M = 54400000
3.2G = 3200000000
1K = 1000
0.1M = 100000
Total 3254501000 bytes
