How to insert an awk command in a Perl script? - linux

I want to add this awk command to my script but keep getting errors. I have put it inside " " but am still getting errors.
system("awk -F"\t" '{ for ( i=1; i<=2; i++ ) { printf "%s\t", $i } printf "\n"; }' myfile file2"};
The errors are:
String found where operator expected
at host_parse line 21, near "t" '{ for
( i=1; i<=2; i++ ) { printf ""
Unquoted string "a" may clash with
future reserved word at myfile line
58.
Unquoted string "a" may clash with
future reserved word at myfile line
58.
syntax error at myfile line 21, near
"" awk -F"\"
Thanks.

One of the trickiest parts about using the system command is using quotes in a way that will get the correct command passed to the operating system. Perl's q// construction can be very helpful for this:
# treat everything between the #...# as uninterpolated string
system( q#awk -F"\t" '{ for ( i=1; i<=2; i++ ) { printf "%s\t", $i }
printf "\n"; }' myfile file2# );

To answer your immediate question, you're tripping over the default behavior of Perl's system operator. Usually, it's a great convenience for the shell to parse the command, but sometimes, as you've seen, having multiple levels of encoding is a pain—or even a security vulnerability.
You can bypass the shell's quoting entirely with the system LIST and exec LIST forms. In your case, change your code to
#! /usr/bin/env perl

use strict;
use warnings;

my @cmd = (
    "awk",
    "-F", "\t",
    '{ for ( i=1; i<=2; i++ ) {
         printf "%s\t", $i
       }
       printf "\n";
     }',
    "myfile", "file2",
);

system(@cmd) == 0 or warn "$0: awk exited " . ($? >> 8);
You don't have to use the temporary array, but without it I don't like how the code reads: a multi-line command crammed together with a check for success.
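For comparison, here is a sketch of the inline form with the success check attached (same arguments, no temporary array):

system("awk",
       "-F", "\t",
       '{ for ( i=1; i<=2; i++ ) { printf "%s\t", $i } printf "\n"; }',
       "myfile", "file2") == 0
    or warn "$0: awk exited " . ($? >> 8);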
Given myfile containing
1 2 3 4
foo bar baz
oui oui monsieur
and file2 with
a b c
d e f g
(where the separators in both cases are TAB characters), then the output is
1 2
foo bar
oui oui
a b
d e
They're invisible, but each line of output above has a trailing TAB.
Doing the same in Perl is straightforward. For example,
sub print_first_two_columns {
    foreach my $path (@_) {
        open my $fh, "<", $path or die "$0: open $path: $!";
        while (<$fh>) {
            chomp;
            my(@cols) = (split /\t/)[0 .. 1];
            print join("\t", @cols), "\n";
        }
        close $fh;
    }
}
The part that may not be obvious is taking a slice of the values returned from split, but what's happening is simple in concept. A slice allows you to grab data at multiple indices (0 and 1 in this case, i.e., the first and second columns). The range-operator expression 0 .. 1 evaluates to the list 0 and 1. If you decide later you want the first four columns, you'd change it to 0 .. 3.
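For instance, a tiny stand-alone illustration of a slice on split's return value (hypothetical data, not part of the sub above):

my @parts = split /\t/, "a\tb\tc\td";
my @first_two = @parts[0 .. 1];   # grabs the elements at indices 0 and 1
print "@first_two\n";             # prints "a b"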
Call the sub above as in
print_first_two_columns "myfile", "file2";
Note that the code isn't exactly equivalent: it doesn't preserve the trailing TAB characters.
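If you do want the trailing TAB, one way is a small tweak to the print line above (a sketch, untested against your real data):

print map("$_\t", @cols), "\n";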
From the command line, it's even simpler:
$ perl -lane '$,="\t"; print @F[0,1]' myfile file2
1 2
foo bar
oui oui
a b
d e

You don't need the shell to interpret any redirection (or other shell facilities), so it would be better to pass a list of arguments to system():
system 'awk', '-F', "\t",
'{for (i=1; i<=2; i++) {printf "%s\t", $i}; print ""}',
'myfile', 'file2';

Related

separate columns of a text file

Hi experts, I have a big text file that contains many columns. Now I want to extract each column into a separate text file, adding two strings at the top of each.
Suppose I have an input file like this:
2 3 4 5 6
3 4 5 6 7
2 3 4 5 6
1 2 2 2 2
then I need to extract each column into a separate text file with two strings at the top:
file1.txt file2.txt .... filen.txt
s=5 s=5
r=9 r=9
2 3
3 4
2 3
1 2
I tried the script below, but it doesn't work properly. I need help from the experts. Thanks in advance.
#!/bin/sh
for i in $(seq 1 1 5)
do
echo $i
awk '{print $i}' inp_file > file_$i
done
Could you please try the following, written and tested with the shown samples in GNU awk. It doesn't use the close() function because your sample shows you have only 5 columns in Input_file. It also creates two awk variables (named var1 and var2) that are printed before the actual column values are written to each output file.
awk -v var1="s=5" -v var2="r=9" '
{
  count++
  for(i=1;i<=NF;i++){
    outputFile="file"i".txt"
    if(count==1){
      print (var1 ORS var2) > (outputFile)
    }
    print $i > (outputFile)
  }
}
' Input_file
In case you can have many more columns, it is better to close the output files as you go with awk's close() function (to avoid a "too many open files" error). Use this then:
awk -v var1="s=5" -v var2="r=9" '
{
  count++
  for(i=1;i<=NF;i++){
    outputFile="file"i".txt"
    if(count==1){
      print (var1 ORS var2) > (outputFile)
    }
    print $i >> (outputFile)
    close(outputFile)
  }
}
' Input_file
Pretty simple to do in one pass through the file with awk using its output redirection:
awk 'NR==1 { for (n = 1; n <= NF; n++) print "s=5\nr=9" > ("file_" n) }
{ for (n = 1; n <= NF; n++) print $n > ("file_" n) }' inp_file
With GNU awk to internally handle more than a dozen or so simultaneously open files:
NR == 1 {
  for (i=1; i<=NF; i++) {
    out[i] = "file" i ".txt"
    print "s=5" ORS "r=9" > out[i]
  }
}
{
  for (i=1; i<=NF; i++) {
    print $i > out[i]
  }
}
or with any awk just close them as you go:
NR == 1 {
  for (i=1; i<=NF; i++) {
    out[i] = "file" i ".txt"
    print "s=5" ORS "r=9" > out[i]
    close(out[i])
  }
}
{
  for (i=1; i<=NF; i++) {
    print $i >> out[i]
    close(out[i])
  }
}
split -nr/$(wc -w <(head -1 input) | cut -d' ' -f1) -t' ' --additional-suffix=".txt" -a4 --numeric-suffix=1 --filter "cat <(echo -e 's=5 r=9') - | tr ' ' '\n' >\$FILE" <(tr -s '\n' ' ' <input) file
This uses the nifty split command in a unique way to rearrange the columns. Hopefully it's faster than awk, although after spending a considerable amount of time coding it, testing it, and writing it up, I find that it may not be scalable enough for you since it requires a process per column, and many systems are limited in user processes (check ulimit -u). I submit it though because it may have some limited learning usefulness, to you or to a reader down the line.
Decoding:
split -- Divide a file up into subfiles. Normally this is by lines or by size but we're tweaking it to use columns.
-nr/$(...) -- Use round-robin output: Sort records (in our case, matrix cells) into the appropriate number of bins in a round-robin fashion. This is the key to making this work. The part in parens means, count (wc) the number of words (-w) in the first line (<(head -1 input)) of the input and discard the filename (cut -d' ' -f1), and insert the output into the command line.
-t' ' -- Use a single space as a record delimiter. This breaks the matrix cells into records for split to split on.
--additional-suffix=".txt" -- Append .txt to output files.
-a4 -- Use four-digit numbers; you probably won't get 1,000 files out of it but just in case ...
--numeric-suffix=1 -- Use a numeric suffix (normally it's a letter combination) and start at 1. This is pretty pedantic but it matches the example. The -a4 above already allows up to 9,999 files; raise that value if you need more.
--filter ... -- Pipe each file through a shell command.
Shell command:
cat -- Concatenate the next two arguments.
<(echo -e 's=5 r=9') -- This means execute the echo command and use its output as the input to cat. We use a space instead of a newline to separate because we're converting spaces to newlines eventually and it is shorter and clearer to read.
- -- Read standard input as an argument to cat -- this is the binned data.
| tr ' ' '\n' -- Convert spaces between records to newlines, per the desired output example.
>\$FILE -- Write to the output file, which is stored in $FILE (but we have to quote it so the shell doesn't interpret it in the initial command).
Shell command over -- rest of split arguments:
<(tr -s '\n' ' ' < input) -- Use, as input to split, the example input file but convert newlines to spaces because we don't need them and we need a consistent record separator. The -s means only output one space between each record (just in case we got multiple ones on input).
file -- This is the prefix to the output filenames. The output in my example would be file0001.txt, file0002.txt, ..., file0005.txt.

remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you.
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
You can call a one-liner, as Borodin and zdim said. Which one is right for you is still not clear, because you don't say whether the 71st column means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789 while zdim's assumes it's 2. Both showed a solution for their reading, but those are stand-alone programs to be run from the command line.
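A quick way to see the difference (two throwaway one-liners, purely illustrative):

$ perl -e '$_ = "12345\t6789"; print +(split /\t/)[1], "\n"'    # field-based: 6789
$ perl -e '$_ = "12345\t6789"; print substr($_, 1, 1), "\n"'    # character-based: 2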
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});

    # tab/field-based (split on tabs, since the file is tab-delimited):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet to a one-liner, but you wouldn't call those from within a Perl script.
Since you need the exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints lines only when the three-character string starting at the 71st column does not match either of 0/0 and ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, and not capturing. It will work fine also without ?: which is there only for efficiency's sake.
perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
The problem with your command is that you are attempting to capture the output of a command which produces no output - all the matches are redirected to a file, so that's where all the output is going.
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
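For instance, a sketch of the same idea as a Perl one-liner (same regex, same assumption that column 71 is the 71st tab-separated field):

perl -ne 'print unless m{^(?:[^\t]*\t){70}(?:\./\.|0/0)\t}' file > newfile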
Try it!
awk '{ if ($71 != "./." && $71 != "0/0") print ; }' old_file.txt > new_file.txt

Transliteration script for linux shell

I have multiple .txt files containing text in one alphabet; I want to transliterate the text into another alphabet. Some characters of alphabet1 map 1:1 onto alphabet2 (e.g. a becomes e), whereas others map 1:2 (e.g. x becomes ch).
I would like to do this using a simple script for the Linux shell.
With tr or sed I can convert 1:1 characters:
sed 'y/abcdefghijklmnopqrstuvwxyz/nopqrstuvwxyzabcdefghijklm/'
a will become n, b will become o, et cetera (a Caesar cipher, I think)
But how can I deal with 1:2 characters?
Not an answer, just to show a briefer, idiomatic way to populate the table[] array from @konsolebox's answer as discussed in the related comments:
BEGIN {
  split("a e b", old)
  split("x ch o", new)
  for (i in old)
    table[old[i]] = new[i]
  FS = OFS = ""
}
so the mapping of old chars to new chars is clearly shown: the char in the first split() is mapped to the char(s) below it, and for any other mapping you want you just need to change the string(s) in the split(), not 26-ish explicit assignments to table[].
You can even create a general script to do mappings and just pass in the old and new strings as variables:
BEGIN {
  split(o, old)
  split(n, new)
  for (i in old)
    table[old[i]] = new[i]
  FS = OFS = ""
}
then in shell anything like this:
old="a e b"
new="x ch o"
awk -v o="$old" -v n="$new" -f script.awk file
and you can protect yourself from your own mistakes populating the strings, e.g.:
BEGIN {
  numOld = split(o, old)
  numNew = split(n, new)
  if (numOld != numNew) {
    printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
    exit 1
  }
  for (i=1; i <= numOld; i++) {
    if (old[i] in table) {
      printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
      exit 1
    }
    if (newvals[new[i]]++) {
      printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
    }
    table[old[i]] = new[i]
  }
}
Wouldn't it be good to know if you wrote that b maps to x and then later mistakenly wrote that b maps to y? The above really is the best way to do this but your call of course.
Here's one complete solution as discussed in the comments below
BEGIN {
  numOld = split("a e b", old)
  numNew = split("x ch o", new)
  if (numOld != numNew) {
    printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
    exit 1
  }
  for (i=1; i <= numOld; i++) {
    if (old[i] in map) {
      printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
      exit 1
    }
    if (newvals[new[i]]++) {
      printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
    }
    map[old[i]] = new[i]
  }
  FS = OFS = ""
}
{
  for (i = 1; i <= NF; ++i) {
    if ($i in map) {
      $i = map[$i]
    }
  }
  print
}
I renamed the table array to map just because IMHO that better represents the purpose of the array.
Save the above in a file script.awk and run it as awk -f script.awk inputfile.
Using Awk:
#!/usr/bin/awk -f
BEGIN {
  FS = OFS = ""
  table["a"] = "e"
  table["x"] = "ch"
  # and so on...
}
{
  for (i = 1; i <= NF; ++i) {
    if ($i in table) {
      $i = table[$i]
    }
  }
}
1
Usage:
awk -f script.awk file
Test:
# echo "the quick brown fox jumps over the lazy dog" | awk -f script.awk
the quick brown foch jumps over the lezy dog
This can be done quite concisely using a Perl one-liner:
perl -pe '%h=(a=>"xy",c=>"z"); s/(.)/defined $h{$1} ? $h{$1} : $1/eg'
or equivalently (thanks jaypal):
perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg'
%h is a hash containing the characters (keys) and their substitutions (values). s is the substitution command (as in sed). The g modifier means that the substitution is global and the e means that the replacement part is evaluated as an expression. It captures each character one by one and substitutes them with the value in the hash if it exists, otherwise keeps the original value. The -p switch means that each line in the input is automatically printed.
Testing it out:
$ perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg' <<<"abc"
xybz
Using sed.
Write a file transliterate.sed containing:
s/a/e/g
s/x/ch/g
and then run from your command line to get the transliterated output.txt from input.txt:
sed -f transliterate.sed input.txt > output.txt
If you need this more often consider adding #!/bin/sed -f as first line and making your file executable with chmod 744 transliterate.sed as described at the Wikipedia page for sed.
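For example, the executable script would look like this (just the same two rules with the shebang added):

#!/bin/sed -f
s/a/e/g
s/x/ch/g

and you would then run it as ./transliterate.sed input.txt > output.txt.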

Search book by title/author

Hi, I am new to shell programming; I hope you can guide me along, thanks.
I need a function to search for either a book title or an author in a .txt file and echo out the following:
Title:*Enter*
Author:Scissorhands
Found 3 records :
C++ for dummies, John Scissorhands, $15.01, 10, 5
Java for dummies, Mary Scissorhands, $16.02, 20, 15
VB.NET for dummies, Edward Scissorhands, $17.03, 30, 25
My file format is as below:
Book name:author:price:Qty:Qty Sold
harry potter:james:12.99:197:101
function Search_book
{
FILE="/home/student/Downloads/BookDB.txt"
echo found 1 records : $FILE contents
cat FILE
}
Edit: Sorry, I misunderstood your question.
Updated mixed bash/perl solution:
#!/bin/bash

read -p "Enter search term: " search

perl -ne '
    BEGIN{ $pattern = $ARGV[0]; shift; $n=0 }
    @a = split /:/;
    if ($a[0] =~ m/$pattern/i or $a[1] =~ m/$pattern/i) {
        print "$a[0], $a[1],\$$a[2],$a[3],$a[4]\n";
        $n += 1;
    }
    END{ print "Found $n title(s).\n" }
' "$search" /home/student/Downloads/BookDB.txt
Output:
$ ./search.sh
Enter search term: star wars vi
Title: Star wars VI - return of the jedi
Found 1 title(s).
Note that you need regular expressions for wildcards (i.e. . instead of ?, .* instead of *, etc.), and you don't need to put wildcards at the beginning/end of the search term to find matches anywhere in a given (sub)string.
Of course you could also do this entirely in Perl, without the shell script wrapper:
#!/usr/bin/env perl
use strict;
use warnings;

my $booklist = './books.txt';
my @book;

print "Enter search term: ";
chomp (my $pattern = <>);

open BOOKS, "<$booklist" or die $!;
my $n = 0;
foreach (<BOOKS>) {
    chomp;
    @book = split /:/;
    if ($book[0] =~ m/$pattern/i or $book[1] =~ m/$pattern/i) {
        print "$book[0], $book[1],\$$book[2],$book[3],$book[4]\n";
        $n += 1;
    }
}
close BOOKS;
print "Found $n title(s).\n";
For an awk solution see Adrian Frühwirth's answer.
search_book()
{
    awk -F':' -v search="$1" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
_
$ cat books.txt
X never marks the spot:Indiana Jones:9.99:1:1
A fistful of barnacles:Captain Twiddlymore:9.99:2:1
The time I blew up LeChuck:Guybrush Threepwood:8.99:100:60
When I blew up LeChuck:Guybrush Threepwood:8.99:100:50
Where I blew up LeChuck:Guybrush Threepwood:8.99:100:2
_
$ search_book Indiana
X never marks the spot, Indiana Jones,$9.99,1,1
1 records found
$ search_book Guybrush
The time I blew up LeChuck, Guybrush Threepwood,$8.99,100,60
When I blew up LeChuck, Guybrush Threepwood,$8.99,100,50
Where I blew up LeChuck, Guybrush Threepwood,$8.99,100,2
3 records found
$ search_book barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
$ search_book foo
0 records found
Per Wilson Turner's last point on Adrian's post:
./book.sh
Enter search term: barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
Here is how you call the awk function within your existing code:
#!/bin/bash

search_book()
{
    awk -F':' -v search="$search" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
read -p "Enter search term: " search
search_book

How to divide a data file's column A by column B using Perl

I was given a text file with a whole bunch of data sorted in columns. Each of the columns is separated by commas.
How could I divide one column by another to print an output answer? I am using Perl right now, so it has to be done in Perl. How could I do this?
This is what I have so far:
#!/usr/bin/perl

open (FILE, 'census2008.txt');
while (<FILE>) {
    chomp;
    ($sumlev, $stname, $ctyname, $popestimate2008, $births2008, $deaths2008) = split(",");
}
close (FILE);
exit;
There are several options:
Read the file line by line, split the columns on ',', and divide the relevant columns (don't forget to handle the divide-by-zero error; see the sketch after this list)
Do the same thing as a one-liner:
$ perl -F/,/ -lane 'print( $F[1] == 0 ? "" : $F[3]/$F[1] )' file.txt
Utilize a ready-to-use CPAN module like Text::CSV
Of course, there are more unorthodox/crazy/unspeakable alternatives à la TMTOWTDI ™, so one could:
Parse out the relevant columns with a regex and divide the matches:
if (/^\d*,(\d+),\d*,(\d+)/) { say $2/$1 if $2 != 0; }
Do it with s///e:
$ perl -ple 's!^\d*,(\d+),\d*,(\d+).*$! $2 == 0 ? "" : $2/$1 !e' file.txt;
Get the shell to do the dirty work via backticks:
sub print_divide { say `cat file.txt | some_command_line_command` }
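For the first option, a minimal sketch (assuming, as in the one-liner above, that column 4 is divided by column 2 of a file named file.txt):

#!/usr/bin/env perl
use strict;
use warnings;

open my $fh, '<', 'file.txt' or die "$0: open file.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @cols = split /,/, $line;
    # skip the division when the divisor is zero
    print $cols[1] == 0 ? "" : $cols[3] / $cols[1], "\n";
}
close $fh;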
#!/usr/bin/env perl
# divides column 1 by column 2 of some ','-delimited file,
# read from standard input.
# usage:
# $ cat data.txt | 8458760.pl

while (<STDIN>) {
    @values = split(/,/, $_);
    print $values[0] / $values[1] . "\n";
}
If you have fixed width columns of data you could use 'unpack' along the lines of:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    my ($sumlev,$stname,$ctyname,$popest,$births,$deaths)
        = unpack("A2xA10xA15xA7xA5xA5");
    printf "%-15s %4.2f\n", $ctyname, $births/$deaths;
}
__DATA__
10,Main ,My City , 10000, 200, 150
12,Poplar ,Somewhere , 3000, 90, 100
13,Maple ,Your Place , 9123, 100, 90
