insert delimiter in file using Linux script [closed] - linux

I have a non delimited text file consisting of around 1 million rows.
Sample rows
1YBL LOYALTY EXT 1000101172019001
2000100101000011512753184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
Each row starts with a row-type digit ("1", "2" or "3"). For each row type I have to insert a delimiter at fixed character positions (columns 0-1, 1-20, 21-25, and so on).
How can I do this with a Linux script?
Desired Output
1|YBL LOYALTY EXT |10001|01172019|001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA SAGAR |CROSS |BANDRA |WM |DELHI |456471
3|000000027
I tried this command
perl -ne ' if(/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_"} if(/^1/) { @x=(1,16,5,8); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" } if(/^3/) { @x=(1); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }' filename
INPUT ROWS
1YBL LOYALTY EXT 1000112102018001
2000100101000002631653184911501010111199100919323739251 VIJAYPANDEY1191#GMAIL.COM VIJAY PANDEY PART OF GROUND FLOOR & BASEMENT SHOPPER STOP SV ROAD ANDHERI WEST LANDMARK-ERSTWHILE CRASSWORD BOOK STORE MUMBAI 400058
2000100101000019920453184964321010513199000919878857482 MAKSUDMASTER7775#GMAIL.COM MOHAMAD MAQSHUD MASTER H COLLECTION NEW SHIVPURI GALI NO 1 NEAR MAKHAN SINGH CHOWK LUDHIANA 141008
2000100101000023500853184923441010913197300919375580888 JAYNTITALA#GMAIL.COM JAYANTIBHAI TADA 44 KHODIYAR NAGAR B S ABHISHEK SUDAMA CHOWK KHODIYARNAGAR MOTA VARACHHA SURAT 395006
3000000066
EXPECTED OUTPUT
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|000000066
BUT GETTING THIS
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
1|41008|
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|95006
3|000000066

With GNU awk for FIELDWIDTHS:
$ awk -v FIELDWIDTHS='1 17 4 *' -v OFS='|' '/^2/{$1=$1; gsub(/\s+/,"&"OFS)} 1' file
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA |SAGAR |CROSS |BANDRA |WM |DELHI |456471
3000000027
The above use of FIELDWIDTHS says the input should be treated as separated into 4 fields of width 1 char, 17 chars, 4 chars and then the rest.
When you assign a value to a field, awk rebuilds the record, joining the fields with the value of OFS, so $1=$1 causes a | to be inserted between each of the fields described by FIELDWIDTHS.
Once that's done there's still all the remaining space-separated text to get a field separator added so the gsub() appends an OFS after every series of spaces.
Older versions of gawk don't support * as meaning the rest of the line - if you have that situation then just replace * with a large value like 99999.
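For readers who want to see the same idea outside awk, here is a minimal Python sketch of what the command above does: cut a few leading fixed-width fields, then append the delimiter after every run of whitespace in the remainder. The function name `split_fixed` and its parameters are made up for illustration and are not part of any answer in this thread.

```python
import re


def split_fixed(line, widths, sep="|"):
    """Cut `line` into leading fixed-width fields, then delimit the rest on whitespace runs."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    # Append the separator after every run of whitespace in the remainder,
    # mirroring gsub(/\s+/, "&" OFS) in the awk command.
    rest = re.sub(r"\s+", lambda m: m.group(0) + sep, line[pos:])
    return sep.join(fields + [rest])


if __name__ == "__main__":
    sample = "2000100101000011512753184907301010614199100919699034659 VIDYA SAGAR"
    print(split_fixed(sample, [1, 17, 4]))
```

As in the awk version, a line that ends in whitespace gets a trailing delimiter.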

You can try Perl as well
perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } }' input_file
with the given inputs
$ cat rahman.txt
1YBL LOYALTY EXT 1000101172019001
2000100101000011512753184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
$ perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } }' rahman.txt
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
$
To split more fields, just add entries to the array, e.g. change @x=(1,17,4) to @x=(1,17,4,10,20).
EDIT1:
To add delimiters for those fields which can be split by space, use the below
$ perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } s/\S+\s+\K/|/g }' rahman.txt
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA |SAGAR |CROSS |BANDRA |WM |DELHI |456471
3000000027
$
Explanation of the code
perl -lpe # use -p for printing by default at the end of perl one-liner
# this makes sure when you dont have a line starting with 2 the line is printed after the if statement.
' if(/^2/) # if - select line that starts with 2. $_ will have the current line
{
@x=(1,17,4); # x is an array to hold the widths of fields. - 1, 17, 4
for $i (@x) # open for loop to loop through the array x
{
s/(.{$i})//; # no variable is specified, so the substitution acts on the $_ i.e current line
# first instance is s/(.{1})// => match one character and store it in $1 capturing variable
# replace the captured part with nothing and update $_
# e.g if the line is "200010010100001151" .. loop one will capture "2" and $_ becomes "00010010100001151"
# loop 2 => s/(.{17})// matches 17 character and $1 stores "00010010100001151"
printf("%s|",$1) # print $1 along with delimiter pipe
} # end of for loop
} # end of if
# here is default print statement in perl that will print the $_ after all modification
' input_file
EDIT2
I get the results below with your inputs. It works correctly; what issue do you see?
$ perl -ne ' if(/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_"} if(/^1/) { @x=(1,16,5,8); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_" } if(/^3/) { @x=(1); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_" }' rahman.txt
1|YBL LOYALTY EXT |10001|01172019|001
2|0001001010000115127|531849|0730|101|06141991|00919699034659 |VIDYA.SAGAR1#bank.IN VID|YA SAGAR CRO|SS BAN|DRA WM | DEL|HI 456|471
3|000000027
$
EDIT3:
Found the issue: $_ is modified inside each block, so at the end of the /^2/ block $_ holds what is left of the line, here "141008". That leftover then satisfies the next condition, if (/^1/), so that block runs as well. To avoid this, copy $_ to a $line variable at the start and test $line against /^2/, /^1/ and /^3/ in the separate if blocks.
$ perl -lne '$line=$_; if($line=~/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }
if($line=~/^1/) { @x=(1,16,5,8); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }
if($line=~/^3/) { @x=(1); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }' rahman2.txt
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|000000066
$
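The underlying point, deciding the row type once from the untouched line and only then consuming it, can also be sketched in Python. This is only an illustration: the `WIDTHS` table and the `delimit` function are hypothetical names, and the widths are simply copied from the Perl arrays in the question.

```python
# Hypothetical width table keyed by the row-type digit; the numbers are the
# ones used in the question's Perl arrays.
WIDTHS = {
    "1": [1, 16, 5, 8],
    "2": [1, 19, 6, 4, 3, 8, 20, 60, 40, 40, 40, 40, 30],
    "3": [1],
}


def delimit(line, sep="|"):
    """Insert `sep` after each fixed-width field of `line`, based on its row type."""
    row_type = line[:1]            # read once, before anything is cut off
    widths = WIDTHS.get(row_type)
    if widths is None:             # unknown row type: leave the line alone
        return line
    out, pos = [], 0
    for width in widths:
        out.append(line[pos:pos + width])
        pos += width
    out.append(line[pos:])         # whatever is left after the last width
    return sep.join(out)


if __name__ == "__main__":
    for row in ["1YBL LOYALTY EXT 1000112102018001", "3000000066"]:
        print(delimit(row))
```

Because the row type is looked up once, the leftover tail of a type-2 row can never be mistaken for a type-1 or type-3 row.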

You do have delimiters in your file, you just don't see them: they are the space/tab characters. So you just need to replace them, using a command of the form sed 's/xxx/|/g' (where xxx stands for the space or TAB character). If you are unsure whether your characters are spaces or tabs, you can open the file in a hex editor (space is ASCII code 32 (hex 20) and TAB is 9 (hex 09)).

You can try with gnu sed :
sed -E '/^2/{s//&|/;s/(.{19})(....)(\S+\s+)/\1|\2|\3|/}' infile

In case you don't have FIELDWIDTHS then try the following.
awk -v widths="1,17,4" -v OFS="|" '
BEGIN{
  num=split(widths,w,",")              # field widths: 1, 17 and 4 characters
}
/^2/{                                  # only rows of type 2, as in the answers above
  out=""; pos=1
  for(i=1;i<=num;i++){
    out=out substr($0,pos,w[i]) OFS    # cut the next fixed-width field
    pos+=w[i]                          # advance past it
  }
  rest=substr($0,pos)
  gsub(/[[:space:]]+/,"&"OFS,rest)     # delimit the remaining space-separated text
  $0=out rest
}
1
' Input_file

Related

Parse columns with awk

I am new at AWK programming and I was wondering how to filter the following text:
Goedel - Declarative language for AI, based on many-sorted logic. Strongly
typed, polymorphic, declarative, with a module system. Supports bignums
and sets. "The Goedel Programming Language", P. M. Hill et al, MIT Press
1994, ISBN 0-262-08229-2. Goedel 1.4 - partial implementation in SICStus
Prolog 2.1.
ftp://ftp.cs.bris.ac.uk/goedel
info: goedel#compsci.bristol.ac.uk
Just to print this:
Goedel
I have used the following sentence but it just does not work as I wished:
awk -F " - " "/ - /{ print $1 }"
It shows the following:
Goedel
1994, ISBN 0-262-08229-2. Goedel 1.4
Could somebody tell me what I have to modify so I can get what I want?
Thanks in advance
awk 'BEGIN { RS = "" } { print $1 }' your_file.txt
This means: split the input into paragraphs at empty lines (RS = ""), split each paragraph into words on the default separator (whitespace), and finally print the first word ($1) of every paragraph.
this one-liner could work for your requirement:
awk -F ' - ' 'NF>1{print $1;exit}'
awk -F ' - ' 'FNR % 4 == 1 { print $1 }'
If the format is exactly the same as below, then the code above should work:
1 Author - ...
2 Year ...
3 URL
4 Extra info ...
5 Author - ...
6..N etc.
If there is a blank line between entries, you can set RS to the empty string; $1 will then be the author, as long as -F (the FS variable in an awk script) is still " - ". This has the advantage that entries can still be told apart when the "info: ..." line or the URL is missing. The one restriction is that an entry cannot itself contain an empty line, because an empty line is what separates entries. For example:
# A blank line is what separates each entry.
BEGIN { RS = ""; }
{ print $1; }
If you have an awk that supports it, you can make RS a multiple character string if necessary (e.g. RS = "\n--\n" for entries separated by "--" on a line by itself). If you need a regex or simply don't have an awk that supports multiple character record separators, you're forced to use something like the following:
BEGIN { found_sep = 1; }
{ if (found_sep) { print $1; found_sep = 0; } }
# Entry separator is "--\n"
/^--$/ { found_sep = 1; }
More sample input will be required for something more complicated.
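The blank-line-separated approach can also be sketched in Python for comparison. The function name `first_names` is made up, and the sketch assumes that entries are separated by at least one empty line and that the name precedes " - " on the first line of each entry.

```python
import re


def first_names(text, sep=" - "):
    """Return the text before `sep` on the first line of each blank-line-separated entry."""
    names = []
    for entry in re.split(r"\n\s*\n", text.strip()):
        if not entry.strip():
            continue                      # nothing in this chunk
        first_line = entry.splitlines()[0]
        if sep in first_line:
            names.append(first_line.split(sep, 1)[0])
    return names


if __name__ == "__main__":
    sample = (
        "Goedel - Declarative language for AI.\n"
        "1994, ISBN 0-262-08229-2. Goedel 1.4 - partial implementation\n"
        "\n"
        "Foo - Another entry\n"
        "more text\n"
    )
    print(first_names(sample))
```

Only the first line of each entry is inspected, so a " - " further down (as in the "Goedel 1.4 - partial" line) is ignored.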

Search book by title/author

Hi, I am new to shell programming and hope you can guide me along, thanks.
I need a function that searches a .txt file by either book title or author and echoes the following:
Title:*Enter*
Author:Scissorhands
Found 3 records :
C++ for dummies, John Scissorhands, $15.01, 10, 5
Java for dummies, Mary Scissorhands, $16.02, 20, 15
VB.NET for dummies, Edward Scissorhands, $17.03, 30, 25
my file format is as below
Book name:author:price:Qty:Qty Sold
harry potter:james:12.99:197:101
function Search_book
{
FILE="/home/student/Downloads/BookDB.txt"
echo found 1 records : $FILE contents
cat FILE
}
Edit: Sorry, I misunderstood your question.
Updated mixed bash/perl solution:
#!/bin/bash
read -p "Enter search term: " search
perl -ne '
BEGIN{ $pattern = $ARGV[0]; shift; $n=0 }
@a=split /:/;
if ($a[0] =~ m/$pattern/i or $a[1] =~ m/$pattern/i) {
print "$a[0], $a[1],\$$a[2],$a[3],$a[4]\n";
$n += 1;
}
END{ print "Found $n title(s).\n" }
' "$search" /home/student/Downloads/BookDB.txt
Output:
$ ./search.sh
Enter search term: star wars vi
Title: Star wars VI - return of the jedi
Found 1 title(s).
Note that you need regular-expression syntax for wildcards (i.e. . instead of ?, .* instead of *, etc.), and you don't need to put wildcards at the beginning/end of the search term to find matches anywhere in a given (sub)string.
Of course you could also do this entirely in Perl, without the shell script wrapper:
#!/usr/bin/env perl
use strict;
use warnings;
my $booklist = './books.txt';
my @book;
print "Enter search term: ";
chomp (my $pattern = <>);
open BOOKS, "<$booklist" or die $!;
my $n = 0;
foreach (<BOOKS>) {
chomp;
@book = split /:/;
if ($book[0] =~ m/$pattern/i or $book[1] =~ m/$pattern/i) {
print "$book[0], $book[1],\$$book[2],$book[3],$book[4]\n";
$n += 1;
}
}
close BOOKS;
print "Found $n title(s).\n";
For an awk solution see Adrian Frühwirth's answer.
search_book()
{
awk -F':' -v search="$1" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
$ cat books.txt
X never marks the spot:Indiana Jones:9.99:1:1
A fistful of barnacles:Captain Twiddlymore:9.99:2:1
The time I blew up LeChuck:Guybrush Threepwood:8.99:100:60
When I blew up LeChuck:Guybrush Threepwood:8.99:100:50
Where I blew up LeChuck:Guybrush Threepwood:8.99:100:2
$ search_book Indiana
X never marks the spot, Indiana Jones,$9.99,1,1
1 records found
$ search_book Guybrush
The time I blew up LeChuck, Guybrush Threepwood,$8.99,100,60
When I blew up LeChuck, Guybrush Threepwood,$8.99,100,50
Where I blew up LeChuck, Guybrush Threepwood,$8.99,100,2
3 records found
$ search_book barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
$ search_book foo
0 records found
# Wilson Turners last point on Adrian's post:
./book.sh
Enter search term: barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
Here is how you call the awk function within your existing code:
#!/bin/bash
search_book()
{
awk -F':' -v search="$search" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
read -p "Enter search term: " search
search_book
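For comparison, the same title-or-author search can be sketched in Python. One difference from the awk and Perl versions above is named explicitly: here the search term is matched literally (via `re.escape`) and case-insensitively, rather than being interpreted as a regular expression. The function name `search_books` is hypothetical, and the input is assumed to be colon-separated lines in the format from the question.

```python
import re


def search_books(lines, term):
    """Return formatted records whose title or author contains `term` (case-insensitive)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)  # literal match, not a regex
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 5:
            continue                      # skip malformed or empty lines
        title, author, price, qty, sold = fields[:5]
        if pattern.search(title) or pattern.search(author):
            hits.append(f"{title}, {author}, ${price}, {qty}, {sold}")
    return hits


if __name__ == "__main__":
    books = [
        "X never marks the spot:Indiana Jones:9.99:1:1",
        "A fistful of barnacles:Captain Twiddlymore:9.99:2:1",
    ]
    found = search_books(books, "barnacle")
    print(f"Found {len(found)} records :")
    print("\n".join(found))
```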

How to divide a data file's column A by column B using Perl

I was given a text file with a whole bunch of data sorted in columns. The columns are separated by commas.
How could I divide one column by another and print the result? I am using Perl right now, so it has to be done in Perl.
This is what I have so far:
#!/usr/bin/perl
open (FILE, 'census2008.txt');
while (<FILE>) {
chomp;
($sumlev, $stname,$ctyname,$popestimate2008,$births2008,$deaths2008) = split(",");
}
close (FILE);
exit;
There are several options:
Read the file in line by line, split the columns on ',' and divide the relevant columns (don't forget to handle the divide-by-zero error)
Do the same thing as a one-liner:
$ perl -F/,/ -lane 'print( $F[1] == 0 ? "" : $F[3]/$F[1] )' file.txt
Utilize a ready-to-use CPAN module like Text::CSV
Of course, there are more unorthodox/crazy/unspeakable alternatives à la TMTOWTDI ™, so one could:
Parse out the relevant columns with a regex and divide the matches:
if (/^\d*,(\d+),\d*,(\d+)/) { say $2/$1 if $1 != 0; }
Do it with s///e:
$ perl -ple 's!^\d*,(\d+),\d*,(\d+).*$! $1 == 0 ? "" : $2/$1 !e' file.txt;
Get the shell to do the dirty work via backticks:
sub print_divide { say `cat file.txt | some_command_line_command` }
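The first option in the list (read line by line, split on ',', divide, and guard against division by zero) might look like this as a Python sketch. The function name `divide_columns` and the zero-based column indexes are illustrative assumptions; rows that cannot be parsed (for example a header row) yield `None` instead of raising.

```python
def divide_columns(lines, num_idx, den_idx, sep=","):
    """Divide column `num_idx` by column `den_idx` (zero-based) for each line."""
    results = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        try:
            numerator = float(fields[num_idx])
            denominator = float(fields[den_idx])
        except (IndexError, ValueError):
            results.append(None)          # missing or non-numeric column
            continue
        # Guard against the divide-by-zero case mentioned above.
        results.append(numerator / denominator if denominator != 0 else None)
    return results


if __name__ == "__main__":
    rows = ["10,Main,My City,10000,200,150", "12,Poplar,Somewhere,3000,90,0"]
    print(divide_columns(rows, 4, 5))
```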
#!/usr/bin/env perl
# divides column 1 by column 2 of some ','-delimited file,
# read from standard input.
# usage:
# $ cat data.txt | 8458760.pl
while (<STDIN>) {
@values = split(/,/, $_);
print $values[0] / $values[1] . "\n";
}
If you have fixed width columns of data you could use 'unpack' along the lines of:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($sumlev,$stname,$ctyname,$popest,$births,$deaths)
= unpack("A2xA10xA15xA7xA5xA5");
printf "%-15s %4.2f\n", $ctyname, $births/$deaths;
}
__DATA__
10,Main ,My City , 10000, 200, 150
12,Poplar ,Somewhere , 3000, 90, 100
13,Maple ,Your Place , 9123, 100, 90

Format a file in Unix/Linux ?

I have a file containing country, catalog number, year, description and price
Kenya 563-45 1995 Heron Plover Thrush Gonolek Apalis $6.60
Surinam 632-96 1982 Butterfliers $7.50
Seychelles 831-34 2002 WWF Frogs set of 4 $1.40
Togo 1722-25 2010 Cheetah, Zebra, Antelope $5.70
The file isn't delimited by a tab, ":" or anything else; there are only spaces between the fields. Can you please tell me how I can format this file (using awk?) and how I can find the total price from it?
With command line perl:
$ cat /your/file | perl -e '$sum=0; for(<STDIN>) { $sum += $1 if(/\$([\d\.]+)/); }; print "$sum\n"'
21.2
and awk (assumes you have dollars at the end of each line):
$ cat /your/file | awk '{s+=substr($NF,2)} END{ print s}'
21.2
Also, in response to the comment. If you want to reformat on the command line:
$ cat /your/file | perl -e 'for(<STDIN>){@a=split /\s+/; $p=pop @a; \
$line=join "|", ($a[0],$a[1],$a[2], (join" ",@a[3..$#a]) ,$p); print "$line\n"}'
Kenya|563-45|1995|Heron Plover Thrush Gonolek Apalis|$6.60
Surinam|632-96|1982|Butterfliers|$7.50
Seychelles|831-34|2002|WWF Frogs set of 4|$1.40
Togo|1722-25|2010|Cheetah, Zebra, Antelope|$5.70
If you want to do this properly, I'd do it not on the cmd line, but write a proper program to parse it.
I assumed the first 3 columns and the last column have a fixed meaning but the middle columns do not. So the middle columns are moved to the end with spaces between them, and the fixed columns are separated by tabs, so that you can edit the result in a spreadsheet program:
awk '{ printf("%s\t%s\t%s\t%s\t", $1, $2, $3, $NF);
for(i=4; i<NF; i++){ printf("%s ", $i); }
printf("\n")
}' < yourlist.txt
For conformity, a regexp-fu solution:
$ perl -lne '/^ (.+?) \s+ (\d+-\d+) \s+ (\d{4}) \s+ (.+?) \s+ ( \$ ( \d+ (?:\.\d+)? ) ) \s* $/x and $t+=$6, print join "•",$1,$2,$3,$4,$5 }{ print $t' input_file
Kenya•563-45•1995•Heron Plover Thrush Gonolek Apalis•$6.60
Surinam•632-96•1982•Butterfliers•$7.50
Seychelles•831-34•2002•WWF Frogs set of 4•$1.40
Togo•1722-25•2010•Cheetah, Zebra, Antelope•$5.70
21.2
Expanding upon udslk's answer, awk is certainly your friend here:
#!/usr/bin/awk -f
BEGIN {
print "country, \"catalog number\", year, description, \"price ($)\""
}
{
description = $4
for (f = 5; f < NF; ++f) {
description = description " " $f
}
price = substr($NF, 2)
total += price
printf "\"%s\", \"%s\", \"%s\", \"%s\", %0.2f\n", $1, $2, $3, description, price
}
END {
printf "Total, , , , %0.2f\n", total
}
This spits out a CSV file with headers, which you can import into your favourite spreadsheet. It also adds the total. Switch commas with tabs according to taste.
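The same parsing rule (first three fields, last field is the price, everything in between is the description) can be sketched in Python. The names `parse_stamp` and `total_price` are made up, and the sketch assumes that the country name is a single word and that every line ends in a `$`-prefixed price, as in the sample data.

```python
def parse_stamp(line):
    """Split one catalog line into (country, catalog, year, description, price)."""
    parts = line.split()
    country, catalog, year = parts[:3]
    price = float(parts[-1].lstrip("$"))      # "$6.60" -> 6.6
    description = " ".join(parts[3:-1])       # everything between year and price
    return country, catalog, year, description, price


def total_price(lines):
    """Sum the price column over all non-empty lines."""
    return sum(parse_stamp(line)[4] for line in lines if line.strip())


if __name__ == "__main__":
    data = [
        "Kenya 563-45 1995 Heron Plover Thrush Gonolek Apalis $6.60",
        "Surinam 632-96 1982 Butterfliers $7.50",
        "Seychelles 831-34 2002 WWF Frogs set of 4 $1.40",
        "Togo 1722-25 2010 Cheetah, Zebra, Antelope $5.70",
    ]
    for row in data:
        print("|".join(str(field) for field in parse_stamp(row)))
    print(f"{total_price(data):.2f}")
```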

Using awk to print all columns from the nth to the last

This line worked until I had whitespace in the second field.
svn status | grep '\!' | gawk '{print $2;}' > removedProjs
is there a way to have awk print everything in $2 or greater? ($3, $4.. until we don't have anymore columns?)
I suppose I should add that I'm doing this in a Windows environment with Cygwin.
Print all columns:
awk '{print $0}' somefile
Print all but the first column:
awk '{$1=""; print $0}' somefile
Print all but the first two columns:
awk '{$1=$2=""; print $0}' somefile
There's a duplicate question with a simpler answer using cut:
svn status | grep '\!' | cut -d\ -f2-
-d specifies the delimiter (space), -f specifies the list of columns (all starting with the 2nd)
You could use a for-loop to loop through printing fields $2 through $NF (built-in variable that represents the number of fields on the line).
Edit:
Since "print" appends a newline, you'll want to buffer the results:
awk '{out = ""; for (i = 2; i <= NF; i++) {out = out " " $i}; print out}'
Alternatively, use printf:
awk '{for (i = 2; i <= NF; i++) {printf "%s ", $i}; printf "\n"}'
awk '{out=$2; for(i=3;i<=NF;i++){out=out" "$i}; print out}'
My answer is based on VeeArr's, but I noticed that its output starts with a space before the second column (and the rest). As I only have 1 reputation point, I can't comment on it, so here it goes as a new answer:
start with "out" as the second column and then append all the other columns (if they exist). This works as long as there is a second column.
Most solutions with awk leave a space. The options here avoid that problem.
Option 1
A simple cut solution (works only with single delimiters):
command | cut -d' ' -f3-
Option 2
Forcing an awk re-calc sometimes removes the leading space (OFS) left by removing the first fields (works with some versions of awk):
command | awk '{ $1=$2="";$0=$0;} NF=NF'
Option 3
Printing each field formatted with printf will give more control:
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in"|awk -v n=2 '{ for(i=n+1;i<=NF;i++) printf("%s%s",$i,i==NF?RS:OFS);}'
3 4 5 6 7 8
However, all previous answers change every run of repeated FS between fields to OFS. Let's build a couple of options that do not do that.
Option 4 (recommended)
A loop with sub to remove fields and delimiters at the front.
And using the value of FS instead of space (which could be changed).
Is more portable, and doesn't trigger a change of FS to OFS:
NOTE: The ^[FS]* is to accept an input with leading spaces.
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in" | awk '{ n=2; a="^["FS"]*[^"FS"]+["FS"]+";
for(i=1;i<=n;i++) sub( a , "" , $0 ) } 1 '
3 4 5 6 7 8
Option 5
It is quite possible to build a solution that does not add extra (leading or trailing) whitespace, and preserve existing whitespace(s) using the function gensub from GNU awk, as this:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{ print(gensub(a""b""c,"",1)); }'
3 4 5 6 7 8
It also may be used to swap a group of fields given a count n:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{
d=gensub(a""b""c,"",1);
e=gensub("^(.*)"d,"\\1",1,$0);
print("|"d"|","!"e"!");
}'
|3 4 5 6 7 8 | ! 1 2 !
Of course, in such case, the OFS is used to separate both parts of the line, and the trailing white space of the fields is still printed.
NOTE: [FS]* is used to allow leading spaces in the input line.
I personally tried all the answers mentioned above, but most of them were a bit complex or just not right. The easiest way to do it from my point of view is:
awk -F" " '{ for (i=4; i<=NF; i++) print $i }'
Where -F" " defines the delimiter for awk to use. In my case is the whitespace, which is also the default delimiter for awk. This means that -F" " can be ignored.
Where NF defines the total number of fields/columns. Therefore the loop will begin from the 4th field up to the last field/column.
Where $N retrieves the value of the Nth field. Therefore print $i will print the current field/column based on the loop count.
awk '{ for(i=3; i<=NF; ++i) printf "%s%s", $i, FS; print "" }'
lauhub proposed this correct, simple and fast solution here
This was irritating me so much, I sat down and wrote a cut-like field specification parser, tested with GNU Awk 3.1.7.
First, create a new Awk library script called pfcut, with e.g.
sudo nano /usr/share/awk/pfcut
Then, paste in the script below, and save. After that, this is how the usage looks like:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-4"); }'
t1 t2 t3 t4
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("2-"); }'
t2 t3 t4 t5 t6 t7
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
To avoid typing all that, I guess the best one can do (see otherwise Automatically load a user function at startup with awk? - Unix & Linux Stack Exchange) is add an alias to ~/.bashrc; e.g. with:
$ echo "alias awk-pfcut='awk -f pfcut --source'" >> ~/.bashrc
$ source ~/.bashrc # refresh bash aliases
... then you can just call:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk-pfcut '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
Here is the source of the pfcut script:
# pfcut - print fields like cut
#
# sdaau, GNU GPL
# Nov, 2013
function spfcut(formatstring)
{
# parse format string
numsplitscomma = split(formatstring, fsa, ",");
numspecparts = 0;
split("", parts); # clear/initialize array (for e.g. `tail` piping into `awk`)
for(i=1;i<=numsplitscomma;i++) {
commapart=fsa[i];
numsplitsminus = split(fsa[i], cpa, "-");
# assume here a range is always just two parts: "a-b"
# also assume user has already sorted the ranges
#print numsplitsminus, cpa[1], cpa[2]; # debug
if(numsplitsminus==2) {
if ((cpa[1]) == "") cpa[1] = 1;
if ((cpa[2]) == "") cpa[2] = NF;
for(j=cpa[1];j<=cpa[2];j++) {
parts[numspecparts++] = j;
}
} else parts[numspecparts++] = commapart;
}
n=asort(parts); outs="";
for(i=1;i<=n;i++) {
outs = outs sprintf("%s%s", $parts[i], (i==n)?"":OFS);
#print(i, parts[i]); # debug
}
return outs;
}
function pfcut(formatstring) {
print spfcut(formatstring);
}
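To make the parsing of the cut-style field list easier to follow, here is a rough Python sketch of the same idea. It is not a port of the script above: the names `expand_spec` and `pfcut` are reused only for readability, fields are split on whitespace, and the spec is used in the order given (assumed already sorted, as the awk comments also assume).

```python
def expand_spec(spec, nf):
    """Expand a cut-style list such as "-2,4,6-" into 1-based field numbers."""
    wanted = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            start = int(low) if low else 1      # "-4" means "from field 1"
            end = int(high) if high else nf     # "2-" means "to the last field"
            wanted.extend(range(start, end + 1))
        else:
            wanted.append(int(part))
    # Drop field numbers that do not exist on this line.
    return [n for n in wanted if 1 <= n <= nf]


def pfcut(line, spec, sep=" "):
    """Return the whitespace-separated fields of `line` selected by `spec`."""
    fields = line.split()
    return sep.join(fields[n - 1] for n in expand_spec(spec, len(fields)))


if __name__ == "__main__":
    print(pfcut("t1 t2 t3 t4 t5 t6 t7", "-2,4,6-"))
```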
Would this work?
awk '{print substr($0,length($1)+1);}' < file
It leaves some whitespace in front though.
Printing out columns starting from #2 (the output will have no leading space):
ls -l | awk '{sub(/[^ ]+ /, ""); print $0}'
echo "1 2 3 4 5 6" | awk '{ $NF = ""; print $0}'
this one uses awk to print all except the last field
This is what I preferred from all the recommendations:
Printing from the 6th to last column.
ls -lthr | awk '{out=$6; for(i=7;i<=NF;i++){out=out" "$i}; print out}'
or
ls -lthr | awk '{ORS=" "; for(i=6;i<=NF;i++) print $i;print "\n"}'
If you need specific columns printed with an arbitrary delimiter:
awk '{print $3 " " $4}'
col#3 col#4
awk '{print $3 "anything" $4}'
col#3anythingcol#4
So if you have whitespace in a column it will be two columns, but you can connect it with any delimiter or without it.
Perl solution:
perl -lane 'splice @F,0,1; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,1 cleanly removes column 0 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[1:]) + '\n') for line in sys.stdin]" < file
I want to extend the proposed answers to the situation where fields are delimited by possibly several whitespaces –the reason why the OP is not using cut I suppose.
I know the OP asked about awk, but a sed approach would work here (example with printing columns from the 5th to the last):
pure sed approach
sed -r 's/^\s*(\S+\s+){4}//' somefile
Explanation:
s/// is the standard command to perform substitution
^\s* matches any consecutive whitespace at the beginning of the line
\S+\s+ means a column of data (non-whitespace chars followed by whitespace chars)
(){4} means the pattern is repeated 4 times.
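The same regular expression carries over to Python's `re` module almost unchanged, which may help when checking what the pattern consumes. This is only a sketch; the helper name `drop_leading_fields` is made up, and the input is assumed to be a single line without its trailing newline.

```python
import re


def drop_leading_fields(line, count):
    """Remove the first `count` whitespace-separated fields, keeping later spacing intact."""
    # Same pattern as the sed command: optional leading whitespace, then
    # `count` repetitions of "non-whitespace run followed by whitespace run".
    pattern = r"^\s*(?:\S+\s+){%d}" % count
    return re.sub(pattern, "", line, count=1)


if __name__ == "__main__":
    print(drop_leading_fields("  a b   c d  e   f", 4))
```

If the line has too few fields for the pattern to match, it is returned unchanged, which is also what sed does when a substitution does not match.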
sed and cut
sed -r 's/^\s+//; s/\s+/\t/g' somefile | cut -f5-
by just replacing consecutive whitespaces by a single tab;
tr and cut:
tr can also be used to squeeze consecutive characters with the -s option.
tr -s '[:blank:]' <somefile | cut -d' ' -f5-
If you don't want to reformat the part of the line that you don't chop off, the best solution I can think of is written in my answer in:
How to print all the columns after a particular number using awk?
It chops what is before the given field number N, and prints all the rest of the line, including field number N and maintaining the original spacing (it does not reformat). It doesn't matter if the string of the field appears also somewhere else in the line.
Define a function:
fromField () {
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
In your particular case:
svn status | grep '\!' | fromField 2 > removedProjs
If your file/stream does not contain new-line characters in the middle of the lines (you could be using a different Record Separator), you can use:
awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
The first case will fail only in files/streams that contain the rare hexadecimal char number 1
This awk function returns substring of $0 that includes fields from begin to end:
function fields(begin, end, b, e, p, i) {
b = 0; e = 0; p = 0;
for (i = 1; i <= NF; ++i) {
if (begin == i) { b = p; }
p += length($i);
e = p;
if (end == i) { break; }
p += length(FS);
}
return substr($0, b + 1, e - b);
}
To get everything starting from field 3:
tail = fields(3);
To get section of $0 that covers fields 3 to 5:
middle = fields(3, 5);
b, e, p, i nonsense in function parameter list is just an awk way of declaring local variables.
All of the other answers given here and in linked questions fail in various ways given various possible FS values. Some leave leading and/or trailing white space, some convert every FS to the OFS, some rely on semantics that only apply when FS is the default value, some rely on negating FS in a bracket expression which will fail given a multi-char FS, etc.
To do this robustly for any FS, use GNU awk for the 4th arg to split():
$ cat tst.awk
{
split($0,flds,FS,seps)
for ( i=n; i<=NF; i++ ) {
printf "%s%s", flds[i], seps[i]
}
print ""
}
$ printf 'a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]' -f tst.awk
b c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]+' -f tst.awk
b c d
$ printf 'a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
c###d
$ printf '###a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
b###c###d
Note that I'm using split() above because its 3rd arg is a field separator, not just a regexp like the 2nd arg to match(). The difference is that field separators have additional semantics to regexps such as skipping leading and/or trailing blanks when the separator is a single blank char - if you wanted to use a while(match()) loop or any form of *sub() to emulate the above then you'd need to write code to implement those semantics whereas split() already implements them for you.
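A loosely comparable trick exists in Python: `re.split` with a capturing group returns the separators along with the fields, so the tail of the line can be rebuilt with its original separators. This sketch is not equivalent to gawk's split(): it treats the separator purely as a regular expression and does not implement awk's special handling of the default single-blank FS. The name `from_field` is hypothetical.

```python
import re


def from_field(line, n, sep_regex=r"[ \t]+"):
    """Return `line` from field `n` (1-based) onward, keeping the original separators."""
    # With a capturing group, re.split interleaves fields and separators:
    # [field1, sep1, field2, sep2, ..., fieldN]
    # sep_regex is assumed to contain no capturing groups of its own.
    parts = re.split("(" + sep_regex + ")", line)
    return "".join(parts[2 * (n - 1):])


if __name__ == "__main__":
    print(from_field("a###b###c###d", 3, "###"))
    print(from_field("a b   c d", 3))
```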
The awk examples look complex here; here is simple Bash shell syntax:
command | while read -a cols; do echo ${cols[@]:1}; done
Where 1 is your nth column counting from 0.
Example
Given this content of file (in.txt):
c1
c1 c2
c1 c2 c3
c1 c2 c3 c4
c1 c2 c3 c4 c5
here is the output:
$ while read -a cols; do echo ${cols[@]:1}; done < in.txt
c2
c2 c3
c2 c3 c4
c2 c3 c4 c5
This would work if you are using Bash and you could use as many 'x ' as elements you wish to discard and it ignores multiple spaces if they are not escaped.
while read x b; do echo "$b"; done < filename
Perl:
@m=`ls -ltr dir | grep ^d | awk '{print \$6,\$7,\$8,\$9}'`;
foreach $i (@m)
{
print "$i\n";
}
UPDATE :
if you wanna use no function calls at all while preserving the spaces and tabs in between the remaining fields, then do :
echo " 1 2 33 4444 555555 \t6666666 " |
{m,g}awk ++NF FS='^[ \t]*[^ \t]*[ \t]+|[ \t]+$' OFS=
=
2 33 4444 555555 6666666
===================
You can make it a lot more straightforward:
svn status | [m/g]awk '/!/*sub("^[^ \t]*[ \t]+",_)'
svn status | [n]awk '(/!/)*sub("^[^ \t]*[ \t]+",_)'
This automatically takes care of the grep earlier in the pipe and trims the extra FS left after blanking out $1, with the added bonus of leaving the rest of the original input untouched instead of having tabs overwritten with spaces (unless that's the desired effect).
If you're very certain $1 does not contain special characters that need regex escaping, then it's even easier :
mawk '/!/*sub($!_"[ \t]+",_)'
gawk -c/P/e '/!/*sub($!_"""[ \t]+",_)'
Or if you prefer customizing FS+OFS to handle it all :
mawk 'NF*=/!/' FS='^[^ \t]*[ \t]+' OFS='' # this version uses OFS
This should be a reasonably comprehensive awk field-substring-extraction function that:
returns a substring of $0 based on the input ranges, inclusive
clamps out-of-range values
handles variable-length field separators
has speedup treatments for:
  no inputs at all, returning $0 directly
  input values resulting in a guaranteed empty string ("")
  FROM-field == 1
  FS = "", which splits $0 into individual chars
  (so the FROM <(_)> and TO <(__)> fields behave like cut -c rather than cut -f)
restores the original $0, without overwriting FS seps with OFS
|
{m,g}awk '{
    print "\n|---BEFORE-------------------------\n" \
        ($0) "\n|----------------------------\n\n [" \
        fld2(2, 5) "]\n [" fld2(3) "]\n [" fld2(4, 2) \
        "]<----------------------------------------------should be empty\n [" \
        fld2(3, 11) "]<------------------------should be capped by NF\n [" \
        fld2() "]\n [" fld2((OFS=FS="")*($0=$0)+11, 23) \
        "]<-------------------FS=\"\", split by chars\n\n|---AFTER-------------------------\n" \
        ($0) "\n|----------------------------"
}
function fld2(_,__,___,____,_____)
{
    if (+__==(_=-_<+_ ?+_:_<_) || (___=____="")==__ || !NF) {
        return $_
    } else if (NF<_ || (__=NF<+__?NF:+__)<(_=+_?_:!_)) {
        return ___
    } else if (___==FS || _==!___) {
        return ___<FS \
            ? substr("",$!_=$!_ substr("",__=$!(NF=__)))__ \
            : substr($(_<_),_,__)
    }
    _____=$+(____=___="\37\36\35\32\31\30\27\26\25"\
                      "\24\23\21\20\17\16\6\5\4\3\2\1")
    NF=__
    if ($(!_)~("["(___)"]")) {
        gsub("..","\\&&",___) + gsub(".",___,____)
        ___=____
    }
    __=(_) substr("",_+=_^=_<_)
    while(___!="") {
        if ($(!_)!~(____=substr(___,--_,++_))) {
            ___=____
            break }
        ___=substr(___,_+_^(!_))
    }
    return \
        substr("",($__=___ $__)==(__=substr($!_,
            _+index($!_,___))),_*($!_=_____))(__)
}'
The <TAB> markers below are actual tab characters (\t, \011), relabeled for display clarity.
|---BEFORE-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
[2 33 4444 555555]
[33]
[]<---------------------------------------------- should be empty
[33 4444 555555 6666666]<------------------------ should be capped by NF
[ 1 2 33 4444 555555 <TAB>6666666 ]
[ 2 33 4444 555555 <TAB>66]<------------------- FS="", split by chars
|---AFTER-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
I wasn't happy with any of the awk solutions presented here because I wanted to extract the first few columns and then print the rest, so I turned to perl instead. The following code extracts the first two columns, and displays the rest as is:
echo -e "a b c d\te\t\tf g" | \
perl -ne 'my @f = split /\s+/, $_, 3; printf "first: %s second: %s rest: %s", @f;'
The advantage compared to the perl solution from Chris Koknat is that only the first n elements are split off from the input string; the rest of the string isn't split at all and therefore stays completely intact. My example demonstrates this with a mix of spaces and tabs.
To change the number of columns that should be extracted, replace the 3 in the example with n+1.
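For comparison, a rough Bash-only analogue of the same idea, using a regular expression instead of Perl's split with a limit. This is a swapped-in technique, not the Perl code above; split_head is a hypothetical name, and unlike the Perl version it ignores leading whitespace. Only the first two fields are captured; the third group takes the rest of the line untouched.

```shell
# split_head LINE: peel off the first two fields, leave the remainder unsplit.
split_head() {
    local line=$1
    local re='^[[:space:]]*([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+(.*)$'
    if [[ $line =~ $re ]]; then
        printf 'first: %s second: %s rest: %s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi
}

split_head $'a b c d\te\t\tf g'
```

Lines with fewer than three fields do not match and print nothing in this sketch.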
ls -la | awk '{o=$1" "$3; for (i=5; i<=NF; i++) o=o" "$i; print o }'
from this answer is not bad but the natural spacing is gone.
Please then compare it to this one:
ls -la | cut -d' ' -f4-
Then you'd see the difference.
Even ls -la | awk '{$1=$2=""; print}', which is based on the answer voted best thus far, does not preserve the formatting.
Thus I would use the following, and it also allows explicit selective columns in the beginning:
ls -la | cut -d' ' -f1,4-
Note that every space counts as a column boundary too, so for instance in the below, columns 1 and 3 are empty, 2 is INFO and 4 is 2014-10-11:
$ echo " INFO  2014-10-11 10:16:19 main " | cut -d' ' -f1,3
$ echo " INFO  2014-10-11 10:16:19 main " | cut -d' ' -f2,4
INFO 2014-10-11
$
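If the empty columns are unwanted, one common workaround (not part of the answer above) is to squeeze runs of spaces with tr -s before cutting. A small sketch, assuming the input has one leading space and two spaces after INFO, as the column numbering above implies:

```shell
line=' INFO  2014-10-11 10:16:19 main '

# As-is: the leading space makes column 1 empty, the double space makes column 3 empty.
printf '%s\n' "$line" | cut -d' ' -f2,4

# Squeezed: runs of spaces collapse to one, so the date moves from column 4 to column 3.
printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2,3
```

Both commands print the same two values; the trade-off is that squeezing discards the original spacing, which is exactly what the cut approach was chosen to keep.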
If you want formatted text, chain your commands with echo and use $0 to print the last field.
Example:
for i in {8..11}; do
s1="$i"
s2="str$i"
s3="str with spaces $i"
echo -n "$s1 $s2" | awk '{printf "|%3d|%6s",$1,$2}'
echo -en "$s3" | awk '{printf "|%-19s|\n", $0}'
done
Prints:
|  8|  str8|str with spaces 8  |
|  9|  str9|str with spaces 9  |
| 10| str10|str with spaces 10 |
| 11| str11|str with spaces 11 |
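If awk is not otherwise required, the same table can be sketched with the shell's builtin printf, which takes the string containing spaces as a single quoted argument. This is an alternative technique rather than the method above, and fmt_rows is a hypothetical name.

```shell
# fmt_rows: same layout as the awk version, using one printf call per row.
fmt_rows() {
    local i
    for i in {8..11}; do
        printf '|%3d|%6s|%-19s|\n' "$i" "str$i" "str with spaces $i"
    done
}

fmt_rows
```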
The top-voted answer by zed_0xff did not work for me.
I have a log where, after $5 (an IP address), there may or may not be more text. I need everything from the IP address to the end of the line, should there be anything after $5. In my case, this is within an awk program, not an awk one-liner, so awk must solve the problem. When I try to remove the first 4 fields using the solution proposed by zed_0xff:
echo "  7 27.10.16. Thu 11:57:18 37.244.182.218 one two three" | awk '{$1=$2=$3=$4=""; printf "[%s]\n", $0}'
it spits out a wrong and useless response (I added [..] to demonstrate):
[    37.244.182.218 one two three]
There are even some suggestions to combine substr with this wrong answer, but that only complicates things. It offers no improvement.
Instead, if columns are fixed width until the cut point and awk is needed, the correct answer is:
echo "  7 27.10.16. Thu 11:57:18 37.244.182.218 one two three" | awk '{printf "[%s]\n", substr($0,28)}'
which produces the desired output:
[37.244.182.218 one two three]
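When the leading columns are not fixed width, a possible variation is to let awk find the offset with index() instead of hard-coding 28. This is only a sketch: tail_from_5 is a hypothetical name, and index() returns the first occurrence of the text of $5, so it gives the wrong offset if the same text also appears in an earlier field.

```shell
# tail_from_5: print everything from field 5 to the end of the line, in brackets.
tail_from_5() {
    awk '{ printf "[%s]\n", substr($0, index($0, $5)) }'
}

line='  7 27.10.16. Thu 11:57:18 37.244.182.218 one two three'
printf '%s\n' "$line" | tail_from_5
```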

Resources