Convert number to gigabytes (sed, awk, sh, bash) - linux

I have the following multi-line output:
Volume Name: GATE02-SYSTEM
Size: 151934468096
Volume Name: GATE02-S
Size: 2000112582656
Is it possible to convert strings like:
Volume Name: GATE02-SYSTEM
Size: 141.5 Gb
Volume Name: GATE02-S
Size: 1862.75 Gb
with sed/awk? I looked for answers for the only number output like:
echo "151934468096" | awk '{ byte =$1 /1024/1024**2 ; print byte " GB" }'
but can't figure out how to apply it to my case.
EDIT: I'd like to clarify my case. I have the following one-liner:
esxcfg-info | grep -A 17 "\\==+Vm FileSystem" | grep -B 15 "[I]s Accessible.*true" | grep -B 4 -A 11 "Head Extent.*ATA.*$" | sed -r 's/[.]{2,}/: /g;s/^ {12}//g;s/\|----//g' | egrep -i "([V]olume Name|^[S]ize:)"
which generates output above. I want to know what to add more after the last command to get the desired output.

Using (GNU) sed and bc:
... | sed 's#Size: \([0-9]*\)#printf "Size: %s" $(echo "scale = 2; \1 / 1024 ^ 3" | bc)GiB#e'
^^^^ ^^^^^^^^^^ ^^^^^^ ^^^^^^^^^ ^^ ^^^^^^^^^^ ^^ ^
1 2 3 4 5 6 7 8
Matching Size: lines.
Capturing the number of bytes (potentially vulnerable if empty or zero, strengthen for production use)
Replacing with printf output
scale = sets the number of decimals for bc
Using the sed capture group (number of bytes)
The mathematical operation on the byte number
Piping all that to bc for evaluation
Telling sed to execute the "replace" part of the s statement in a subshell.
Another option is numfmt (where available, since GNU coreutils 8.21):
... | sed 's#Size: \([0-9]*\)#printf "%s" $(echo \1 | numfmt --to=iec-i --suffix=B)#e'
This doesn't give you control over the decimals, but "works" for all sizes (i.e. gives TiB where the number is big enough, and does not truncate to "0.00 GiB" for too-small numbers).

printf "Volume Name: GATE02-SYSTEM\nSize: 151934468096\nVolume Name: GATE02-S\nSize: 2000112582656" |
awk '/^Size: / { $2 = $2/1024/1024**2 " Gb" }1'

Related

How can I fix my bash script to find a random word from a dictionary?

I'm studying bash scripting and I'm stuck fixing an exercise of this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script to output a random word from a dictionary whose length is equal to the number supplied as the first command line argument.
My idea was to create a sub-dictionary, assign each word a number line, select a random number from those lines and filter the output, which worked for a similar simpler script, but not for this.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get using as input "21":
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried in bash to split the code in smaller pieces obtaining no error (input=21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script line 9 and 10 give error.
Where do you think is the error?
problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so it's content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; when using the above mentioned echo $SUBDIC you will have multiple words, but all in one line...
RWORD=$(nl "$SUBDIC" | ...) same problem as above: there's only one line (also note #armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match the dictionary entry catch 22
likely RWORD=$(... | awk '{print $2}') won't handle lines containing spaces
a simple solution
doing a "random sort" over the all the possible words and taking the first line, should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1
MAX=$( $SUBDIC | wc -l ) - A pipe is used for connecting a command's output, while $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<$SUBDIC wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".
This code will do it. My test dictionary of words is in file file. It's a good idea to get all words of a given length first but put them in an array not in var. And then get a random index and echo it.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((0 + RANDOM % ${#dic[#]}))
echo ${dic[$ind]}
I am also doing this activity and I create one simple solution.
I create the script.
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
Here if you want a random string then you have to run the script as per the below command from the terminal.
./script.sh $RANDOM
If you want the print any specific number string then you can run as per the below command from the terminal.
./script.sh 465
cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time is it referred to.
this simple line outputs random word from the mentioned dictionary.
Otherwise as umläute mentined you can do:
cat /usr/share/dict/american-english | sort -R | head -1

Print a row of 16 lines evenly side by side (column)

I have a file with unknown number of lines(but even number of lines). I want to print them side by side based on total number of lines in that file. For example, I have a file with 16 lines like below:
asdljsdbfajhsdbflakjsdff235
asjhbasdjbfajskdfasdbajsdx3
asjhbasdjbfajs23kdfb235ajds
asjhbasdjbfajskdfbaj456fd3v
asjhbasdjb6589fajskdfbaj235
asjhbasdjbfajs54kdfbaj2f879
asjhbasdjbfajskdfbajxdfgsdh
asjhbasdf3709ddjbfajskdfbaj
100
100
150
125
trh77rnv9vnd9dfnmdcnksosdmn
220
225
sdkjNSDfasd89asdg12asdf6asdf
So now i want to print them side by side. as they have 16 lines in total, I am trying to get the results 8:8 like below
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
paste command did not work for me exactly, (paste - - - - - - - -< file1) nor the awk command that I used awk '{printf "%s" (NR%2==0?RS:FS),$1}'
Note: The number of lines in a file dynamic. The only known thing in my scenario is, they are even number all the time.
If you have the memory to hash the whole file ("max" below):
$ awk '{
a[NR]=$0 # hash all the records
}
END { # after hashing
mid=int(NR/2) # compute the midpoint, int in case NR is uneven
for(i=1;i<=mid;i++) # iterate from start to midpoint
print a[i],a[mid+i] # output
}' file
If you have the memory to hash half of the file ("mid"):
$ awk '
NR==FNR { # on 1st pass hash second half of records
if(FNR>1) { # we dont need the 1st record ever
a[FNR]=$0 # hash record
if(FNR%2) # if odd record
delete a[int(FNR/2)+1] # remove one from the past
}
next
}
FNR==1 { # on the start of 2nd pass
if(NR%2==0) # if record count is uneven
exit # exit as there is always even count of them
offset=int((NR-1)/2) # compute offset to the beginning of hash
}
FNR<=offset { # only process the 1st half of records
print $0,a[offset+FNR] # output one from file, one from hash
next
}
{ # once 1st half of 2nd pass is finished
exit # just exit
}' file file # notice filename twice
And finally if you have awk compiled into a worms brain (ie. not so much memory, "min"):
$ awk '
NR==FNR { # just get the NR of 1st pass
next
}
FNR==1 {
mid=(NR-1)/2 # get the midpoint
file=FILENAME # filename for getline
while(++i<=mid && (getline line < file)>0); # jump getline to mid
}
{
if((getline line < file)>0) # getline read from mid+FNR
print $0,line # output
}' file file # notice filename twice
Standard disclaimer on getline and no real error control implemented.
Performance:
I seq 1 100000000 > file and tested how the above solutions performed. Output was > /dev/null but writing it to a file lasted around 2 s longer. max performance is so-so as the mem print was 88 % of my 16 GB so it might have swapped. Well, I killed all the browsers and shaved off 7 seconds for the real time of max.
+------------------+-----------+-----------+
| which | | |
| min | mid | max |
+------------------+-----------+-----------+
| time | | |
| real 1m7.027s | 1m30.146s | 0m48.405s |
| user 1m6.387s | 1m27.314 | 0m43.801s |
| sys 0m0.641s | 0m2.820s | 0m4.505s |
+------------------+-----------+-----------+
| mem | | |
| 3 MB | 6.8 GB | 13.5 GB |
+------------------+-----------+-----------+
Update:
I tested #DavidC.Rankin's and #EdMorton's solutions and they ran, respectively:
real 0m41.455s
user 0m39.086s
sys 0m2.369s
and
real 0m39.577s
user 0m37.037s
sys 0m2.541s
Mem print was about the same as my mid had. It pays to use the wc, it seems.
$ pr -2t file
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
if you want just one space between columns, change to
$ pr -2ts' ' file
You can also do it with awk simply by storing the first-half of the lines in an array and then concatenating the second half to the end, e.g.
awk -v nlines=$(wc -l < file) -v j=0 'FNR<=nlines/2{a[++i]=$0; next} j<i{print a[++j],$1}' file
Example Use/Output
With your data in file, then
$ awk -v nlines=$(wc -l < file) -v j=0 'FNR<=nlines/2{a[++i]=$0; next} j<i{print a[++j],$1}' file
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
Extract the first half of the file and the last half of the file and merge the lines:
paste <(head -n $(($(wc -l <file.txt)/2)) file.txt) <(tail -n $(($(wc -l <file.txt)/2)) file.txt)
You can use columns utility from autogen:
columns -c2 --by-columns file.txt
You can use column, but the count of columns is calculated in a strange way from the count of columns of your terminal. So assuming your lines have 28 characters, you also can:
column -c $((28*2+8)) file.txt
I do not want to solve this, but if I were you:
wc -l file.txt
gives number of lines
echo $(($(wc -l < file.txt)/2))
gives a half
head -n $(($(wc -l < file.txt)/2)) file.txt > first.txt
tail -n $(($(wc -l < file.txt)/2)) file.txt > last.txt
create file with first half and last half of the original file. Now you can merge those files together side by side as it was described here .
Here is my take on it using the bash shell wc(1) and ed(1)
#!/usr/bin/env bash
array=()
file=$1
total=$(wc -l < "$file")
half=$(( total / 2 ))
plus1=$(( half + 1 ))
for ((m=1;m<=half;m++)); do
array+=("${plus1}m$m" "${m}"'s/$/ /' "${m}"',+1j')
done
After all of that if just want to print the output to stdout. Add the line below to the script.
printf '%s\n' "${array[#]}" ,p Q | ed -s "$file"
If you want to write the changes directly to the file itself, Use this code instead below the script.
printf '%s\n' "${array[#]}" w | ed -s "$file"
Here is an example.
printf '%s\n' {1..10} > file.txt
Now running the script against that file.
./myscript file.txt
Output
1 6
2 7
3 8
4 9
5 10
Or using bash4+ feature mapfile aka readarray
Save the file in an array named array.
mapfile -t array < file.txt
Separate the files.
left=("${array[#]::((${#array[#]} / 2))}") right=("${array[#]:((${#array[#]} / 2 ))}")
loop and print side-by-side
for i in "${!left[#]}"; do
printf '%s %s\n' "${left[i]}" "${right[i]}"
done
What you said The only known thing in my scenario is, they are even number all the time. That solution should work.

Counting total occurrences of each 'version' across multiple files

I have a number of files in a directory on Linux, each of which contains a version line in the format: #version x (where x is the version number).
I'm trying to find a way to count the number of times each different version appears across all the files, and output something like:
#version 1: 12
#version 2: 36
#version 3: 2
I don't know all the potential versions that might exist, so I'm really trying to match lines that contain #version.
I've tried using things like grep -c - however that only gives the total of all lines containing #version - I can't find a nice way to split on the different version numbers.
A possibility piping multiple commands:
strings * | grep '#version \w' | sort | uniq --count | awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}''
Operations breakdown:
strings *: Extract text strings from * all files in current directory.
| grep '#version \w': Pipe the strings into the grep command, to find all occurrences of #version word.
sort: Pipe the version strings to the sort command.
| uniq --count: Pipe the occurrences of #version lines into the uniq command, to output count of each #version... string.
awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}': Pipe the unique counts into the awk command, to re-format the output as: #version ...: count.
Testing the process:
cd /tmp
mkdir testing 2>/dev/null || true
cd testing
# Create 10 testfile#.txt with random #version 1 to 4
for i in {1..10}; do
echo "#version $(($RANDOM%4+1))" >"testfile${i}.txt"
done
# Now get the counts per version
strings * \
| grep '#version \w' \
| sort \
| uniq --count \
| awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}'
Example of test output:
#version 1: 4
#version 2: 2
#version 3: 1
#version 4: 3
Something like this may do the trick:
grep -h '#version' * | sort | uniq -c | awk '{print $2,$3": found "$1}'
example files:
filename:filecontent
file1:#version 1
file1.1:#version 1
file111:#version 1
file2:#version 2
file3:#version 3
file4:#version 4
file44:#version 4
Output:
#version 1: found 3
#version 2: found 1
#version 3: found 1
#version 4: found 2
grep version * gets all files with version.sort sorts the results for uniq -c which counts the number of duplicates then awk rearranges the output for desired formatting.
Note: grep might have a slightly different separator than : on your OS.

Search for string in a file from a certain line no

How to search for a string from/after certain line in text file using bash script?
E.g. I want to search for first occurrence of "version:" string but not at start of file but at line no. say 35 which contains text *-disk:0 so that I would get product name of disk-0 only and nothing else.
My current approach is as follows where line_no was line no. of the line where disk:0 is present. But sometimes there is vendor name also present in-between the disk:0 and version. At that time, this logic fails.
ver_line_no=$(echo $(( line_no + 6 )))
ver_line_text=`sed -n ${ver_line_no}p $1`
check_if_present=`echo $fver_line_text | grep "version:"`
Background:
I am trying to parse output of lshw commmand.
*-disk:0
description: ATA Disk
product: SAMSUNG
physical id: 0
bus info: scsi#z:y:z:0
logical name: /dev/sda
version: abc
serial: pqr
size: 2048GiB
capabilities: axy, pqr
configuration: pqr, abc, ghj
*-disk:1
description: ATA Disk
product: TOSHIBA
physical id: 0
bus info: scsi#p:q:z:0
logical name: /dev/sdb
version: nmh
serial: pqasd
size: 2048GiB
capabilities: axy, pqr
configuration: pqr, abc, ghj
This is the sample information.
I want to print information in tabular format using bash script.
You should be able to cut out the block you want with sed, then use grep:
sudo lshw | sed -n '/\*-disk:0/,/\s*\*/p' | grep 'version:'
The sed command does not print any lines (-n), then finds the block between *-disk:0 and the next * and prints only that (p).
sed solution:
If you want to search for first occurrence after given line number (e.g. 10).
l=10
lshw | sed -n "${l},$ {/version/{p;q}}"
If you want to search for first occurrence after given line content (e.g. *-disk:0)
lshw | sed -n '/*-disk:0/,${/version/{p;q}}'
You may use this awk that searches for *-disk:0 in a file to toggle a flag to true:
awk -F: '$1 ~ /\*-disk$/{p = ($2 == 0)} p && /^[[:blank:]]*version:/' file
version: abc
You need to print all lines up the end. The $ represents the end in sed.
sed -p '6,$p'
Will print lines from 6th line to the end. Be aware of quoting, so that $ doesn't get expanded.
You want:
ver_line_no=$(( line_no + 6 ))
ver_line_text=$(sed -n "${ver_line_no}"',$p' "$1")
check_if_present=$(echo "$fver_line_text" | grep "version:")
Notes:
Backticks ` are deprecated. Use $( ...) command substitution instead.
Always quote your variables.
Doing echo $(( .. )) is just repeating yourself. Just $(( ... )).
Sometimes on systems without sed you can use cut -d $'\n' -f6-.
A general solution using awk (assuming that every disk has a version).
File 'find_disk_version.awk'
/disk:/ {
disk_found="true"
disk_name=$1
}
/version:/ {
if (disk_found) {
print disk_name" "$2
disk_found=""
}
}
Test file 'test':
Something_else
version: ver_something_else
serial: blabla
configuration: foo, bar
*-disk:0
description: ATA Disk
version: ver.0
serial: pqr
*-disk:1
version: ver1
serial: pqasd
configuration: pqr, abc, ghj
Something_else_again
version: ver_somethingelse_again
serial: foobar
configuration: bar, foo
*-disk:2
version: ver2
serial: pqasd
configuration: pqr, abc, ghj
Output:
$ cat test | awk -f find_disk_version.awk
*-disk:0 ver.0
*-disk:1 ver1
*-disk:2 ver2
Instead of 'cat test' can be your command 'lshw'
P.S. the script will not work if there is a disk without version.

Using awk to print all columns from the nth to the last

This line worked until I had whitespace in the second field.
svn status | grep '\!' | gawk '{print $2;}' > removedProjs
is there a way to have awk print everything in $2 or greater? ($3, $4.. until we don't have anymore columns?)
I suppose I should add that I'm doing this in a Windows environment with Cygwin.
Print all columns:
awk '{print $0}' somefile
Print all but the first column:
awk '{$1=""; print $0}' somefile
Print all but the first two columns:
awk '{$1=$2=""; print $0}' somefile
There's a duplicate question with a simpler answer using cut:
svn status | grep '\!' | cut -d\ -f2-
-d specifies the delimeter (space), -f specifies the list of columns (all starting with the 2nd)
You could use a for-loop to loop through printing fields $2 through $NF (built-in variable that represents the number of fields on the line).
Edit:
Since "print" appends a newline, you'll want to buffer the results:
awk '{out = ""; for (i = 2; i <= NF; i++) {out = out " " $i}; print out}'
Alternatively, use printf:
awk '{for (i = 2; i <= NF; i++) {printf "%s ", $i}; printf "\n"}'
awk '{out=$2; for(i=3;i<=NF;i++){out=out" "$i}; print out}'
My answer is based on the one of VeeArr, but I noticed it started with a white space before it would print the second column (and the rest). As I only have 1 reputation point, I can't comment on it, so here it goes as a new answer:
start with "out" as the second column and then add all the other columns (if they exist). This goes well as long as there is a second column.
Most solutions with awk leave an space. The options here avoid that problem.
Option 1
A simple cut solution (works only with single delimiters):
command | cut -d' ' -f3-
Option 2
Forcing an awk re-calc sometimes remove the added leading space (OFS) left by removing the first fields (works with some versions of awk):
command | awk '{ $1=$2="";$0=$0;} NF=NF'
Option 3
Printing each field formatted with printf will give more control:
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in"|awk -v n=2 '{ for(i=n+1;i<=NF;i++) printf("%s%s",$i,i==NF?RS:OFS);}'
3 4 5 6 7 8
However, all previous answers change all repeated FS between fields to OFS. Let's build a couple of option that do not do that.
Option 4 (recommended)
A loop with sub to remove fields and delimiters at the front.
And using the value of FS instead of space (which could be changed).
Is more portable, and doesn't trigger a change of FS to OFS:
NOTE: The ^[FS]* is to accept an input with leading spaces.
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in" | awk '{ n=2; a="^["FS"]*[^"FS"]+["FS"]+";
for(i=1;i<=n;i++) sub( a , "" , $0 ) } 1 '
3 4 5 6 7 8
Option 5
It is quite possible to build a solution that does not add extra (leading or trailing) whitespace, and preserve existing whitespace(s) using the function gensub from GNU awk, as this:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{ print(gensub(a""b""c,"",1)); }'
3 4 5 6 7 8
It also may be used to swap a group of fields given a count n:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{
d=gensub(a""b""c,"",1);
e=gensub("^(.*)"d,"\\1",1,$0);
print("|"d"|","!"e"!");
}'
|3 4 5 6 7 8 | ! 1 2 !
Of course, in such case, the OFS is used to separate both parts of the line, and the trailing white space of the fields is still printed.
NOTE: [FS]* is used to allow leading spaces in the input line.
I personally tried all the answers mentioned above, but most of them were a bit complex or just not right. The easiest way to do it from my point of view is:
awk -F" " '{ for (i=4; i<=NF; i++) print $i }'
Where -F" " defines the delimiter for awk to use. In my case is the whitespace, which is also the default delimiter for awk. This means that -F" " can be ignored.
Where NF defines the total number of fields/columns. Therefore the loop will begin from the 4th field up to the last field/column.
Where $N retrieves the value of the Nth field. Therefore print $i will print the current field/column based based on the loop count.
awk '{ for(i=3; i<=NF; ++i) printf $i""FS; print "" }'
lauhub proposed this correct, simple and fast solution here
This was irritating me so much, I sat down and wrote a cut-like field specification parser, tested with GNU Awk 3.1.7.
First, create a new Awk library script called pfcut, with e.g.
sudo nano /usr/share/awk/pfcut
Then, paste in the script below, and save. After that, this is how the usage looks like:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-4"); }'
t1 t2 t3 t4
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("2-"); }'
t2 t3 t4 t5 t6 t7
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
To avoid typing all that, I guess the best one can do (see otherwise Automatically load a user function at startup with awk? - Unix & Linux Stack Exchange) is add an alias to ~/.bashrc; e.g. with:
$ echo "alias awk-pfcut='awk -f pfcut --source'" >> ~/.bashrc
$ source ~/.bashrc # refresh bash aliases
... then you can just call:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk-pfcut '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
Here is the source of the pfcut script:
# pfcut - print fields like cut
#
# sdaau, GNU GPL
# Nov, 2013
function spfcut(formatstring)
{
# parse format string
numsplitscomma = split(formatstring, fsa, ",");
numspecparts = 0;
split("", parts); # clear/initialize array (for e.g. `tail` piping into `awk`)
for(i=1;i<=numsplitscomma;i++) {
commapart=fsa[i];
numsplitsminus = split(fsa[i], cpa, "-");
# assume here a range is always just two parts: "a-b"
# also assume user has already sorted the ranges
#print numsplitsminus, cpa[1], cpa[2]; # debug
if(numsplitsminus==2) {
if ((cpa[1]) == "") cpa[1] = 1;
if ((cpa[2]) == "") cpa[2] = NF;
for(j=cpa[1];j<=cpa[2];j++) {
parts[numspecparts++] = j;
}
} else parts[numspecparts++] = commapart;
}
n=asort(parts); outs="";
for(i=1;i<=n;i++) {
outs = outs sprintf("%s%s", $parts[i], (i==n)?"":OFS);
#print(i, parts[i]); # debug
}
return outs;
}
function pfcut(formatstring) {
print spfcut(formatstring);
}
Would this work?
awk '{print substr($0,length($1)+1);}' < file
It leaves some whitespace in front though.
Printing out columns starting from #2 (the output will have no trailing space in the beginning):
ls -l | awk '{sub(/[^ ]+ /, ""); print $0}'
echo "1 2 3 4 5 6" | awk '{ $NF = ""; print $0}'
this one uses awk to print all except the last field
This is what I preferred from all the recommendations:
Printing from the 6th to last column.
ls -lthr | awk '{out=$6; for(i=7;i<=NF;i++){out=out" "$i}; print out}'
or
ls -lthr | awk '{ORS=" "; for(i=6;i<=NF;i++) print $i;print "\n"}'
If you need specific columns printed with arbitrary delimeter:
awk '{print $3 " " $4}'
col#3 col#4
awk '{print $3 "anything" $4}'
col#3anythingcol#4
So if you have whitespace in a column it will be two columns, but you can connect it with any delimiter or without it.
Perl solution:
perl -lane 'splice #F,0,1; print join " ",#F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the #F array. Defaults to splitting on whitespace
-e execute the perl code
splice #F,0,1 cleanly removes column 0 from the #F array
join " ",#F joins the elements of the #F array, using a space in-between each element
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[1:]) + '\n') for line in sys.stdin]" < file
I want to extend the proposed answers to the situation where fields are delimited by possibly several whitespaces –the reason why the OP is not using cut I suppose.
I know the OP asked about awk, but a sed approach would work here (example with printing columns from the 5th to the last):
pure sed approach
sed -r 's/^\s*(\S+\s+){4}//' somefile
Explanation:
s/// is the standard command to perform substitution
^\s* matches any consecutive whitespace at the beginning of the line
\S+\s+ means a column of data (non-whitespace chars followed by whitespace chars)
(){4} means the pattern is repeated 4 times.
sed and cut
sed -r 's/^\s+//; s/\s+/\t/g' somefile | cut -f5-
by just replacing consecutive whitespaces by a single tab;
tr and cut:
tr can also be used to squeeze consecutive characters with the -s option.
tr -s [:blank:] <somefile | cut -d' ' -f5-
If you don't want to reformat the part of the line that you don't chop off, the best solution I can think of is written in my answer in:
How to print all the columns after a particular number using awk?
It chops what is before the given field number N, and prints all the rest of the line, including field number N and maintaining the original spacing (it does not reformat). It doesn't mater if the string of the field appears also somewhere else in the line.
Define a function:
fromField () {
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
In you particular case:
svn status | grep '\!' | fromField 2 > removedProjs
If your file/stream does not contain new-line characters in the middle of the lines (you could be using a different Record Separator), you can use:
awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
The first case will fail only in files/streams that contain the rare hexadecimal char number 1
This awk function returns substring of $0 that includes fields from begin to end:
function fields(begin, end, b, e, p, i) {
b = 0; e = 0; p = 0;
for (i = 1; i <= NF; ++i) {
if (begin == i) { b = p; }
p += length($i);
e = p;
if (end == i) { break; }
p += length(FS);
}
return substr($0, b + 1, e - b);
}
To get everything starting from field 3:
tail = fields(3);
To get section of $0 that covers fields 3 to 5:
middle = fields(3, 5);
b, e, p, i nonsense in function parameter list is just an awk way of declaring local variables.
All of the other answers given here and in linked questions fail in various ways given various possible FS values. Some leave leading and/or trailing white space, some convert every FS to the OFS, some rely on semantics that only apply when FS is the default value, some rely on negating FS in a bracket expression which will fail given a multi-char FS, etc.
To do this robustly for any FS, use GNU awk for the 4th arg to split():
$ cat tst.awk
{
split($0,flds,FS,seps)
for ( i=n; i<=NF; i++ ) {
printf "%s%s", flds[i], seps[i]
}
print ""
}
$ printf 'a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]' -f tst.awk
b c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]+' -f tst.awk
b c d
$ printf 'a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
c###d
$ printf '###a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
b###c###d
Note that I'm using split() above because it's 3rg arg is a field separator, not just a regexp like the 2nd arg to match(). The difference is that field separators have additional semantics to regexps such as skipping leading and/or trailing blanks when the separator is a single blank char - if you wanted to use a while(match()) loop or any form of *sub() to emulate the above then you'd need to write code to implement those semantics whereas split() already implements them for you.
Awk examples looks complex here, here is simple Bash shell syntax:
command | while read -a cols; do echo ${cols[#]:1}; done
Where 1 is your nth column counting from 0.
Example
Given this content of file (in.txt):
c1
c1 c2
c1 c2 c3
c1 c2 c3 c4
c1 c2 c3 c4 c5
here is the output:
$ while read -a cols; do echo ${cols[#]:1}; done < in.txt
c2
c2 c3
c2 c3 c4
c2 c3 c4 c5
This would work if you are using Bash and you could use as many 'x ' as elements you wish to discard and it ignores multiple spaces if they are not escaped.
while read x b; do echo "$b"; done < filename
Perl:
#m=`ls -ltr dir | grep ^d | awk '{print \$6,\$7,\$8,\$9}'`;
foreach $i (#m)
{
print "$i\n";
}
UPDATE :
if you wanna use no function calls at all while preserving the spaces and tabs in between the remaining fields, then do :
echo " 1 2 33 4444 555555 \t6666666 " |
{m,g}awk ++NF FS='^[ \t]*[^ \t]*[ \t]+|[ \t]+$' OFS=
=
2 33 4444 555555 6666666
===================
You can make it a lot more straight forward :
svn status | [m/g]awk '/!/*sub("^[^ \t]*[ \t]+",_)'
svn status | [n]awk '(/!/)*sub("^[^ \t]*[ \t]+",_)'
Automatically takes care of the grep earlier in the pipe, as well as trimming out extra FS after blanking out $1, with the added bonus of leaving rest of the original input untouched instead of having tabs overwritten with spaces (unless that's the desired effect)
If you're very certain $1 does not contain special characters that need regex escaping, then it's even easier :
mawk '/!/*sub($!_"[ \t]+",_)'
gawk -c/P/e '/!/*sub($!_"""[ \t]+",_)'
Or if you prefer customizing FS+OFS to handle it all :
mawk 'NF*=/!/' FS='^[^ \t]*[ \t]+' OFS='' # this version uses OFS
This should be a reasonably comprehensive awk-field-sub-string-extraction function that
returns substring of $0 based on input ranges, inclusive
clamp in out of range values,
handle variable length field SEPs
has speedup treatments for ::
completely no inputs, returning $0 directly
input values resulting in guaranteed empty string ("")
FROM-field == 1
FS = "" that has split $0 out by individual chars
(so the FROM <(_)> and TO <(__)> fields behave like cut -c rather than cut -f)
original $0 restored, w/o overwriting FS seps with OFS
|
{m,g}awk '{
2 print "\n|---BEFORE-------------------------\n"
3 ($0) "\n|----------------------------\n\n ["
4 fld2(2, 5) "]\n [" fld2(3) "]\n [" fld2(4, 2)
5 "]<----------------------------------------------should be
6 empty\n [" fld2(3, 11) "]<------------------------should be
7 capped by NF\n [" fld2() "]\n [" fld2((OFS=FS="")*($0=$0)+11,
8 23) "]<-------------------FS=\"\", split by chars
9 \n\n|---AFTER-------------------------\n" ($0)
10 "\n|----------------------------"
11 }
12 function fld2(_,__,___,____,_____)
13 {
if (+__==(_=-_<+_ ?+_:_<_) || (___=____="")==__ || !NF) {
return $_
16 } else if (NF<_ || (__=NF<+__?NF:+__)<(_=+_?_:!_)) {
return ___
18 } else if (___==FS || _==!___) {
19 return ___<FS \
? substr("",$!_=$!_ substr("",__=$!(NF=__)))__
20 : substr($(_<_),_,__)
21 }
22 _____=$+(____=___="\37\36\35\32\31\30\27\26\25"\
"\24\23\21\20\17\16\6\5\4\3\2\1")
23 NF=__
24 if ($(!_)~("["(___)"]")) {
25 gsub("..","\\&&",___) + gsub(".",___,____)
27 ___=____
28 }
29 __=(_) substr("",_+=_^=_<_)
30 while(___!="") {
31 if ($(!_)!~(____=substr(___,--_,++_))) {
32 ___=____
33 break }
35 ___=substr(___,_+_^(!_))
36 }
37 return \
substr("",($__=___ $__)==(__=substr($!_,
_+index($!_,___))),_*($!_=_____))(__)
}'
those <TAB> are actual \t \011 but relabeled for display clarity
|---BEFORE-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
[2 33 4444 555555]
[33]
[]<---------------------------------------------- should be empty
[33 4444 555555 6666666]<------------------------ should be capped by NF
[ 1 2 33 4444 555555 <TAB>6666666 ]
[ 2 33 4444 555555 <TAB>66]<------------------- FS="", split by chars
|---AFTER-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
I wasn't happy with any of the awk solutions presented here because I wanted to extract the first few columns and then print the rest, so I turned to perl instead. The following code extracts the first two columns, and displays the rest as is:
echo -e "a b c d\te\t\tf g" | \
perl -ne 'my #f = split /\s+/, $_, 3; printf "first: %s second: %s rest: %s", #f;'
The advantage compared to the perl solution from Chris Koknat is that really only the first n elements are split off from the input string; the rest of the string isn't split at all and therefor stays completely intact. My example demonstrates this with a mix of spaces and tabs.
To change the amount of columns that should be extracted, replace the 3 in the example with n+1.
ls -la | awk '{o=$1" "$3; for (i=5; i<=NF; i++) o=o" "$i; print o }'
from this answer is not bad but the natural spacing is gone.
Please then compare it to this one:
ls -la | cut -d\ -f4-
Then you'd see the difference.
Even ls -la | awk '{$1=$2=""; print}' which is based on the answer voted best thus far is not preserve the formatting.
Thus I would use the following, and it also allows explicit selective columns in the beginning:
ls -la | cut -d\ -f1,4-
Note that every space counts for columns too, so for instance in the below, columns 1 and 3 are empty, 2 is INFO and 4 is:
$ echo " INFO 2014-10-11 10:16:19 main " | cut -d\ -f1,3
$ echo " INFO 2014-10-11 10:16:19 main " | cut -d\ -f2,4
INFO 2014-10-11
$
If you want formatted text, chain your commands with echo and use $0 to print the last field.
Example:
for i in {8..11}; do
s1="$i"
s2="str$i"
s3="str with spaces $i"
echo -n "$s1 $s2" | awk '{printf "|%3d|%6s",$1,$2}'
echo -en "$s3" | awk '{printf "|%-19s|\n", $0}'
done
Prints:
| 8| str8|str with spaces 8 |
| 9| str9|str with spaces 9 |
| 10| str10|str with spaces 10 |
| 11| str11|str with spaces 11 |
The top-voted answer by zed_0xff did not work for me.
I have a log where after $5 with an IP address can be more text or no text. I need everything from the IP address to the end of the line should there be anything after $5. In my case, this is actually within an awk program, not an awk one-liner so awk must solve the problem. When I try to remove the first 4 fields using the solution proposed by zed_0xff:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{$1=$2=$3=$4=""; printf "[%s]\n", $0}'
it spits out wrong and useless response (I added [..] to demonstrate):
[ 37.244.182.218 one two three]
There are even some suggestions to combine substr with this wrong answer, but that only complicates things. It offers no improvement.
Instead, if columns are fixed width until the cut point and awk is needed, the correct answer is:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{printf "[%s]\n", substr($0,28)}'
which produces the desired output:
[37.244.182.218 one two three]

Resources