How to remove padding from awk command?

I have a 10000 line file that contains on each line a string in the form of "data:key", which is also right-padded by 8 characters, where ':' is the delimiter. I am attempting to use awk from within Linux to print these pairs on their own lines, so that line #1 = data and line #2 = key, and I have achieved this using the command:
awk -F: '{print $1; print$2}' < ~/prices.txt
My problem occurs on the second line of each set. For some reason, it is padded with as much whitespace as there was from removing the data from the line. So, if my line was "26900:9976", the first line would be '26900' and the second line would be ' 9976', whitespace included.
If curious, I want to do it this way because I am piping the results to db_load to use within a B+-tree.

Not exactly your answer but you can use tr for this:
tr ':' '\n' < input
Also, I don't see the behaviour you describe with your awk command. However, you can always add sed to the pipeline to remove leading whitespace:
tr ':' '\n' < ~/prices.txt | sed 's/^[ \t]*//'
awk -F: '{print $1; print$2}' < ~/prices.txt | sed 's/^[ \t]*//'

You can use a regular expression as the field separator: a colon followed by zero or more whitespace chars will separate the fields.
awk -F ':[[:space:]]*' '{print $1; print $2}' < ~/prices.txt
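If the lines also carry trailing padding, as the question describes, a variant along these lines may help. It is only a sketch: the function name split_pairs is made up for illustration, and it assumes the padding consists of ordinary whitespace.

```shell
# split_pairs FILE: print the data and the key of each "data:key" line
# on separate lines. Hypothetical helper, not part of the original answer.
split_pairs() {
    awk -F ':[[:space:]]*' '
        {
            sub(/[[:space:]]+$/, "")   # strip trailing padding; awk re-splits the fields
            print $1
            print $2
        }
    ' "$1"
}
```

It could be called as split_pairs ~/prices.txt and piped on to db_load in the same way as the original command.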

Related

Joining consecutive lines using awk

How can I join consecutive lines into a single line using awk? This is my current awk command:
awk -F "\"*;\"*" '{if (NR!=1) {print $2}}' file.csv
It skips the first line and prints:
44895436200043
38401951900014
72204547300054
38929771400013
32116464200027
50744963500014
I want to have this:
44895436200043 38401951900014 72204547300054 38929771400013 32116464200027 50744963500014

That's a job for tr:
# tail -n +2 prints the whole file from line 2 on
# tr '\n' ' ' translates newlines to spaces
tail -n +2 file | tr '\n' ' '
With awk, you can achieve this by changing the output record separator to " ":
# BEGIN{ORS= " "} sets the internal output record separator to a single space
# NR!=1 adds a condition to the default action (print)
awk 'BEGIN{ORS=" "} NR!=1' file

I assume you want to modify your existing awk command so that it prints a single space-separated line instead of one value per row.
You can replace the print $2 action in your command with string concatenation:
awk -F "\"*;\"*" 'NR!=1{u=u s $2; s=" "} END {print u}' file.csv
or replace the ORS (output record separator)
awk -F "\"*;\"*" -v ORS=" " 'NR!=1{print $2}' file.csv
or pipe output to xargs:
awk -F "\"*;\"*" 'NR!=1{print $2}' file.csv | xargs
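The variants above differ in how the output ends: setting ORS to a space leaves a trailing space and no final newline, while the END-based and xargs variants finish with a single newline. Here is a minimal sketch of the END-based form wrapped in a function, so the result is easy to capture in a variable. The name join_col2 is an assumption made for this example.

```shell
# join_col2 FILE: print the second ;-separated field of every line except
# the header, joined by single spaces and terminated by one newline.
# The function name is illustrative only.
join_col2() {
    awk -F '"*;"*' 'NR != 1 { out = out sep $2; sep = " " } END { print out }' "$1"
}
```

For example, ids=$(join_col2 file.csv) stores the joined list without a trailing space.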

How to join every newline Strings within single or double quote

How can I join all lines of a file into one comma-separated line, with each line wrapped in single or double quotes?
Example:
I have the names below:
$ cat file
James kurt
Suji sane
Bhujji La
Loki Hapa
Desired:
"James kurt", "Suji sane", "Bhujji La", "Loki Hapa"
EDIT:
My attempt so far:
This is what I have done, but it takes two steps; I am just curious whether it can be combined into one.
$ awk '{print "\x22" $1" "$2 "\x22"}'| tr '\n' ','

First print all lines with the " and then join the lines with a comma:
< file xargs -d '\n' printf '"%s"\n' | paste -sd,
Instead of newline you could just remove trailing (or leading comma):
< file xargs -d '\n' printf '"%s",' | sed 's/,$//'
< file xargs -d '\n' printf ',"%s"' | cut -c2-
< file xargs -d '\n' printf ', "%s"' | cut -c3- # with space after comma
With sed, add the " and hold the lines, then on the last line replace each newline with a comma, remove the leading comma, and print:
sed -n 's/^/"/;s/$/"/;H;${x;s/\n/, /g;s/^, //;p}' file
You were close! The " " in your attempt adds a space between the line and ". You could:
awk '{print "\x22" $0 "\x22"}' file | tr '\n' ',' |
# and then remove the trailing comma:
sed 's/,$//'
But joining the lines with paste is simpler than replacing newlines with commas and removing the last one:
awk '{print "\x22" $0 "\x22"}' file | paste -sd,

Could you please try the following:
awk -v lines=$(wc -l < Input_file) -v s1="\"" '
BEGIN{
OFS=", "
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS)
}
' Input_file
Explanation: the same command with comments.
awk -v lines=$(wc -l < Input_file) -v s1="\"" ' ## Start the awk program; lines holds the total number of lines in Input_file and s1 holds a double quote.
BEGIN{ ## The BEGIN block runs before any input is read.
OFS=", " ## Set OFS to a comma followed by a space.
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS) ## Print the quoted line, followed by ORS (a newline) on the last line and by OFS otherwise.
}
' Input_file ## Name of the input file.

awk '{printf "%s",(NR==1?"":",")"\042"$0"\042"}END{print ""}'
Note that the END block is only there to add the final newline to the output. This makes the output POSIX compliant.
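The same idea can produce the comma-plus-space separator shown in the desired output. This is a sketch under the assumption that a function wrapper is acceptable; quote_join is a made-up name.

```shell
# quote_join FILE: wrap every line in double quotes and join them with ", ".
# Hypothetical helper name; \042 is the octal escape for a double quote.
quote_join() {
    awk '{ printf "%s\042%s\042", (NR == 1 ? "" : ", "), $0 } END { if (NR) print "" }' "$1"
}
```

The separator is printed before every line except the first, so no trailing comma has to be removed afterwards.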

This might work for you (GNU sed):
sed ':a;N;$!ba;s/.*/"&"/mg;s/\n/, /g' file
Slurp file into the pattern space, surround lines by double quotes and replace newlines by a comma and a space.
Alternative:
sed -z 's/\n$//;s/.*/"&"/mg;s/\n/, /g;s/$/\n/' file

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path is stored in a variable. The file contains a string of words. I want to take the second word, cut out the part from expdp to .dmp, and store it in a variable.
Example:
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp

REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )

The following awk command may help:
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: this makes both space and / field separators (as per your sample input) and then prints the 6th field of the line, which is what the OP needs.
To see each field number and its value, you could also use the following command:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"

Using sed with one of these regexes:
Capture the non-space characters after the last / and print only the captured part:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst
Same as above, but with a different separator to avoid escaping the /, and -r to allow an unescaped (:
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst
In two steps: remove everything up to the last /, then remove from the first space to the end (this may be the easiest to read and understand):
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst
If you want to avoid an external command (sed):
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
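The parameter-expansion approach can also be applied line by line, which helps when the file has more than one entry. The following is a sketch using only shell built-ins; the function name dump_names and the variable names are assumptions made for illustration.

```shell
# dump_names FILE: print the basename of the second whitespace-separated
# field of each line, using only shell built-ins.
# The function and variable names are made up for this example.
dump_names() {
    while read -r first path rest || [ -n "$path" ]; do
        if [ -n "$path" ]; then
            printf '%s\n' "${path##*/}"   # drop everything up to the last /
        fi
    done < "$1"
}
```

Usage: myvar=$(dump_names "$REFLIST_OP"). Keep the variable quoted when printing it, as in printf '%s\n' "$myvar", because the file name contains a * that the shell would otherwise try to expand.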

Split or join lines in Linux using sed

I have file that contains below information
$ cat test.txt
Studentename:Ram
rollno:12
subjects:6
Highest:95
Lowest:65
Studentename:Krish
rollno:13
subjects:6
Highest:90
Lowest:45
Studentename:Sam
rollno:14
subjects:6
Highest:75
Lowest:65
I am trying to place the info of each student on a single line,
i.e. my output should be:
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65.
Below is the command I wrote:
cat test.txt | tr "\n" " " | sed 's/Lowest:[0-9]\+/Lowest:[0:9]\n/g'
The above command breaks the line at the regex Lowest:[0-9]\+, but it doesn't print the matched text. Instead it prints the literal Lowest:[0:9].
Please help.

Try:
$ sed '/^Studente/{:a; N; /Lowest/!ba; s/\n/ /g}' test.txt
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
How it works
/^Studente/{...} tells sed to perform the commands inside the curly braces only on lines that start with Studente. Those commands are:
:a
This defines a label a.
N
This reads in the next line and appends it to the pattern space.
/Lowest/!ba
If the current pattern space does not contain Lowest, this tells sed to branch back to label a.
In more detail, /Lowest/ is true if the line contains Lowest. In sed, ! is negation, so /Lowest/! is true if the line does not contain Lowest. In ba, the b stands for the branch command and a is the label to branch to.
s/\n/ /g
This tells sed to replace all newlines with spaces.

Try this using awk :
awk '{if ($1 !~ /^Lowest/) {printf "%s ", $0} else {print}}' file.txt
Or shorter but more obfuscated :
awk '$1!~/^Lowest/{printf"%s ",$0;next}1' file.txt
Or correcting your command :
tr "\n" " " < file.txt | sed 's/Lowest:[0-9]\+/&\n/g'
Explanation: & stands for whatever the left-hand side of the substitution matched.

Another possible GNU sed that doesn't assume Lowest is the last item:
sed ':a; N; /\nStudent/{P; D}; s/\n/ /; ba' test.txt
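The same record-start logic can be sketched in awk, which avoids GNU-specific sed syntax. This is an illustration rather than a drop-in replacement: the function name join_records is made up, and it assumes every record begins with a Studentename: line.

```shell
# join_records FILE: put each student record on one line, starting a new
# output line whenever a line begins with "Studentename:".
# Hypothetical helper name.
join_records() {
    awk '
        /^Studentename:/ && NR > 1 { print "" }      # finish the previous record
        { printf "%s%s", (/^Studentename:/ ? "" : " "), $0 }
        END { if (NR) print "" }                      # terminate the last record
    ' "$1"
}
```

Because it keys on the first field of a record rather than the last, it keeps working if Lowest is missing or the fields are reordered.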

This might work for you (GNU sed):
sed '/^Studentename:/{:a;x;s/\n/ /gp;d};H;$ba;d' file
Use the hold space to gather up the fields and then remove the newlines to produce a record.

Extract substring of string if position is known

First, I need to extract a substring at a known position in file.txt using bash, counting from the start of the second line:
>header
cgatgcgctctgtgcgtgcgtgcg
So, assuming I want position 10 of the second line, the output should be:
c
Second, I want to include the surrounding ±5 characters, resulting in:
gcgctctgtgc

{ read -r; read -r; echo "${REPLY:9:1}"; echo "${REPLY:4:11}"; } < file.txt
Output:
c
gcgctctgtgc
The ${parameter:offset:length} syntax for substrings is explained in https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
The read command is explained in https://www.gnu.org/software/bash/manual/bashref.html#index-read.
Input redirection: https://www.gnu.org/software/bash/manual/bashref.html#Redirections.

With awk:
To get the character at position 10, 1-indexed:
awk 'NR==2 {print substr($0, 10, 1)}'
NR==2 checks whether the record is the second one; if so, the statements inside {} are executed.
substr($0, 10, 1) extracts 1 character starting at position 10 of $0 (the whole record), i.e. only the 10th character. The format of substr() is substr(string, start, length).
Similarly, to get ±5 characters around 10-th:
awk 'NR==2 {print substr($0, (10-5), 11)}'
(10-5) is written out instead of 5 only to show where the start position comes from; the length 11 is 2*5+1.
Example:
% cat file.txt
>header
cgatgcgctctgtgcgtgcgtgcg
% awk 'NR==2 {print substr($0, 10, 1)}' file.txt
c
% awk 'NR==2 {print substr($0, (10-5), 11)}' file.txt
gcgctctgtgc
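To avoid hard-coding the numbers, the position and the flank width can be passed in as awk variables. This is a minimal sketch: window, pos and flank are names chosen for this example, and the start is clamped so that positions near the beginning of the line still work.

```shell
# window FILE POS FLANK: print characters POS-FLANK..POS+FLANK (1-indexed)
# of the second line of FILE. Hypothetical helper name.
window() {
    awk -v pos="$2" -v flank="$3" '
        NR == 2 {
            start = pos - flank
            if (start < 1) start = 1          # clamp at the start of the line
            print substr($0, start, pos + flank - start + 1)
            exit
        }
    ' "$1"
}
```

With the sample file, window file.txt 10 0 gives the single character and window file.txt 10 5 gives the 11-character window.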

Use sed and cut:
sed -n '2p' file|cut -c 5-15
sed selects the 2nd line and cut prints the desired characters.
