awk - how to delete first column with field separator - linux

I have a CSV file with data presented as follows:
87540221|1356438283301|1356438284971|1356438292151697
87540258|1356438283301|1356438284971|1356438292151697
87549647|1356438283301|1356438284971|1356438292151697
I'm trying to save the first column to a new file (without the field separator), and then delete the first column, along with the first field separator, from the main CSV file.
Any ideas?
This is what I have tried so far:
awk 'BEGIN{FS=OFS="|"}{$1="";sub("|,"")}1'
but it doesn't work

This is simple with cut:
$ cut -d'|' -f1 infile
87540221
87540258
87549647
$ cut -d'|' -f2- infile
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
Just redirect into the file you want:
$ cut -d'|' -f1 infile > outfile1
$ cut -d'|' -f2- infile > outfile2 && mv outfile2 infile

Assuming your original CSV file is named "orig.csv":
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' orig.csv > tmp && mv tmp orig.csv
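The same one-pass idea can be wrapped in a small shell function. This is a sketch, not the answer's own code: the name split_first_column and its argument order are hypothetical. It uses mktemp instead of a fixed tmp name so an existing file called tmp is not clobbered, and it copies the result back with cat so the original file keeps its permissions. The regex uses [^|]* rather than [^|]+ so that an empty first field is also stripped.

```shell
# Hypothetical helper: split_first_column INPUT COLFILE
# Writes field 1 of every |-separated line of INPUT to COLFILE, then
# rewrites INPUT in place without its first field and first separator.
split_first_column() {
    input=$1
    colfile=$2
    tmp=$(mktemp) || return 1
    # print $1 > out sends the first field to COLFILE; sub() then strips
    # the leading field plus one "|" before the trailing 1 prints the line.
    # Copy back with cat rather than mv so INPUT keeps its permissions
    # (mktemp creates files readable only by their owner).
    awk -F'|' -v out="$colfile" '{ print $1 > out; sub(/^[^|]*\|/, "") } 1' "$input" > "$tmp" &&
        cat "$tmp" > "$input"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage, with the file names from the answer above: split_first_column orig.csv newfile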

With GNU awk (FPAT is a gawk extension):
awk '{$1="";$0=$0;$1=$1}1' FPAT='[^|]+' OFS='|' file
Output
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697

The pipe is a special regex character, and sub() expects a regex as its first argument. The correct awk command is:
awk 'BEGIN {FS=OFS="|"} {$1=""; sub(/\|/, "")} 1' file
OUTPUT:
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
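The attempt in the question passes the separator as a string. awk also accepts a string as the first argument of sub(), but that string is then used as a regex, so the pipe still has to be made literal: either with a bracket expression, or with a doubled backslash (the awk string "\\|" becomes the regex \|). A small sketch comparing the three forms; the function name demo_sub_forms is hypothetical:

```shell
# Hypothetical demo: prints the input line with its first "|" removed,
# once for each way of writing a literal-pipe regex in awk.
demo_sub_forms() {
    awk '{
        s1 = s2 = s3 = $0
        sub(/\|/, "", s1)      # regex literal, escaped pipe
        sub("[|]", "", s2)     # string used as a regex: bracket expression
        sub("\\|", "", s3)     # string used as a regex: "\\|" is the regex \|
        print s1, s2, s3
    }'
}

printf 'a|b|c\n' | demo_sub_forms    # prints: ab|c ab|c ab|c
```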

With sed:
sed 's/[^|]*|//' file.txt

Related

Using awk to separate output containing tab and "/" separators into a delimited format

I'd appreciate help converting this output to a pipe-delimited format.
I have the following output:
abcde1234 /path/A/file1
test23455 /path/B/file2345
But I would like it in this format:
abcde1234|file1
test23455|file2345
In awk, if you set FS to [[:blank:]]+/|/, you can print the first and last fields:
awk -v FS='[[:blank:]]+/|/' -v OFS='|' '{print $1, $NF}' file
abcde1234|file1
test23455|file2345
Here is a one-liner awk solution:
awk -v FS='[ \t].*/' -v OFS='|' '{$1=$1}1' file
And a sed one-liner:
sed 's%[[:blank:]].*/%|%' file
And a pure bash one:
while read -r; do echo "${REPLY%%[[:blank:]]*}|${REPLY##*/}"; done < file
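The bash loop relies on read with no variable name filling REPLY, which is a bash extension; POSIX read needs at least one variable name. A sketch of a POSIX sh equivalent, assuming each line is a name, then blanks, then a path with no blanks in it. The function name to_pipe is hypothetical:

```shell
# Hypothetical helper: to_pipe reads "name /some/path/file" lines on stdin
# and prints "name|file". read splits on the default IFS (space, tab, newline),
# putting the first word in name and the rest of the line in path.
to_pipe() {
    while read -r name path; do
        printf '%s|%s\n' "$name" "${path##*/}"   # strip everything up to the last /
    done
}

printf 'abcde1234 /path/A/file1\ntest23455 /path/B/file2345\n' | to_pipe
```

Usage: to_pipe < file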
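Try using cut 🤷🏻‍♀️. With this input:
abcde1234 /path/A/file1
test23455 /path/B/file2345
this loop prints the pipe-delimited output:
while IFS= read -r line; do
    value1=$(printf '%s\n' "$line" | cut -d ' ' -f1)
    value2=$(printf '%s\n' "$line" | cut -d '/' -f4)
    printf '%s|%s\n' "$value1" "$value2"
done < list
Note that -f4 assumes every path has the same depth as in the sample.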

Can I substring a field after parsing a file with a delimiter in Linux?

For example, here is the data:
45|019|6|113|201901101522|40000539306|
45|015|1|6|201901101045|40000707530|
45|018|6|201|201901101733|40002235304|
45|009|8|377|201901101732|40002097431|
and I want to see this:
2019011
2019011
2019011
2019011
This is what I have already tried: cut -d'|' -f5 < input.txt
And this is what I get:
201901101522
201901101045
201901101733
201901101732
You can pipe it into cut again:
cut -d'|' -f5 <input.txt | cut -c1-7
Or use awk:
awk -F'|' '{print substr($5, 1, 7)}' input.txt
I'm jumping to conclusions, but I assume you want a complete date, not just the tens digit of the day?
gawk -F'|' '{print gensub( /^(.{8}).*/, "\\1", "1",$5)}' data
20190110
20190110
20190110
20190110
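gensub() is a gawk extension, so it is not available in every awk. For a fixed-length prefix, substr() does the same job portably. A sketch that takes the field number and prefix length as parameters; the function name field_prefix and the variables f and n are hypothetical, chosen for illustration:

```shell
# Hypothetical helper: field_prefix FIELD LENGTH [FILE...]
# Prints the first LENGTH characters of |-separated field FIELD.
# Reads stdin when no FILE is given.
field_prefix() {
    f=$1
    n=$2
    shift 2
    awk -F'|' -v f="$f" -v n="$n" '{ print substr($f, 1, n) }' "$@"
}

printf '45|019|6|113|201901101522|40000539306|\n' | field_prefix 5 8    # prints: 20190110
```

With the question's file, field_prefix 5 7 input.txt gives the 7-character output that was asked for, and field_prefix 5 8 input.txt gives the full date.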

replace sed command text inline

I have this file, file.txt:
unknown#mail.com||unknown#mail.com||
unknown#mail2.com||unknown#mail2.com||
unknown#mail3.com||unknown#mail3.com||
unknown#mail4.com||unknown#mail4.com||
unknownpass
unknownpass2
unknownpass3
unknownpass4
How can I use the sed command to obtain this:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|
This might work for you (GNU sed):
sed ':a;N;/\n[^|\n]*$/!ba;s/||\([^|]*\)||\(\n.*\)*\n\(.*\)$/|\3|\1|\3|\2/;P;D' file
This slurps the first part of the file and one of the replacement lines into the pattern space, substitutes, prints and deletes the first line, and then repeats.
Well, this does use sed anyway:
{ sed -n 5,\$p file.txt; sed 4q file.txt; } | awk 'NR<5{a[NR]=$0; next}
{$2=a[NR-4]; $4=a[NR-4]} 1' FS=\| OFS=\|
awk to the rescue!
awk 'BEGIN {FS=OFS="|"}
NR==FNR {if(NF==1) a[++c]=$1; next}
NF>4 {$2=a[FNR]; $4=$2; print}' file{,}
A two-pass algorithm: it caches the entries in the first pass and inserts them into the empty fields in the second. It assumes there are as many passwords as email lines.
Here is another approach with a single awk pass, wrapped in tac:
tac file |
awk 'BEGIN {FS=OFS="|"}
NF==1 {a[++c]=$1}
NF>4 {$2=a[++d]; $4=$2; print}' |
tac
I would combine the related lines with paste and reshuffle the elements with awk (I assume the related lines are exactly half a file away):
n=$(wc -l < file.txt)
paste -d'|' <(head -n $((n/2)) file.txt) <(tail -n $((n/2)) file.txt) |
awk '{ print $1, $6, $3, $6, "" }' FS='|' OFS='|'
Output:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|
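The answers above read the file twice, reverse it with tac, or count its lines first. Another option, sketched here under the same assumptions (lines containing | are account lines, they all come before the single-field password lines, and the nth password belongs to the nth account line), is a single awk pass that queues the account lines and fills each one in as its password arrives. The function name fill_passwords is hypothetical:

```shell
# Hypothetical helper: fill_passwords [FILE...]
# Assumes account lines look like "mail||mail||" and all come before the
# single-field password lines, which are in the same order as the accounts.
fill_passwords() {
    awk 'BEGIN { FS = OFS = "|" }
         NF > 1 { acct[++n] = $0; next }   # queue account lines
         NF == 1 {
             pw = $1
             if (++i > n) next             # more passwords than accounts: skip
             $0 = acct[i]                  # re-split the queued account line
             $2 = pw                       # fill both empty fields; awk
             $4 = pw                       # rebuilds $0 using OFS
             print
         }' "$@"
}
```

Usage: fill_passwords file.txt. Like the two-pass version, it keeps one half of the data in memory (here the account lines instead of the passwords), but it needs neither tac nor a second read of the file.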

how to print tail of path filename using awk

I've searched for this with no success.
I have a file with paths.
I want to print the tail of every path.
For example (for every line in the file):
/homes/work/abc.txt
--> abc.txt
Does anyone know how to do it?
Thanks
awk -F "/" '{print $NF}' input.txt
will give output of:
abc1.txt
abc2.txt
abc3.txt
for this input:
$>cat input.txt
path/to/file/abc1.txt
path/to/file/abc2.txt
path/to/file/abc3.txt
How about this awk:
echo "/homes/work/abc.txt" | awk '{sub(/.*\//,x)}1'
abc.txt
Since .* is greedy, it matches everything up to the last /.
So this replaces everything up to and including the last / with x, and since x is an unset (empty) variable, that part is simply removed.
Thor's version:
echo "/homes/work/abc.txt" | awk -F/ '$0=$NF'
abc.txt
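NB: this fails for /homes/work/0 (or any last component that is numerically zero), because the value of the assignment is then false and the line is not printed, so it is better to use: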
echo "/homes/work/abc.txt" | awk -F/ '{$0=$NF}1'
awk solutions are already provided by @Jotne and @bashophil.
Here are some other variations (just for fun):
Using sed
sed 's:.*/::' file
Using grep
grep -oP '(.*/)?\K.*' file
Using cut (added by @Thor)
rev file | cut -d/ -f1 | rev
Using basename (suggested by @fedorqui and @EdMorton)
while IFS= read -r line; do
basename "$line"
done < file
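The basename loop starts one basename process per line, which gets slow on large files. A sketch of the same loop using the shell's own ${line##*/} expansion, which removes the longest prefix ending in / without spawning a process. It is not identical to basename for paths that end in a slash: for dir/ it prints an empty line, while basename prints dir. The function name print_tails is hypothetical:

```shell
# Hypothetical helper: print_tails reads paths on stdin and prints the
# last /-separated component of each one.
print_tails() {
    while IFS= read -r line; do
        printf '%s\n' "${line##*/}"   # remove the longest prefix ending in /
    done
}

printf '/homes/work/abc.txt\npath/to/file/abc2.txt\n' | print_tails
```

Usage: print_tails < file. Unlike awk -F/ '$0=$NF', it also prints a last component of 0.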

how to get required field from file on linux?

I have a file that contains three fields separated by two spaces. I need to get only the third field. The file content looks like the following example:
kuldeep Mirat Shakti
balaji salunke pune
.
.
.
How can I get the third field?
To get the 3rd field, assuming you don't have any "embedded spaces", just
awk '{print $3}' file
By default, awk splits fields on runs of whitespace. So even if there are two or more spaces between fields, the 3rd field is always $3.
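However, if you want to be specific, specify the field delimiter. Note that a single space is a special value in awk: it still means splitting on runs of whitespace, so this is equivalent to the default:
awk -F" " '{print $3}' file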
If you have other choices, here is a Ruby one:
ruby -F" " -ane 'puts $F[2]' file
ruby -ane 'puts $F[2]' file
Update: if you need all fields after the 3rd:
awk -F" " '{$1=$2=$3=""}1' OFS=" " file # add a pipe to `sed 's/^[ \t]*//'` if desired
ruby -F" " -ane 'puts $F[3..-1].join(" ")' file
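Use awk with a two-space separator:
awk -F'  ' '{print $3}' file
Because a multi-character FS is treated as a regex, this splits only on exactly two spaces, so it also works if fields contain single embedded spaces.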
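To see why an explicit two-space separator matters once fields contain single spaces, compare awk's default splitting with FS set to two spaces. The second sample line, with embedded spaces, is made up for illustration, and the function names are hypothetical:

```shell
# third_default: the default FS splits on any run of blanks, so an
# embedded single space starts a new field.
third_default() { awk '{ print $3 }'; }

# third_two_spaces: a multi-character FS is a regex, so "  " splits only on
# exactly two spaces and single spaces stay inside their field.
third_two_spaces() { awk -F'  ' '{ print $3 }'; }

printf 'kuldeep  Mirat  Shakti\n' | third_default               # prints: Shakti
printf 'kuldeep  Mirat  Shakti\n' | third_two_spaces            # prints: Shakti
printf 'balaji s  salunke  navi mumbai\n' | third_default       # prints: salunke
printf 'balaji s  salunke  navi mumbai\n' | third_two_spaces    # prints: navi mumbai
```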
To get the third field of each line, pipe through awk, e.g.:
cat filename | awk '{print $3}'
If you just want to get the third field of the first line, use head, too:
cat filename | head -n 1 | awk '{print $3}'
Given @balaji's comment to @kurani's answer:
perl -pe 's/^.*? .*? //' filename
awk -F' ' '{for(i=3; i<NF; i++) {printf("%s%s",$i,FS)}; print $NF}' filename
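cut -d" " -f 3 filename
(This assumes a single space between fields. cut splits on every occurrence of the delimiter, so with two spaces each gap also produces an empty field and the third value would be field 5.)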
