How to grep for specific parts from a file? - text

I need to extract some specific parts from a very big (> 3 GB) text file.
,(1,'test#hotmail.com',0,0,1,1,0,0,1),
(2,'test4#hotmail.com',1,0,3,1,7,0,1),
(3,'test2#live.com',0,0,0,1,0,0,1),
(4,'test5#hotmail.com',1,0,7,1,1,1,3),
(5,'test3#hotmail.com',0,0,3,1,1,0,1),
(6,'test6#hotmail.com',1,0,5,1,6,1,1),
And I need the first field, the email, and the third field (without the quotes), one record per line, as below:
1,test#hotmail.com,0
2,test4#hotmail.com,1
3,test2#live.com,0
etc..
And if possible I also want to extract the domain name (like 1,test#hotmail.com,hotmail.com,0).
I can extract the emails with the following:
grep -o -E '\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' test
and I tried a lot more...
like egrep -o -E '([^),(^]+)' test, and sed...
I hope someone can help me out!

You may use tr to split the very long line into multiple lines.
Then use tr again to remove the special characters like ( and ).
Finally, use awk to print the expected columns.
tr ")('" "\n " < file | tr -d "[ ]" |awk -F"," '{print $2","$3","$4}'
UPDATE
Then just splitting the hostname out of the email solves your problem.
tr ")" "\n" < file | tr -d "[ (']" |awk -F"," '{ split($3, a, "#"); print $2","$3","a[2]","$4;}'
FINAL UPDATE
Add a check so that only the valid lines are printed: the bare , separators between records otherwise come through as lines of empty fields, so require NF>2.
tr ")" "\n" < file | tr -d "[ (']" |awk -F"," '{ split($3, a, "#"); if (NF>2) {print $2","$3","a[2]","$4;}}'
OUTPUT
1,test#hotmail.com,hotmail.com,0
2,test4#hotmail.com,hotmail.com,1
3,test2#live.com,live.com,0
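With GNU awk you can also collapse the three stages into a single process. A sketch, assuming gawk (a regex record separator is a gawk extension):
gawk -v RS='[()]' -F, 'NF>2 { gsub(/\047/,""); split($2,a,"#"); print $1","$2","a[2]","$3 }' file
Each parenthesized group becomes its own record, the NF>2 test skips the bare-comma separators, the gsub strips the single quotes (\047), and the split on # yields the domain.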

Related

Joining consecutive lines using awk

How can I join consecutive lines into a single line using awk? Currently I have this awk command:
awk -F "\"*;\"*" '{if (NR!=1) {print $2}}' file.csv
It removes the first line and prints the second field, giving:
44895436200043
38401951900014
72204547300054
38929771400013
32116464200027
50744963500014
I want to have this:
44895436200043 38401951900014 72204547300054 38929771400013 32116464200027 50744963500014
That's a job for tr:
# tail -n +2 prints the whole file from line 2 on
# tr '\n' ' ' translates newlines to spaces
tail -n +2 file | tr '\n' ' '
With awk, you can achieve this by changing the output record separator to " ":
# BEGIN{ORS= " "} sets the internal output record separator to a single space
# NR!=1 adds a condition to the default action (print)
awk 'BEGIN{ORS=" "} NR!=1' file
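Note that both variants leave a trailing space and no final newline, since the last newline is translated (or replaced by ORS) too. If that matters, append an echo:
tail -n +2 file | tr '\n' ' '; echo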
I assume you want to modify your existing awk so that it prints a horizontal, space-separated list instead of one word per row.
To replace the print $2 action in your command, you can do this:
awk -F "\"*;\"*" 'NR!=1{u=u s $2; s=" "} END {print u}' file.csv
or replace the ORS (output record separator)
awk -F "\"*;\"*" -v ORS=" " 'NR!=1{print $2}' file.csv
or pipe output to xargs:
awk -F "\"*;\"*" 'NR!=1{print $2}' file.csv | xargs
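On the sample data all three variants print the values on one line, for example:
$ awk -F "\"*;\"*" -v ORS=" " 'NR!=1{print $2}' file.csv; echo
44895436200043 38401951900014 72204547300054 38929771400013 32116464200027 50744963500014
(the trailing echo just supplies the final newline, since ORS is a space).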

How to join newline-separated strings within single or double quotes

How can I join newline-separated strings, each wrapped in single or double quotes and separated by commas?
Example:
I have below names..
$ cat file
James kurt
Suji sane
Bhujji La
Loki Hapa
Desired:
"James kurt", "Suji sane", "Bhujji La", "Loki Hapa"
EDIT:
My Side Efforts:
Below is what I have done, but I'm completing it in two steps; just curious whether it can be clubbed into one.
$ awk '{print "\x22" $1" "$2 "\x22"}'| tr '\n' ','
First print all lines with the " and then join the lines with a comma:
< file xargs -d '\n' printf '"%s"\n' | paste -sd,
Instead of joining on newlines you could print a trailing (or leading) comma and then remove it:
< file xargs -d '\n' printf '"%s",' | sed 's/,$//'
< file xargs -d '\n' printf ',"%s"' | cut -c2-
< file xargs -d '\n' printf ', "%s"' | cut -c3- # with space after comma
With sed, add the " and append each line to the hold space, then on the last line replace the newlines with comma-space, remove the leading comma and print:
sed -n 's/^/"/;s/$/"/;H;${x;s/\n/, /g;s/^, //;p}' file
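On the sample file this prints the desired line:
$ sed -n 's/^/"/;s/$/"/;H;${x;s/\n/, /g;s/^, //;p}' file
"James kurt", "Suji sane", "Bhujji La", "Loki Hapa"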
You were close! The " " in your attempt adds a space between $1 and $2; using $0 keeps the whole line as-is. You could:
awk '{print "\x22" $0 "\x22"}' | tr '\n' ',' |
# and then remove trailing comma:
sed 's/,$//'
But joining the lines with paste is simpler than replacing newlines with a comma and removing the last one:
awk '{print "\x22" $0 "\x22"}' | paste -sd,
Could you please try the following.
awk -v lines=$(wc -l < Input_file) -v s1="\"" '
BEGIN{
OFS=", "
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS)
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -v lines=$(wc -l < Input_file) -v s1="\"" ' ##Starting awk program, creating variable lines which has total number of lines in Input_file and creating s1 variable with " in it.
BEGIN{ ##Starting BEGIN section of this program from here.
OFS=", " ##Setting OFS value as comma space here.
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS) ##Printing current line wrapped in s1, followed by OFS, or by ORS (a newline) on the last line.
}
' Input_file ##Mentioning Input_file name here.
awk '{printf "%s",(NR==1?"":",")"\042"$0"\042"}END{print ""}'
Note that the END block is only used to add the final newline to the output. This makes it POSIX compliant.
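Run against the sample file it joins the names with bare commas (use ", " instead of "," in the ternary if you want the space after the comma):
$ awk '{printf "%s",(NR==1?"":",")"\042"$0"\042"}END{print ""}' file
"James kurt","Suji sane","Bhujji La","Loki Hapa"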
This might work for you (GNU sed):
sed ':a;N;$!ba;s/.*/"&"/mg;s/\n/, /g' file
Slurp file into the pattern space, surround lines by double quotes and replace newlines by a comma and a space.
Alternative:
sed -z 's/\n$//;s/.*/"&"/mg;s/\n/, /g;s/$/\n/' file
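Both sed variants produce the desired output:
$ sed ':a;N;$!ba;s/.*/"&"/mg;s/\n/, /g' file
"James kurt", "Suji sane", "Bhujji La", "Loki Hapa"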

Using bash, I want to print a number followed by sizes of 2 paths on one line. i.e. output of 3 commands on one line

Using bash, I want to print a number followed by sizes of 2 paths on one line. i.e. output of 3 commands on one line.
All three items should be separated by ":".
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
I am getting the output as -
10001:61M
:101M
But I want the output as -
10001:61M:101M
This should work for you. The two key elements added are
tr -d '\n'
which strips the newline character from the end of the output, and the echo ":" to insert the extra colon for formatting.
Hope this helps! Here's a link to the docs for the tr command:
https://ss64.com/bash/tr.html
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }' | tr -d '\n'; echo ":" | tr -d '\n'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
Save your values to variables, and then use printf:
printf '%s:%s:%s\n' "$first" "$second" "$third"
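A sketch of the whole pipeline with command substitution (the paths are the question's placeholders; the awk /total/ filter folds in the separate grep):
first="10001"
second=$(du -sch /abc/def/* | awk '/total/ {print $1}')
third=$(du -sch /ghi/jkl/* | awk '/total/ {print $1}')
printf '%s:%s:%s\n' "$first" "$second" "$third"
# -> 10001:61M:101M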

Replace comma with space in shell script

Replace comma with space using a shell script
Given the following input:
Test,10.10.10.10,"80,22,3306",connect
I need to get below output using a bash script
Test 10.10.10.10 "80,22,3306" connect
If you have gawk, you can use FPAT (field pattern), setting it to a regular expression.
awk -v FPAT='([^,]+)|(\"[^"]+\")' '{ for(i=1;i<=NF;i++) { printf "%s ",$i } }' <<< "Test,10.10.10.10,\"80,22,3306\",connect"
We set FPAT so that a field is either a run of characters that are not commas, or a string enclosed in quotation marks. We then print all the fields with a space in between.
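Running it on the sample (with a print "" appended so the line is newline-terminated; the output keeps a trailing space from the printf):
$ awk -v FPAT='([^,]+)|(\"[^"]+\")' '{ for(i=1;i<=NF;i++) printf "%s ",$i; print "" }' <<< 'Test,10.10.10.10,"80,22,3306",connect'
Test 10.10.10.10 "80,22,3306" connect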
If your Input_file is the same as the sample shown, then the following sed may help you too.
sed 's/\(.[^,]*\),\([^,]*\),\(".*"\),\(.*\)/\1 \2 \3 \4/g' Input_file
Assuming you can read your input from the file, this works
#!/usr/bin/bash
while read -r line; do
    begin=$(echo "$line" | awk -F'"' '{print $1}' | tr ',' ' ')
    end=$(echo "$line" | awk -F'"' '{print $3}' | tr ',' ' ')
    middle=$(echo "$line" | awk -F'"' '{print $2}')
    echo "$begin\"$middle\"$end"
done < connect_file
Edit: I see that you want to keep the commas between the port numbers. I have edited the script.
echo Test,10.10.10.10,\"80,22,3306\",connect|awk '{sub(/,/," ")gsub(/,"80,22,3306",/," \4280,22,3306\42 ")}1'
Test 10.10.10.10 "80,22,3306" connect

bash, extract string from text file with space delimiter

I have a text files with a line like this in them:
MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\
sig-250-0 is something that can change from file to file (but I always know what it is for each file). There are lines before and after this one, but the string "MC exp. sig-250-0 events" is unique in the file.
For a particular file, is there a good way to extract the second number 3.57 in the above example using bash?
use awk for this:
awk '/MC exp. sig-250-0/ {print $10}' your.txt
Note that this will print $3.57, with the leading $; if you don't want that, pipe the output to tr:
awk '/MC exp. sig-250-0/ {print $10}' your.txt | tr -d '$'
In the comments you wrote that you need to call it in a script like this:
while read p ; do
echo $p,awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$'
done < grid.txt
Note that you need a command substitution $() for the awk pipe. Like this:
echo "$p",$(awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$')
If you want to pass a shell variable to the awk pattern, use the following syntax (inside /p/ the p is taken literally, so the variable must be matched with $0 ~ p):
awk -v p="MC exp. sig-$p" '$0 ~ p {print $10}' a.txt | tr -d '$'
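With that fix, the loop from the comments becomes (a sketch; grid.txt and filename are the question's own placeholders):
while read -r p; do
  echo "$p,$(awk -v p="MC exp. sig-$p" '$0 ~ p {print $10}' filename | tr -d '$')"
done < grid.txt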
More sample lines would have been nice, but I guess you would like a simple awk:
awk '{print $N}' $file
If you don't tell awk which field separator to use, it defaults to whitespace. Now you just have to count the fields to find the one you want. In your case it is the 10th.
awk '{print $10}' file.txt
$3.57
Don't want the $?
Pipe your awk result to cut:
awk '{print $10}' foo | cut -d'$' -f2
-d uses the $ as the field separator and -f selects the second field.
If you know you always have the same number of fields, then
#!/bin/bash
file=$1
key=$2
while read -ra f; do
if [[ "${f[0]} ${f[1]} ${f[2]} ${f[3]}" == "MC exp. $key events" ]]; then
echo ${f[9]}
fi
done < "$file"
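For the sample line, calling the script (saved, say, as extract.sh; extract.sh and results.txt are names chosen here for illustration) prints the field with its leading $, which you can strip with tr as above:
$ bash extract.sh results.txt sig-250-0
$3.57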
