Remove lines containing a space in Unix/Linux

Below is my comma-separated input.txt file. I want to read the columns and write to output.txt only the lines in which no column contains a space.
Content of input.txt:
1,Hello,world
2,worl d,hell o
3,h e l l o, world
4,Hello_Hello,World#c#
5,Hello,W orld
Content of output.txt:
1,Hello,world
4,Hello_Hello,World#c#
Is it possible to achieve this using awk? Please help!

A simple way to filter out lines with spaces is using inverted matching with grep:
grep -v ' ' input.txt
If you must use awk:
awk '!/ /' input.txt
Or perl:
perl -ne '/ / || print' input.txt
Or pure bash:
while read -r line; do [[ $line == *' '* ]] || echo "$line"; done < input.txt
# or
while read -r line; do [[ $line =~ ' ' ]] || echo "$line"; done < input.txt
UPDATE
To check if, say, field 2 contains a space, you could use awk like this:
awk -F, '$2 !~ / /' input.txt
To check if, say, field 2 or field 3 contains a space:
awk -F, '!($2 ~ / / || $3 ~ / /)' input.txt
For your follow-up question in the comments
To do the same using sed, I only know these awkward solutions:
# remove lines if 2nd field contains space
sed -e '/^[^,]*,[^,]* /d' input.txt
# remove lines if 2nd or 3rd field contains space
sed -e '/^[^,]*,[^,]* /d' -e '/^[^,]*,[^,]*,[^,]* /d' input.txt
For your 2nd follow-up question in the comments
To disregard leading spaces in the 2nd or 3rd fields:
awk -F', *' '!($2 ~ / / || $3 ~ / /)' input.txt
# or perhaps what you really want is this:
awk -F', *' -v OFS=, '!($2 ~ / / || $3 ~ / /) { print $1, $2, $3 }' input.txt

This can also be done easily with sed
sed '/ /d' input.txt

Try this one-liner:
awk 'NF==1' file
As #jwpat7 pointed out, it won't give correct output if the line has only a leading space; in that case this regex version should do, but it has already been posted in janos's answer.
awk '!/ /' file
or
awk -F' *' 'NF==1' file
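For instance, a minimal check of that leading-space edge case:
printf ' leading\n' | awk 'NF==1'   # still prints the line: the default FS strips leading blanks, so NF is 1
printf ' leading\n' | awk '!/ /'    # prints nothing, as desired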

Pure bash for the fun of it...
#!/bin/bash
while read -r line
do
    if [[ ! $line =~ " " ]]
    then
        echo "$line"
    fi
done < input.txt

columnWithSpace=2
ColumnBef=$(( columnWithSpace - 1 ))
sed "/^\([^,]*,\)\{${ColumnBef}\}[^,]* /d" input.txt
If you know the column number directly (for example, the 3rd):
sed '/^\([^,]*,\)\{2\}[^,]* /d' input.txt

If you can trust the input to always have no more than three fields, simply finding a space somewhere after a comma is sufficient.
grep ',.* ' input.txt
If there can be (or usually are) more fields, you can pull that off with grep -E and a suitable ERE, but you are fast approaching the point at which the equivalent Awk solution will be more readable and maintainable.
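For instance, a sketch of such an ERE, deleting lines with a space in field 2 or field 3 however many fields follow (the space must appear before the next comma, so it cannot cross a field boundary):
grep -vE '^[^,]*,([^,]* |[^,]*,[^,]* )' input.txt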

Related

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks like this:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract the values of tool2v1 and tool2v2.
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk, but it is not giving the correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk, which was almost correct (as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
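For instance, applied to a single sample line:
echo 'tool2v1:1.5.3' | sed 's/^.*://'
1.5.3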
Whole command:
grep '\(tool2v1\|tool2v2\)' file1 | sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while read -r line; do
    if [[ "$line" =~ ^tool2v ]]; then
        echo "${line#*:}"
    fi
done < ./file1.txt
The while loop reads every line of file1.txt; =~ does a regexp match to check whether the value of the $line variable starts with tool2v; ${line#*:} then trims everything up to and including the first :.
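A minimal illustration of that trimming, using one of the sample lines:
line='tool2v1:1.5.3'
echo "${line#*:}"   # strips the shortest prefix matching '*:', printing 1.5.3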

How to use uniq after printf

I have a lot of files which I need to concatenate together by common prefix. I have an idea, but I do not know how to solve this problem.
files:
NAME1_C001_xxx.tsv
NAME1_C001_yyy.tsv
NAME2_C001_xxx.tsv
NAME2_C001_yyy.tsv
I want to print just the unique prefixes, NAME1 and NAME2. The lengths of the prefix and suffix vary, but the prefix is always followed by _C001.
My solution is:
for i in *.tsv
do prefix=$(printf "%s\n" "${i%_C001*}")
   cat "${prefix}"_C001_xxx.tsv "${prefix}"_C001_yyy.tsv > "${prefix}.merged.tsv"
done
But this solution is not very good: I process each prefix twice.
Thank you for any help.
EDITED:
One solution, thanks to anubhava:
for i in $(printf "%s\n" *.tsv | awk -F '_C001' '!seen[$1]++{print $1}')
do
    cat "${i}"_C001_xxx.tsv "${i}"_C001_yyy.tsv > "${i}.merged.tsv"
done
You don't need printf at all here; it's just an unnecessary wrapper around the parameter substitution you are already using.
for i in *.tsv
do prefix=${i%_C001*}
   [[ -f $prefix.merged.tsv ]] && continue  # Avoid doing the same prefix twice
   cat "${prefix}"_* > "$prefix.merged.tsv"
done
As your filenames don't contain any newlines, you can pipe your list to an awk command to print unique prefixes, using _C001 as the field separator:
printf "%s\n" *.tsv | awk -F '_C001' '!seen[$1]++{print $1}'
NAME1
NAME2
You can also use _ as FS in awk:
printf "%s\n" *.tsv | awk -F _ '!seen[$1]++{print $1}'

Extract part of a path

I have a variable that is a path of a windows folder.
I would like to handle this with sed.
Example:
Input:
\\computer1\folder$
Output:
computer1
I always want to pick the host name that is between \\ and \.
Could someone shed some light on this?
You can do this in a POSIX-compatible shell:
% folder='\\computer1\folder$'
% folder="${folder/\\\\/}" # Remove leading '\\'
% printf "%s\n" "${folder%%\\*}"
computer1
Alternative with Bashism:
% folder='\\computer1\folder$'
% [[ "$folder" =~ '\\'([^\\]*) ]]
% printf "%s\n" "${BASH_REMATCH[1]}"
computer1
With sed :
$ sed 's/\\\\\([^\]*\)\\.*/\1/' <<< '\\computer1\folder$'
computer1
The basic syntax for sed substitution command is s/oldtext/newtext/.
s is for substitution command
in the path string, every \ must be escaped so it becomes \\
\([^\]*\) captures every non-\ character up to the next \
the captured string is output with backreference \1
For this scenario, awk is a better option. Use awk -F'\\' '{print $2}'
Example
$> echo "\\computer1\folder$"|awk -F'\' '{print $2}'
Output
computer1
Or you can try putting it with a variable.
$> export val="\\computer1\folder$"
$> echo "$val" | awk -F'\\' '{print $2}'
computer1
With awk, where s stands for "separator" and a stands for "any":
echo '\\computer1\folder$' | \
awk '{s="\\\\"; a=".*"; sub(a s s, ""); sub(s a, ""); print}'

linux|awk|shell script block deletion

My input file has blocks like the ones below. Please help me delete a block and its contents using awk or sed.
[abc]
para1=123
para2=456
para3=111
[pqr]
para1=333
para2=765
para3=1345
[xyz]
para1=888
para2=236
para3=964
Now how do I delete a block and its parameters completely? Please help me achieve this with an awk command. Thanks in advance.
You can use RS to split blocks (NOTE: NR>1 because awk generates an empty block at the beginning):
awk -v RS='[' -v remove="pqr" '
    NR>1 && $0 !~ "^"remove"]" { printf "%s", "["$0 }
' file
You get:
[abc]
para1=123
para2=456
para3=111
[xyz]
para1=888
para2=236
para3=964
Depends on how you want to filter. If you want to delete the block with the header [pqr]:
awk '!/^\[pqr\]/' RS= ORS='\n\n' input
or, escaping the brackets so they are not treated as a regex character class:
awk '$1 !~ /\[pqr\]/' RS= ORS='\n\n' input
If you want to omit the 2nd record (the same as above)
awk 'NR!=2' RS= ORS='\n\n' input
If you want to omit the record in which para2=765,
awk '$3 !~ "765"' RS= ORS='\n\n' input
Perl solution to remove block [abc]
perl -lne 'BEGIN{$/=""} print "$_\n" unless /^\[abc\]/' file
-n loops over every record of the input file, puts the record in the $_ variable, and does not automatically print it
-l removes the record separator before processing and adds it back when printing
-e executes the perl code
$/ is the input record separator. Setting it to "" in a BEGIN{} block puts Perl into paragraph mode.
$_ is the current record (here, a whole block).
/^\[abc\]/ is a regular expression anchored at the beginning of the record, matching the block header
output:
[pqr]
para1=333
para2=765
para3=1345
[xyz]
para1=888
para2=236
para3=964
This variation enables argument parsing with -s and passes [abc] to variable $b
perl -slne 'BEGIN{$/=""} print "$_\n" unless /^$b/' -- -b='\[abc\]'
I propose a slightly different solution using only shell.
#!/bin/sh
# specify the block to withhold
WITHHOLD=2
COUNT=1
INAWHITESP=0
while read -r i
do  if [ -z "$i" ] && [ "$INAWHITESP" -eq 0 ]
    then COUNT=$(( COUNT + 1 ))
         INAWHITESP=1
    fi
    if [ -n "$i" ] && [ "$INAWHITESP" -eq 1 ]
    then INAWHITESP=0
    fi
    if [ "$COUNT" -ne "$WITHHOLD" ]
    then printf "%s\n" "$i"
    fi
done < inputfile > outputfile
To remove block abc (setting ORS so the blank line between blocks is kept):
awk 'BEGIN{RS="";ORS="\n\n"} !/\[abc\]/' file

bash, extract string from text file with space delimiter

I have a text files with a line like this in them:
MC exp. sig-250-0 events & $0.98 \pm 0.15$ & $3.57 \pm 0.23$ \\
sig-250-0 is something that can change from file to file (but I always know what it is for each file). There are lines before and after this, but the string "MC exp. sig-250-0 events" is unique in the file.
For a particular file, is there a good way to extract the second number 3.57 in the above example using bash?
Use awk for this:
awk '/MC exp. sig-250-0/ {print $10}' your.txt
Note that this will print: $3.57 - with the leading $, if you don't like this, pipe the output to tr:
awk '/MC exp. sig-250-0/ {print $10}' your.txt | tr -d '$'
In the comments you wrote that you need to call it in a script like this:
while read p ; do
echo $p,awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$'
done < grid.txt
Note that you need a command substitution $() for the awk pipe, like this:
echo "$p",$(awk '/MC exp. sig-$p/ {print $10}' filename | tr -d '$')
If you want to pass a shell variable to the awk pattern, use the following syntax (a bare /p/ would match the literal letter p, so the dynamic pattern must be written $0 ~ p):
awk -v p="MC exp. sig-$p" '$0 ~ p {print $10}' a.txt | tr -d '$'
More sample lines would have been nice, but I guess you would like a simple awk approach.
awk '{print $N}' "$file"
If you don't tell awk which field separator to use, it will split on whitespace. Then you just have to count the fields to find the one you want; in your case it is the 10th.
awk '{print $10}' file.txt
$3.57
Don't want the $?
Pipe your awk result to cut:
awk '{print $10}' foo | cut -d'$' -f2
-d sets the $ as the field separator and -f selects the second field.
If you know you always have the same number of fields, then
#!/bin/bash
file=$1
key=$2
while read -ra f; do
    if [[ "${f[0]} ${f[1]} ${f[2]} ${f[3]}" == "MC exp. $key events" ]]; then
        echo "${f[9]}"
    fi
done < "$file"
