How to use Linux command(sed?) to delete specific lines in a file? - linux

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?

You can use regex in sed.
sed -i 's/^[0-9]\s+b.*//g' xxx_file
or
sed -i '/^[0-9]\s+b.*/d' xxx_file
The "-i" argument will modify the file's content directly, you can remove "-i" and output the result to other files as you want.

Awk just work fine, just use code as below:
awk '{if ($2 != "b") print $0;}' file
if you want get more usage about awk, just man it!

awk:
cat yourfile.txt | awk '{if($2!="b"){print;}}'

Related

How can I use grep and regular expression to display names with just 3 characters

I am new to grep and UNIX. I have a sample of data and want to display all the first names that only contain three characters e.g. Lee_example. but I having some difficulty doing that. I am currently using this code cat file.txt|grep -E "[A-Z][a-z]{2}" but it is displaying all the names that contain at least 3 characters and not only 3 characters
Sample data
name
number
Lee_example
1
Hector_exaple
2
You need to match the _ after the first name.
grep -E "[A-Z][a-z]{2}_"
With awk:
awk -F_ 'length($1)==3{print $1}'
-F_ tells awk to split the input lines by _. length($1) == 3 checks whether the first fields (the name) is 3 characters long and {print $1} prints the name in that case.

Linux split a file in two columns

I have the following file that contains 2 columns :
A:B:IP:80 apples
C:D:IP2:82 oranges
E:F:IP3:84 grapes
How is possible to split the file in 2 other files, each column in a file like this:
File1
A:B:IP:80
C:D:IP2:82
E:F:IP3:84
File2
apples
oranges
grapes
Try:
awk '{print $1>"file1"; print $2>"file2"}' file
After runningl that command, we can verify that the desired files have been created:
$ cat file1
A:B:IP:80
C:D:IP2:82
E:F:IP3:84
And:
$ cat file2
apples
oranges
grapes
How it works
print $1>"file1"
This tells awk to write the first column to file1.
print $2>"file2"
This tells awk to write the second column to file2.
Perl 1-liner using (abusing) the fact that print goes to STDOUT, i.e. file descriptor 1, and warn goes to STDERR, i.e. file descriptor 2:
# perl -n means loop over the lines of input automatically
# perl -e means execute the following code
# chomp means remove the trailing newline from the expression
perl -ne 'chomp(my #cols = split /\s+/); # Split each line on whitespace
print $cols[0] . "\n";
warn $cols[1] . "\n"' <input 1>col1 2>col2
You could, of course, just use cut -b with the appropriate columns, but then you would need to read the file twice.
Here's an awk solution that'll work with any number of columns:
awk '{for(n=1;n<=NF;n++)print $n>"File"n}' input.txt
This steps through each field on the line and prints the field to a different output file based on the column number.
Note that blank fields -- or rather, lines with fewer fields than other lines, will cause line numbers to mismatch. That is, if your input is:
A 1
B
C 3
Then File2 will contain:
1
3
If this is a concern, mention it in an update to your question.
You could of course do this in bash alone, in a number of ways. Here's one:
while read -r line; do
a=($line)
for m in "${!a[#]}"; do
printf '%s\n' "${a[$m]}" >> File$((m+1))
done
done < input.txt
This reads each line of input into $line, then word-splits $line into values in the $a[] array. It then steps through that array, printing each item to the appropriate file, named for the index of the array (plus one, since bash arrays start at zero).

Swapping the first word with itself 3 times only if there are 4 words only using sed

Hi I'm trying to solve a problem only using sed commands and without using pipeline. But I am allowed to pass the result of a sed command to a file or te read from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway lets say I had a file F1 and its contents was :
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how i can calculate with sed that there are only 4 words in a line.
any help will be appreciated
$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
The above uses a sed that supports EREs with a -E option, e.g. GNU and OSX seds).
If the fields are tab separated
sed 'h;s/[^[:blank:]]//g;s/[[:blank:]]\{3\}//;/^$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile

How to use grep or awk to process a specific column ( with keywords from text file )

I've tried many combinations of grep and awk commands to process text from file.
This is a list of customers of this type:
John,Mills,81,Crescent,New York,NY,john#mills.com,19/02/1954
I am trying to separate these records into two categories, MEN and FEMALES.
I have a list of some 5000 Female Names , all in plain text , all in one file.
How can I "grep" the first column ( since I am only matching first names) but still printing the entire customer record ?
I found it easy to "cut" the first column and grep --file=female.names.txt, but this way it's not going to print the entire record any longer.
I am aware of the awk option but in that case I don't know how to read the female names from file.
awk -F ',' ' { if($1==" ???Filename??? ") print $0} '
Many thanks !
You can do this with Awk:
awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv
Would print the lines of your csv file that contain first names of any found in your file female.names.txt.
awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv
Would output lines not found in female.names.txt.
This assumes the format of your female.names.txt file is something like:
Heather
Irene
Jane
Try this:
grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv
This changes all the names in the list of female names to the regular expression ^name, so it only matches at the beginning of the line and followed by a comma. Then it uses process substitution to use that as the file to match against the data file.
Another alternative is Perl, which can be useful if you're not super-familiar with awk.
#!/usr/bin/perl -anF,
use strict;
our %names;
BEGIN {
while (<ARGV>) {
chomp;
$names{$_} = 1;
}
}
print if $names{$F[0]};
To run (assume you named this file filter.pl):
perl filter.pl female.names.txt < records.txt
So, I've come up with the following:
Suppose, you have a file having the following lines in a file named test.txt:
abe 123 bdb 532
xyz 593 iau 591
Now you want to find the lines which include the first field having the first and last letters as vowels. If you did a simple grep you would get both of the lines but the following will give you the first line only which is the desired output:
egrep "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]" test.txt
Then you want to the find the lines which include the third field having the first and last letters as vowels. Similary, if you did a simple grep you would get both of the lines but the following will give you the second line only which is the desired output:
egrep "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]" test.txt
The value in the first curly braces {1,} specifies that the preceding character which ranges from 0 to z according to the ASCII table, can occur any number of times. After that, we have the field separator space in this case. Change the value within the second curly braces {0} or {2} to the desired field number-1. Then, use a regular expression to mention your criteria.

CSV grep but keep the header

I have a CSV file that look like this:
A,B,C
1,2,3
4,4,4
1,2,6
3,6,9
Is there an easy way to grep all the rows in which the B column is 2, and keep the header? For example, I want the output be like
A,B,C
1,2,3
1,2,6
I am working under linux
Using awk:
awk -F, 'NR==1 || $2==2' file
NR==1 -> if first line,
$2==2 -> if second column is equal to 2. Lines are printed if either of the above is true.
To choose the column using the header column name:
awk -F, -v col="B" 'NR==1{for(i=1;i<=NF;i++)if($i==col)break;print;next}$i==2' file
Replace B with the appropriate name of the column which you want to check against.
You can use addresses in sed:
sed -n '1p;/^[^,]*,2/p'
It means:
1p Print the first line.
/ Start a match.
^ Match the beginnning of a line.
[^,] Match anything but a comma
* zero or more times.
, Match a comma.
2 Match a 2.
/p End of match, if it matches, print.
If the header can contain the value you are looking for, you should be more careful:
sed -n '1p;1!{/^[^,]*,2/p}'
1!{ ... } just means "Do the following for lines other then the first one".
For column number n>2, you can add a quantifier:
sed -n '1p;1!{/^\([^,]*,\)\{M\}2/p}'
where M=n-1. The quantifier just means repetition, so the non-comma-0-or-more-times-comma thing is repeated M times.
For true CSV files where a value can contain a comma, switch to Perl and Text::CSV.
$ awk -F, 'NR==1 { for (i=1;i<=NF;i++) h[$i] = i; print; next } $h["B"] == 2' file
A,B,C
1,2,3
1,2,6
By the way, sed is an excellent tool for simple substitutions on a single line, for anything else, just use awk - the code will be clearer and MUCH easier to enhance in future if necessary.

Resources