Number lines and hide the empty ones - linux

I am trying to number the lines of a txt file and hide the empty ones. I use this code:
cat -n file.txt | grep . file.txt
But it doesn't work: it ignores the cat command (grep was given file.txt as an argument, so it reads the file directly and ignores its stdin). I want to display all the non-empty lines and number them (the txt file is not a static one; it is like a list that a user can type in).
Edit: given the great solutions below, I would also add that grep . file.txt | cat -n also worked.

I assume you want to number the lines that remain after the empty lines are removed.
Solution #1
Use sed '/^$/d' to delete the empty lines then pipe its output to cat -n to number them:
sed '/^$/d' file.txt | cat -n
The sed program contains only one command: d (delete the line). Sed commands can be prefixed by zero, one or two addresses that tell which lines the command applies to.
In this case there is only one address, /^$/. It is a regex (enclosed in /) that selects the empty lines: the lines where the start of the line (^) is immediately followed by the end of the line ($).
Solution #2
You can also use grep -v '^$' to filter out the empty lines:
grep -v '^$' file.txt | cat -n
Again, ^$ is a regular expression that matches the empty lines. -v reverses the condition and tells grep to display the lines that do not match the regex.
The commands above do not modify the file. They read the content of file.txt, process it and display the result on screen.
Update
As @robc suggests in a comment, nl is even better than cat -n to number the lines. Thank you @robc, I didn't know about nl until now (I didn't know about cat -n either). It is never too late to learn new things.

This can be done easily with awk. The following prints each line with its line number and ignores empty lines:
awk 'NF{print FNR,$0}' file.txt
Explanation: a detailed breakdown of the above code.
awk ' ##Starting awk program from here.
NF{ ##Checking condition: if NF (number of fields) is non-zero, i.e. the current line is not empty, then do following.
print FNR,$0 ##Printing current line number by mentioning FNR and then current line value.
}
' file.txt ##Mentioning Input_file name which we are passing to awk program here.

Related

Extract a value from a file

I have a file with many lines, one of which is:
COMPOSER_HOME=/home/glen/.composer
I want to extract the string /home/glen/.composer from this file in my shell script. How can I do that?
I can get the whole line with grep, but I am not sure how to remove the first part.
Here:
grep 'COMPOSER_HOME=' file| cut -d= -f2
cut splits each line on the delimiter = and -f2 selects the second field, i.e. whatever comes after the = (here /home/glen/.composer); with -f1 you would get COMPOSER_HOME instead.
Since you tagged linux, you have GNU grep, which supports PCRE:
grep -oP 'COMPOSER_HOME=\K.+' file
The \K means: everything before it must match, but it is excluded from the reported match, so -o prints only what follows on the line.
You can also use awk
awk -F "=" '$1 == "COMPOSER_HOME" {print $2}' file
Maybe this is enough:
sed -nE 's/COMPOSER_HOME=(.*)/\1/p' your_file
It does not print any line unless explicitly requested (-n). It matches a line containing COMPOSER_HOME= and captures what follows with (.*) (using () instead of \(\), thanks to -E), and the replacement keeps only what was captured. The p flag of the substitution command then prints the resulting line.

How to add characters in word and replace it using sed command in linux

I have a requirement.
I have a text file named a.txt, which contains a list of words:
GOOGLE
FACEBOOK
Now I have another file named b.txt, whose content is:
Company name is google.
Company name is facebook.
There are n such lines, each with a different word.
Then I wrote this script:
FILENAME="a.txt"
SCHEMA=$(cat $FILENAME)
for L in $SCHEMA
do
echo "${L,,}"
sed -i -E "s/.+/\L&_/" b.txt
done
So after running the script, the output I expect in b.txt is:
Company name is google_
Company name is facebook_
But the output I actually get is:
Company name is google.__
Company name is facebook.__
And this output is saved back into b.txt, as I used sed -i.
Note: a.txt holds the list of words I want to replace, and b.txt holds paragraphs in which those words appear as google., facebook., and so on.
That is why I cannot write a single hard-coded sed command for the replacement.
I hope you understand my requirement.
Thanks in advance!
You can use the following GNU sed solution:
FILENAME="a.txt"
while IFS= read -r L; do
sed -i "s/\($L\)\./\1_/gI" b.txt
done < "$FILENAME"
Or, the same without a loop as a single line (as used in anubhava's answer):
sed -i -f <(printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME")) b.txt
In the loop version:
while IFS= read -r L; do - reads the file line by line, each line being assigned to L.
sed -i "s/\($L\)\./\1_/gI" b.txt - replaces all occurrences of L (captured into Group 1 with the help of the capturing \(...\) parentheses) followed by . with the Group 1 value plus an appended _, case-insensitively thanks to the I flag.
In the one-liner version:
-f allows passing a script of commands to sed.
printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME") generates that list of sed commands; in this case, it looks like
s/\(GOOGLE\)\./\1_/gI
s/\(FACEBOOK\)\./\1_/gI
Here is how you can do it in a single shell command without any loop using gnu-sed with printf in a process substitution:
sed -i -E -f <(printf 's/\\b(%s)\\./\\1_/I\n' $(<a.txt)) b.txt
cat b.txt
Company name is google_
Company name is facebook_
This is far more efficient than running sed or awk in a loop, especially if the input files are big.
The printf command creates a sed script that looks like this:
s/\b(GOOGLE)\./\1_/I
s/\b(FACEBOOK)\./\1_/I
sed -f runs that dynamically generated script
Here is a single awk that reads both input files; could you please try the following:
awk '
FNR==NR{
a[tolower($0)]
next
}
($(NF-1) in a){
sub(/\.$/,"")
print $0"_"
}
' a.txt FS="[ .]" b.txt
Explanation: a detailed breakdown of the above solution.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when a.txt is being read.
a[tolower($0)] ##Creating array a with index of current line in lower case from a.txt here.
next ##next will skip all further statements from here.
}
($(NF-1) in a){ ##Checking condition if 2nd last field is present in array a then do following.
sub(/\.$/,"") ##Substituting last DOT with NULL here.
print $0"_" ##Printing current line with _ here.
}
' a.txt FS="[ .]" b.txt ##Mentioning a.txt and setting field separator as space and . for b.txt here.
2nd solution: one more awk approach:
awk '
FNR==NR{
a[tolower($0)]
next
}
{
sub(/\.$/,"")
}
($NF in a){
print $0"_"
}
' a.txt b.txt
This might work for you (GNU sed):
sed 's#.*#s/(&)./\\1_/Ig#' a.txt | sed -i -Ef - b.txt
N.B. The match is case insensitive because of the I flag on the substitution command, however the replacement is from the original file i.e. if the original string is google the match is case insensitive to GOOGLE and replaced by google_.

Capturing string between 2 specific letters/words using shell scripting

I am trying to capture the string between 2 specific letters/words using sed/awk. This is what I am trying to do:
The input is a file test.log containing
Owner: CN=abc.samplecerrt.com,o=IN,DC=com
Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
I want to extract only "CN=abc.samplecerrt.com"
I tried
sed 's/.*CN=\(.*\),.*/\1/p' test.log >> result.log
But this returns "abc.samplecerrt.com,o=IN,DC=com"
How do I go about this?
test file:
$ cat logs.txt
CN=abc.samplecerrt.com,o=IN,DC=com Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
command and output:
$ grep -oP 'CN=(?:(?!CN=).)*?\.com' logs.txt
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
This might work for you (GNU sed):
sed -n 's/.*\(CN=[^,]*\).*/\1/p' file
Or:
sed 's/.*\(CN=[^,]*\).*/\1/p;d' file
The first turns off implicit printing (-n) so that sed acts like grep.
Matches and captures the string CN= followed by zero or more non-comma characters and prints the captured group \1 if a match is made.
The second solution is much the same except it deletes all lines and only prints the captured group as above.
With awk you can grab the field that holds the string you need. For that, set FS to split on either : or a comma, i.e. FS=":|,". Now if you run
awk -v FS=":|," '{print $2}' file
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
you get the field. But you only want one, so
awk -v FS=":|," '$2 !~ /abc1/ {print $2}' file
CN=abc.samplecerrt.com

Use '\n' as a field separator in awk command

I have multiple lines and I need to select one of them. I have found the required lines using grep,
but now I want only the first line from the result.
How can I do it using grep, awk, sed, etc.?
This is first line.
This is second line.
This is seventh line.
Using grep I got this output:
grep "This is s" file.txt
This is second line.
This is seventh line.
Now I need the first line from this.
How can I use '\n' as a field separator?
Print the first line that matches This is s and quit with awk:
$ awk '/This is s/{print $0; exit}' file
This is second line.
However GNU grep has the -m option which stops after the given number of matches:
$ grep -Fm 1 'This is s' file
This is second line.
Note: the -F is for fixed string matching instead of regular expressions.
And for completeness with sed you could do:
$ sed '/This is s/!d;q' file
This is second line.
However the example seems slightly strange as you could just do grep 'second' file.

Print a file, skipping the first X lines, in Bash [duplicate]

This question already has answers here:
How can I remove the first line of a text file using bash/sed script?
(19 answers)
Closed 3 years ago.
I have a very long file which I want to print, skipping the first 1,000,000 lines, for example.
I looked into the cat man page, but I did not see any option to do this. I am looking for a command to do this or a simple Bash program.
You'll need tail. Some examples:
$ tail great-big-file.log
< Last 10 lines of great-big-file.log >
If you really need to SKIP a particular number of "first" lines, use
$ tail -n +<N+1> <filename>
< filename, excluding first N lines. >
That is, if you want to skip N lines, you start printing line N+1. Example:
$ tail -n +11 /tmp/myfile
< /tmp/myfile, starting at line 11, or skipping the first 10 lines. >
If you want to just see the last so many lines, omit the "+":
$ tail -n <N> <filename>
< last N lines of file. >
Easiest way I found to remove the first ten lines of a file:
$ sed 1,10d file.txt
In the general case where X is the number of initial lines to delete, credit to commenters and editors for this:
$ sed 1,Xd file.txt
If you have GNU tail available on your system, you can do the following:
tail -n +1000001 huge-file.log
It's the + character that does what you want. To quote from the man page:
If the first character of K (the number of bytes or lines) is a `+', print beginning with the Kth item from the start of each file.
Thus, as noted in the comment, putting +1000001 starts printing with the first item after the first 1,000,000 lines.
If you want to skip the first two lines:
tail -n +3 <filename>
If you want to skip the first x lines:
tail -n +$((x+1)) <filename>
A less verbose version with AWK:
awk 'NR > 1e6' myfile.txt
But I would recommend using a plain integer (1000000) rather than 1e6, for portability across awk implementations.
Use the sed delete command with a range address. For example:
sed 1,100d file.txt # Print file.txt omitting lines 1-100.
Alternatively, if you want to only print a known range, use the print command with the -n flag:
sed -n 201,300p file.txt # Print lines 201-300 from file.txt
This solution should work reliably on all Unix systems, regardless of the presence of GNU utilities.
Use:
sed -n '1d;p'
This command will delete the first line and print the rest.
If you want to see the first 10 lines you can use sed as below:
sed -n '1,10 p' myFile.txt
Or if you want to see lines from 20 to 30 you can use:
sed -n '20,30 p' myFile.txt
Just to propose a sed alternative. :) To skip the first one million lines, try | sed '1,1000000d'.
Example:
$ perl -wle 'print for (1..1_000_005)'|sed '1,1000000d'
1000001
1000002
1000003
1000004
1000005
You can do this using the head and tail commands:
head -n <num> <file> | tail -n <lines to print>
where num is 1,000,000 plus the number of lines you want to print.
This shell script works fine for me:
#!/bin/bash
awk -v initial_line=$1 -v end_line=$2 '{
if (NR >= initial_line && NR <= end_line)
print $0
}' $3
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it will extract from second to fourth line in the file):
edu@debian5:~$ ./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are as expected :-)
cat <file> | awk '{if (NR > 6) print $0}'
I needed to do the same and found this thread.
I tried "tail -n +, but it just printed everything.
The more +lines worked nicely on the prompt, but it turned out it behaved totally different when run in headless mode (cronjob).
I finally wrote this myself:
skip=5
FILE="/tmp/filetoprint"
# print the last (total - skip) lines, i.e. everything after the first $skip lines
tail -n "$(($(wc -l < "${FILE}") - skip))" "${FILE}"
