In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now - I launch vi, find the string, note its line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
Will give you almost what you expect
-n : tells grep to print the line number
-2 : print 2 lines of context above and below each match (and the matching line itself, of course)
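With GNU grep, the output would look something like this (hypothetical line numbers; context lines are marked with - while the matching line gets :):
$ grep -n -2 your_searched_for_string your_large_text_file
40-some line before
41-another line before
42:the line containing your_searched_for_string
43-some line after
44-another line after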
You can do
grep -C 2 yourSearch yourFile
To send it to a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
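From there you can feed the number straight to sed and skip vi entirely. A minimal sketch, assuming the string occurs only once (-m1 stops grep at the first match):
n=$(grep -n -m1 'your_string' your_large_text_file | cut -d: -f1)
sed -n "$((n - 2)),$((n + 2))p" your_large_text_file   # match plus 2 lines either side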
You can use cat -n to display the line numbers, grep for the word, and then use awk to extract just the line number:
cat -n FILE | grep WORD | awk '{print $1;}'
although grep already does what you want if you give it -C 2 (2 lines above/below):
grep -C 2 WORD FILE
You can do it with grep -A and -B options, like this:
grep -B 2 -A 2 "searchstring" file | sed 3d
grep will find the line and show two lines of context before and after it; sed then deletes the third line of that output, which is the matching line itself.
If you want to automate this, you can do it with a simple shell script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
NUM1=$(grep -n -m1 "$VAL" file.txt | cut -f1 -d ':')   # -m1: use the first match only
echo "$NUM1"                      # show the line number of the matched keyword
MYNUMUP=$((NUM1 - 1))             # line above the keyword
MYNUMDOWN=$((NUM1 + 1))           # line below the keyword
sed -n "${MYNUMUP}p" file.txt     # display the line above the keyword
sed -n "${MYNUMDOWN}p" file.txt   # display the line below the keyword
The plus point of the script is that you can change the keyword in the VAL variable as you like and rerun it to get the output you need.
I have looked quite a bit for answers but I am not finding any suggestions that have worked so far.
On the command line, this works:
$ myvar=$( cat -n /usr/share/dict/cracklib-small | grep $myrand | sed -e "s/$myrand//" )
$ echo $myvar
commonness
However, inside a bash script the exact same lines just echo a blank line.
Notes: $myrand is a number, like 10340, generated with $RANDOM.
cat prints out a dictionary with line numbers.
grep grabs the line with $myrand in it; e.g. 10340 commonness.
sed is intended to remove the $myrand part of the line and replace it with nothing. Here is my sample script:
#!/bin/bash
# prints out a random word
myrand=$RANDOM
export myrand
myword=$( cat -n /path/to/dict/cracklib-small | grep myrand | sed -e "s/$myrand//g" <<<"$myword" )
echo $myword
Your command line code is running:
grep $myrand
Your script is running:
grep myrand
These are not the same thing; the latter is looking for a word that contains "myrand" within it, not a random number.
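For reference, a corrected sketch of the script. It also drops the <<<"$myword" herestring, which redirected sed's stdin to an empty string instead of the pipeline, and uses grep -w so a short number like 123 doesn't also match line 1230:
#!/bin/bash
# prints out a random word
myrand=$RANDOM
# note the $ before myrand, and the quotes around the expansion
myword=$( cat -n /path/to/dict/cracklib-small | grep -w "$myrand" | sed -e "s/$myrand//g" )
# caveat: $RANDOM can be 0, which matches no line, so an empty result is still possible
echo $myword   # left unquoted so word splitting trims the whitespace left over from cat -n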
By the way -- I'd suggest a different way to get a random line. If you have GNU coreutils, the shuf tool is built-to-purpose:
myword=$(shuf -n 1 /path/to/dict/cracklib-small)
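If shuf isn't available (it's part of GNU coreutils), a rough awk equivalent using reservoir sampling:
# keep each line with probability 1/NR; the survivor is a uniformly random line
myword=$(awk 'BEGIN { srand() } rand() * NR < 1 { w = $0 } END { print w }' /path/to/dict/cracklib-small)
Note that srand() seeds from the clock, so runs within the same second will repeat.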
Where is the $ sign in grep myrand? You must put in some work before posting it here.
How do I get the line count of a file from the 2nd line of the file, as the first line is header?
wc -l filename
Is there a way to add some condition to it?
Use the tail command:
tail -n +2 file | wc -l
-n +2 would print the file starting from line 2
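A quick way to convince yourself, with a hypothetical three-line file:
printf 'header\nrow1\nrow2\n' > sample.txt
tail -n +2 sample.txt | wc -l    # prints 2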
You can use awk to count from 2nd line onwards:
awk 'NR>1{c++} END {print c}' file
Or simply use the NR variable in the END block:
awk 'END {print NR-1}' file
Alternatively, using bash arithmetic, subtract 1 from the wc output:
echo $(( $(wc -l < file) -1 ))
Delete the first line with sed:
sed '1d' file | wc -l
There is no way to tweak the wc command itself. You should either process the result of the command or use another tool.
As suggested in other answers, if you are running Bash, a good way is to put the result of the command into an arithmetic expression like $(( $(command) - 1 )).
If you are searching for a portable solution, here is a Perl version:
perl -e '1 while <>; print $. - 1' < file
The variable $. holds the number of lines read since a file handle was last closed. The while loop reads all the lines from the file.
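Equivalently, you can let Perl's -n switch supply the read loop:
perl -ne 'END { print $. - 1, "\n" }' < file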
Alternately, you could just subtract 1 from the total count:
echo $(( $(cat FILE | wc -l) - 1 ))
Please try this one. It will solve your problem:
$ tail -n +2 filename | wc -l
I can tail the last n lines of a file on Linux, and likewise I can grep,
e.g. grep "2015-09-29 04:" filename.ext
But how can I combine both so that I display everything from a certain grep match to the end of the file?
You don't use grep or tail any more. You use sed:
sed -n '/^2015-09-29 04:/,$p' filename.ext
Don't print by default (-n). From the first line starting 2015-09-29 04: to the end of file ($), print the lines.
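An awk equivalent, if you prefer: a flag is set at the first matching line, and every line from there on is printed, including the match itself.
awk '/^2015-09-29 04:/ { found = 1 } found' filename.ext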
If you absolutely must use grep and you have GNU grep, then you could consider:
grep -A 999999999 -e '^2015-09-29 04:'
That prints the first billion or so lines after the first line that matches the pattern (and the counter resets if the pattern appears during that trailing material). Of course, if your file is 2 billion lines long and the pattern occurs after a million lines (and never again), then you'll be missing a lot of data.
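Another option, if what you actually want is everything from the last occurrence of the pattern to the end of the file, is GNU tac: reverse the file, quit at the first match, and reverse again:
tac filename.ext | sed '/^2015-09-29 04:/q' | tac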
#!/bin/bash
line_no=$(grep -n -m1 "$1" "$2" | cut -d: -f1)
echo "Starting at line: $line_no"
tail -n +"$line_no" "$2"
Usage: script 'text to hunt for' filename
e.g.
./grep.sh 'Sep 29 13:14' /var/log/syslog
-n prints line numbers
-m1 stops reading after the first hit
cut extracts the line number
tail -n +K outputs lines starting with the Kth
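One case the script doesn't handle is a pattern that never matches: $line_no ends up empty and tail fails. A possible guard to add after the grep line (a sketch):
if [ -z "$line_no" ]; then
    echo "Pattern not found" >&2
    exit 1
fi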
How can I apply the following command to only a part of a text file? For example from the beginning to the line 5000.
grep "^ A : 11 B : 10" filename | wc -l
I cannot use head and then apply the above command since the text file is huge.
You could try using the sed command to select the range, which I believe does better for large files, and pipe its output to grep:
sed -n 1,5000p file | grep ...
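If you would rather stop reading the file entirely at line 5000 and count in one pass, an awk sketch using the pattern from the question:
# exit on line 5001; count matches among lines 1-5000; c+0 prints 0 when nothing matched
awk 'NR > 5000 { exit } /^ A : 11 B : 10/ { c++ } END { print c + 0 }' filename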
You can try a combination of -n (prefixing each line of output with its line number) and -m (limiting the number of matching lines). Something like this:
grep -n -m 5000 pattern file.txt | grep -B 5000 "^5000:" | wc -l
The first grep searches for the pattern, adds line numbers, and limits the output to the first 5000 matching lines (worst-case scenario: all lines in the range match). The second grep matches line number 5000 and prints everything before it. Note that this relies on line 5000 itself matching the pattern; if it doesn't, the second grep outputs nothing.
I don't know if it is a more efficient solution, though.
How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.
grep -l pattern *.txt |
while read -r line; do
    echo -n "$line: "
    head -c "$n" "$line"
    echo
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed often provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient than the others proposed, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$2"
shift
grep -l "$pattern" "$#" |
while read file
do
echo "$file:" $(dd if="$file" count=${n}c)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that @litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd be pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
    grep targetToMatch ${FILE} > /dev/null
    if ( $status == 0 ) then
        echo -n "${FILE}: "
        head -c25 ${FILE}
    endif
end
2) GNU [FSF] head contains a --verbose [-v] switch, and grep and xargs both offer --null options, to accommodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --
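With GNU head, -v labels each chunk the way tail does, so the output looks roughly like this (hypothetical files):
==> file1 <==
first 25 bytes of file1
==> file2 <==
first 25 bytes of file2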