Can you use 'less' or 'more' to output one page worth of text? - linux

In Linux, less is used to read files page by page for better readability. I was wondering if you can do something like `less file.txt > output.txt` to get one page worth of file.txt and write it to `output.txt`.
Apparently this does not work: output.txt ends up exactly the same as the original file. I'm wondering why that is, and whether there are other workarounds. Thank you!

You can use the split command.
split -l 100 -d -a 3 input output
This will split the input file every 100 lines (-l 100), use numeric suffixes (-d) that are 3 digits long (-a 3) in the output file names, giving you something like output000, output001, output002.
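Since less simply passes its input straight through when its output is not a terminal, you need another tool to grab just one page. A small sketch combining split with the terminal height (file.txt and the page_ prefix are placeholders):
split -l "$(tput lines)" -d -a 3 file.txt page_   # page_000 then holds the first screenful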

You can use head to get a specific number of lines, and tput lines to find out how many lines your current terminal has.
Here's a script that fetches a pageful, or the standard 25 lines if no terminal is available:
#!/bin/bash
lines=$(tput lines) || lines=25
head -n "$lines" file.txt > output.txt

You can use head and tail to get the first or last n lines of a file:
tail -n 20 /var/log/messages
head -n 10 src/main.h
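Combining the two also lets you pull out an arbitrary "page"; a rough sketch, assuming 40-line pages and a placeholder file.txt:
page=3; lines=40
head -n $((page * lines)) file.txt | tail -n "$lines"   # prints page 3, i.e. lines 81-120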

Related

Comparing list of directories to list in a file and writing another file for comparison [duplicate]

Ok, I have two related lists on my linux box in text files:
/tmp/oldList
/tmp/newList
I need to compare these lists to see what lines got added and what lines got removed. I then need to loop over these lines and perform actions on them based on whether they were added or removed.
How do I do this in bash?
Use the comm(1) command to compare the two files. They both need to be sorted, which you can do beforehand if they are large, or you can do it inline with bash process substitution.
comm can take a combination of the flags -1, -2 and -3 indicating which file to suppress lines from (unique to file 1, unique to file 2 or common to both).
To get the lines only in the old file:
comm -23 <(sort /tmp/oldList) <(sort /tmp/newList)
To get the lines only in the new file:
comm -13 <(sort /tmp/oldList) <(sort /tmp/newList)
You can feed that into a while read loop to process each line:
while IFS= read -r old ; do
    # ...do stuff with "$old"
done < <(comm -23 <(sort /tmp/oldList) <(sort /tmp/newList))
and similarly for the new lines.
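Putting both directions together, a minimal sketch; handle_removed and handle_added are hypothetical placeholders for whatever actions you need:
#!/bin/bash
handle_removed() { echo "removed: $1"; }   # placeholder action
handle_added()   { echo "added:   $1"; }   # placeholder action
while IFS= read -r line; do
    handle_removed "$line"
done < <(comm -23 <(sort /tmp/oldList) <(sort /tmp/newList))
while IFS= read -r line; do
    handle_added "$line"
done < <(comm -13 <(sort /tmp/oldList) <(sort /tmp/newList))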
The diff command will do the comparing for you.
e.g.,
$ diff /tmp/oldList /tmp/newList
See the diff man page for more information. This should take care of the first part of your problem.
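If you want the added and removed lines in separate files straight from diff, one possible sketch (the /tmp/removed and /tmp/added paths are only examples) uses the fact that default diff output prefixes removed lines with "< " and added lines with "> ":
diff /tmp/oldList /tmp/newList | sed -n 's/^< //p' > /tmp/removed
diff /tmp/oldList /tmp/newList | sed -n 's/^> //p' > /tmp/added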
Consider using Ruby if your scripts need readability.
To get the lines only in the old file:
ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')"
To get the lines only in the new file:
ruby -e "puts File.readlines('/tmp/newList') - File.readlines('/tmp/oldList')"
You can feed that into a while read loop to process each line:
while IFS= read -r old ; do
    # ...do stuff with "$old"
done < <(ruby -e "puts File.readlines('/tmp/oldList') - File.readlines('/tmp/newList')")
This is old, but for completeness we should say that if you have a really large set, the fastest solution would be to use diff to generate a script and then source it, like this:
#!/bin/bash
line_added() {
  # code to be run for each added line
  # $* is the line
  :  # ':' is a no-op placeholder so the empty function is valid; replace with real code
}
line_removed() {
  # code to be run for each removed line
  # $* is the line
  :
}
line_same() {
  # code to be run for each line that is the same
  # $* is the line
  :
}
sort /tmp/oldList >/tmp/oldList.sorted
sort /tmp/newList >/tmp/newList.sorted
diff >/tmp/diff_script.sh \
--new-line-format="line_added %L" \
--old-line-format="line_removed %L" \
--unchanged-line-format="line_same %L" \
/tmp/oldList.sorted /tmp/newList.sorted
source /tmp/diff_script.sh
Lines changed will appear as deleted and added. If you don't like this, you can use --changed-group-format. Check the diff manual page.
I typically use:
diff /tmp/oldList /tmp/newList | grep -v "Common subdirectories"
The grep -v option inverts the match:
-v, --invert-match
       Selected lines are those not matching any of the specified patterns.
So in this case it takes the diff results and omits those that are common.
Have you tried diff?
$ diff /tmp/oldList /tmp/newList
$ man diff

text file contains lines of bizarre characters - want to fix

I'm an inexperienced programmer grappling with a new problem in a large text file which contains data I am trying to process. Here's a screen capture of what I'm looking at (using 'less' - I am on a linux server):
https://drive.google.com/file/d/0B4VAqfRxlxGpaW53THBNeGh5N2c/view?usp=sharing
Bioinformaticians will recognize this file as a "fastq" file containing DNA sequence data. The top half of the screenshot contains data in its expected format (which I admit contains some "bizarre" characters, but that is not the issue). However, the bottom half (with many characters shaded in white) is completely messed up. If I were to scroll down the file, it eventually returns to normal text after about 500 lines. I want to fix it because it is breaking downstream operations I am trying to perform (which complain about precisely this position in the file).
Is there a way to grep for and remove the shaded lines? Or can I fix this problem by somehow changing the encoding on the offending lines?
Thanks
If you are lucky, you can use
strings file > file2
If that doesn't work, try another way.
Determine the line length of the correct lines (I think the first two lines are different).
head -1 file | wc -c
head -2 file | tail -1 | wc -c
Hmm, wc also counts the line ending, so subtract 1 from both lengths.
Then read the file one line at a time. Use a case statement so you do not have to write a lot of if/else constructions comparing the length to the expected lengths. In the code below I accept the lengths 20, 100 and 330.
Redirect everything to another file outside the loop (redirecting inside the loop would truncate the output file on every iteration).
while IFS= read -r line; do
    case ${#line} in
        20|100|330) echo "$line" ;;
    esac
done < file > file2
A totally different approach would be filtering out the wrong lines with sed, awk or grep, but that requires knowing which characters you will and won't accept.
If you are lucky, all the ugly lines will have a character in common, like '<' or maybe '#'. In that case you can use egrep:
egrep -v "<|#" file > file2
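If the broken lines don't share one convenient character, another sketch is to drop every line containing a non-printable byte (this assumes the good lines are plain ASCII text):
LC_ALL=C grep -v '[^[:print:][:space:]]' file > file2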
Based on inspection of the screenshot:
sed -r 's/<[[:alnum:]]{2}>//g;s/\^.//g;s/ESC\^*C*//g' file
To make the changes in the file itself and keep a backup with a .bak extension, do:
sed -r -i.bak 's/<[[:alnum:]]{2}>//g;s/\^.//g;s/ESC\^*C*//g' file

tail -f <filename>, print line number as well

Is there a way to modify tail -f so that it lists the line numbers of the file as well?
Something similar to grep -n <Strings> *.
Try less
Instead of using tail to follow data and less or nl for numbering, I suggest using a single tool for both:
less -N +F <filename>
This will make less print line numbers and follow the file. From man less:
F
Scroll forward, and keep trying to read when the end of file is reached. Normally this command would be used when already at the end of the file. It is a way to monitor the tail of a file which is growing while it is being viewed. (The behavior is similar to the tail -f command.)
You could do a Ctrl+C to stop following when inside less; to start following again, you could press F again. With this method, you get the additional goodies that less offers like regex-based search, tags, etc.
This command numbers from the start of the file (-n +1), so the lines already in it are counted too:
tail -f -n +1 yourfile.txt | nl
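If you prefer awk over nl, a similar sketch; fflush() keeps the output from being held back by buffering in gawk and mawk:
tail -f -n +1 yourfile.txt | awk '{ printf "%6d\t%s\n", NR, $0; fflush() }'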
You can use the less command:
tail -f <filename> | less -N
According to the man page of less:
-N or --LINE-NUMBERS
Causes a line number to be displayed at the beginning of each
line in the display.

Tail inverse / printing everything except the last n lines?

Is there a (POSIX command line) way to print all of a file EXCEPT the last n lines? Use case being, I will have multiple files of unknown size, all of which contain a boilerplate footer of a known size, which I want to remove. I was wondering if there is already a utility that does this before writing it myself.
Most versions of head(1) (GNU-derived in particular, but not BSD-derived) have a feature to do this: if you pass a negative number for the number of lines to print, it shows the whole file except that many lines at the end.
Like so:
head -n -10 textfile
Probably less efficient than the "wc" + "do the math" + "head" method, but easier to look at:
tail -r file.txt | tail +NUM | tail -r
Where NUM is one more than the number of ending lines you want to remove, e.g. +11 will print all but the last 10 lines. This works on BSD which does not support the head -n -NUM syntax.
The head utility is your friend.
From the man page of head:
-n, --lines=[-]K
print the first K lines instead of the first 10;
with the leading `-', print all but the last K lines of each file
There's no standard command to do that, but you can use awk or sed to fill a buffer of N lines and print from the head of the buffer once it's full. E.g. with awk:
awk -v n=5 '{if(NR>n) print a[NR%n]; a[NR%n]=$0}' file
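The same one-liner spelled out with comments, in case the ring-buffer trick is unclear:
awk -v n=5 '
    NR > n { print buf[NR % n] }    # emit the line that arrived n lines earlier
    { buf[NR % n] = $0 }            # keep the latest n lines in a ring buffer
' file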
cat <filename> | head -n -10 # Everything except the last 10 lines of a file
cat <filename> | tail -n +11 # Everything except the first 10 lines of a file
If the footer starts with a consistent line that doesn't appear elsewhere, you can use sed:
sed '/FIRST_LINE_OF_FOOTER/q' filename
That prints the first line of the footer; if you want to avoid that:
sed -n '/FIRST_LINE_OF_FOOTER/q;p' filename
This could be more robust than counting lines if the size of the footer changes in the future. (Or it could be less robust if the first line changes.)
Another option, if your system's head command doesn't support head -n -10, is to precompute the number of lines you want to show. The following depends on bash-specific syntax:
lines=$(wc -l < filename) ; (( lines -= 10 )) ; head -$lines filename
Note that the head -NUMBER syntax is supported by some versions of head for backward compatibility; POSIX only permits the head -n NUMBER form. POSIX also only permits the argument to -n to be a positive decimal integer; head -n 0 isn't necessarily a no-op.
A POSIX-compliant solution is:
lines=$(wc -l < filename) ; lines=$(($lines - 10)) ; head -n $lines filename
If you need to deal with ancient pre-POSIX shells, you might consider this:
lines=`wc -l < filename` ; lines=`expr $lines - 10` ; head -n $lines filename
Any of these might do odd things if a file is 10 or fewer lines long.
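To guard against that case, a minimal sketch that only prints when something is left after dropping the footer (10 is the assumed footer size, filename is a placeholder):
n=10
lines=$(wc -l < filename)
if [ "$lines" -gt "$n" ]; then
    head -n $((lines - n)) filename
fi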
tac file.txt | tail +[n+1] | tac
This answer is similar to user9645's, but it avoids the tail -r command, which is also not a valid option on many systems. See, e.g., https://ubuntuforums.org/showthread.php?t=1346596&s=4246c451162feff4e519ef2f5cb1a45f&p=8444785#post8444785 for an example.
Note that the +1 (in the brackets) was needed on the system I tested on, but it may not be required on yours; to remove the last line, I had to put 2 in the brackets. This is probably because the file needs to end with a regular line feed, which arguably makes the last line a blank one. Without that trailing newline, tac will combine the last two lines, so removing the "last" line (the first one seen by tail) will actually remove the last two.
It should also be the fastest of the solutions listed so far for systems lacking the improved version of head, so I think it is both the most robust and the fastest of the answers listed.
head -n $(($(wc -l < Windows_Terminal.json) - 10)) Windows_Terminal.json
This will work on Linux and on macOS; keep in mind that macOS head does not support a negative line count, so this is quite handy.
N.B.: replace Windows_Terminal.json with your file name, and 10 with the number of trailing lines you want to drop.
It is simple: you add + in front of the line number to start from.
This example gives you all the lines except the first 9:
tail -n +10 inputfile
(Yes, not the first 10, because the number is the line to start at, not the count to skip; if you want to skip the first 10, type
tail -n +11 inputfile)

Splitting a file and its lines under Linux/bash

I have a rather large file (150 million lines of 10 chars). I need to split it in 150 files of 2 million lines, with each output line being alternatively the first 5 characters or the last 5 characters of the source line.
I could do this in Perl rather quickly, but I was wondering if there was an easy solution using bash.
Any ideas?
Homework? :-)
I would think that a simple pipe with sed (to split each line into two) and split (to split things up into multiple files) would be enough.
The man command is your friend.
Added after confirmation that it is not homework:
How about
sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 2000000 - out-prefix-
?
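If your sed does not understand \n in the replacement (BSD sed, for example), a roughly equivalent sketch in awk:
awk '{ print substr($0, 1, 5); print substr($0, 6) }' input_file | split -l 2000000 - out-prefix-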
I think that something like this could work:
out_file=1
out_pairs=0
while IFS= read -r line; do
    if [ $out_pairs -ge 1000000 ]; then
        out_file=$(($out_file + 1))
        out_pairs=0
    fi
    echo "${line%?????}" >> out${out_file}
    echo "${line#?????}" >> out${out_file}
    out_pairs=$(($out_pairs + 1))
done < "$in_file"
Not sure if it's simpler or more efficient than using Perl, though.
For the first-five-chars-of-each-line variant, assuming the large file is called x.txt and that it's OK to create files named x.txt.* in the current directory:
split -l 2000000 x.txt x.txt.out && for splitfile in x.txt.out*; do
    outfile="${splitfile}.firstfive"
    echo "$splitfile -> $outfile"
    cut -c 1-5 "$splitfile" > "$outfile"
done
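The last-five-chars variant is the same idea with a different cut range, reusing the chunks split already produced (the x.txt.out?? glob assumes split's default two-letter suffixes):
for splitfile in x.txt.out??; do
    cut -c 6-10 "$splitfile" > "${splitfile}.lastfive"
done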
Why not just use the native Linux split command?
split -d -l 999999 input_filename
This will output new split files with names like x00, x01, x02, and so on.
For more info, see the manual:
man split
