removing first n and last n lines from multiple text files - linux

I have been stuck on this for some time now.
I have two text files from which I would like to remove the first two and the last three lines.
So far I have:
$ tail -n +3 text_1.txt text_2.txt | head -n -3
When I enter this at the console, I see that text_2.txt indeed comes out in the proper format, but text_1.txt still has the last three lines that need to be removed. I presume the head command is not being applied to text_1.txt.
How can I solve this problem?

for i in text_1.txt text_2.txt; do tail -n +3 "$i" | head -n -3; done
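The loop runs tail and head once per file, so each file gets trimmed separately (in the original pipeline, head only trimmed the end of the combined output). If you want to write the trimmed result back to each file rather than print it, a temp-file sketch (assuming GNU head, which accepts negative -n counts):

```shell
# Trim the first 2 and last 3 lines of each file in place.
# GNU head is assumed for the negative -n count.
for i in text_1.txt text_2.txt; do
  tail -n +3 "$i" | head -n -3 > "$i.tmp" && mv "$i.tmp" "$i"
done
```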

How to stop creating files at a specific number?

I want to create 900 files with 3 letter names.
Example: xyz.txt, where x, y, and z are different letters from a to z.
I'm trying to do something like this:
for ((i=1; i < 900; i++))
do touch {a..z}{a..z}{a..z}$i.txt
done
I expect it to stop at 900.
But in the end I'm creating over 10K of files. Can someone please help me accomplish this?
To avoid any names with repeating characters before the period, as alluded to in comments, you could do this:
printf '%s.txt\n' {a..z}{a..z}{a..z} | grep -vE '(.).?\1.?\.' \
| head -n 900 | xargs touch
The printf statement prints the list from aaa.txt, aab.txt to zzz.txt.
grep -vE '(.).?\1.?\.' filters any names where the three characters before the period are not unique: aaa.txt, aab.txt, aba.txt and baa.txt are all filtered.
head -n 900 gets the first 900 names from the list
xargs touch calls touch as few times as possible but will make sure the command line never is too long.
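To see what the grep filter does, a tiny sketch with a few hand-picked names:

```shell
# Names with any repeated letter before the period are filtered out;
# only the all-distinct name survives.
printf '%s\n' aaa.txt aab.txt aba.txt baa.txt abc.txt | grep -vE '(.).?\1.?\.'
```

Only abc.txt is printed; the other four each repeat a letter.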
If you want to randomize the file names (but still adhere to these criteria), you can shuffle them before selecting 900:
printf '%s.txt\n' {a..z}{a..z}{a..z} | grep -vE '(.).?\1.?\.' \
| shuf | head -n 900 | xargs touch
You can do this with 2 lines:
all=( {a..z}{a..z}{a..z}.txt )
touch "${all[@]:0:900}"
(assuming your OS allows a command line with 900 arguments). If you want something other than the first 900 such files, you'll need to do something more complicated.
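On a smaller scale, the same array-slice technique looks like this (bash is assumed; 25 generated names, take the first 3):

```shell
# 5 x 5 = 25 generated names; a bash array slice takes the first 3.
all=( {a..e}{a..e}.txt )
echo "${#all[@]}"        # number of names generated
echo "${all[@]:0:3}"     # first three names in expansion order
```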
The file names resemble split's default output naming, so you could split an empty file into 900 pieces:
split -n900 -a3 --additional-suffix=".txt" /dev/null ""
The following generates 910 names, and discards the last 10 before passing them to touch.
printf '%s.txt\n' {a..m}{p..v}{0..9} | head -n 900 | xargs touch
You might be able to find a combination which generates exactly 900 alternatives so you can just touch <pattern> and have the shell expand it to the required number of unique names.
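For instance, any ranges whose sizes multiply to 900 will do; one hypothetical choice is 10 first letters, 9 second letters, and 10 third letters:

```shell
# 10 x 9 x 10 = 900 names, all expanded by the shell itself
echo {a..j}{a..i}{a..j}.txt | wc -w
```

so `touch {a..j}{a..i}{a..j}.txt` would create exactly 900 files in one call (bash brace expansion assumed).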

How can I get my bash script to remove the first n and last n lines from a variable?

I'm making a script to perform "dig ns google.com" and cut off all of the result except for the answers section.
So far I have:
#!/bin/bash
echo -n "Please enter the domain: "
read d
echo "You entered: $d"
dr="$(dig ns $d)"
sr="$(sed -i 1,10d $dr)"
tr="$(head -n -6 $sr)"
echo "$tr"
Theoretically, this should work. The sed and head commands work individually outside of the script to cut off the first 10 and last 6 respectively, but when I put them inside my script sed comes back with an error and it looks like it's trying to read the variable as part of the command rather than the input. The error is:
sed: invalid option -- '>'
So far I haven't been able to find a way for it to read the variable as input. I've tried surrounding it in "" and '' but that doesn't work. I'm new to this whole bash scripting thing obviously, any help would be great!
You're assigning the output to variables; instead, pipe it. For example,
seq 25 | tail -n +11 | head -n -6
will remove the first 10 and last 6 lines, printing lines 11 to 19.
In your case, replace seq 25 with your dig command:
dig ns "$d" | tail -n +11 | head -n -6
No need for the echo either.

Related to head and tail command in Unix

I know what output head -n and tail -n will provide.
Is there any command like head +n (head +2 filename) or tail +n (tail +2 filename)?
If yes, can anyone shed some light on this?
The Single Unix Specification Version 2 (1997) states the following for tail:
In the non-obsolescent form, if neither -c nor -n is specified, -n 10 is assumed.
In the obsolescent version, an argument beginning with a "-" or "+" can be used as a single option. The argument ±number with the letter c specified as a suffix is equivalent to -c ±number; ±number with the b suffix is equivalent to -c ±number*512; ±number with the letter l specified as a suffix, or with none of b, c nor l as a suffix, is equivalent to -n ±number. If number is not specified in these forms, 10 will be used. The letter f specified as a suffix is equivalent to specifying the -f option. If the [number]c[f] form is used and neither number nor the f suffix is specified, it will be interpreted as the -c 10 option.
In other words, the following commands in each pair are equivalent:
tail -2 file
tail -n 2 file

tail +2 file
tail -n +2 file

tail -2c file
tail -c 2 file

tail +3lf file
tail -f -n +3 file
Note that unless a "+" is used, the number given means "output the last N lines"; if "+" is used, it means "output the lines starting from line N". For example, in a file with 40 lines, tail +2 (or equivalently tail -n +2) would output lines 2..40, whereas tail -2 (or tail -n 2) would output lines 39..40.
The next version of the Single Unix Specification of 2001 removed the obsolescent form completely, so there are no "options" starting with a "+" character.
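A quick check of the "+" versus "-" semantics, using seq as a stand-in for a 25-line file:

```shell
seq 25 | tail -n 2     # last two lines of the stream
seq 25 | tail -n +2    # everything from line 2 onward
```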
tail supports both positive and negative offsets, but head does not.
Start output at the 10th line from the end of the file:
tail -10 filename
Start output at the 10th line from the beginning of the file:
tail +10 filename
I think piping is what you're looking for: https://en.wikipedia.org/wiki/Pipeline_(Unix)
To use the first example you gave:
head +2 filename | head +n
I believe that is what you want, though note that I haven't tested it.

Tail inverse / printing everything except the last n lines?

Is there a (POSIX command line) way to print all of a file EXCEPT the last n lines? Use case being, I will have multiple files of unknown size, all of which contain a boilerplate footer of a known size, which I want to remove. I was wondering if there is already a utility that does this before writing it myself.
Most versions of head(1) (GNU-derived in particular, but not BSD-derived) have a feature to do this: if you give a negative number for the number of lines to print, it shows the whole file except the end.
Like so:
head -n -10 textfile
Probably less efficient than the "wc" + "do the math" + "head" method, but easier to look at:
tail -r file.txt | tail +NUM | tail -r
Where NUM is one more than the number of ending lines you want to remove; e.g., +11 will print all but the last 10 lines. This works on BSD, which does not support the head -n -NUM syntax.
The head utility is your friend.
From the man page of head:
-n, --lines=[-]K
print the first K lines instead of the first 10;
with the leading `-', print all but the last K lines of each file
There's no standard command to do that, but you can use awk or sed to fill a buffer of n lines and print from the head of the buffer once it's full. E.g. with awk:
awk -v n=5 '{if(NR>n) print a[NR%n]; a[NR%n]=$0}' file
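For example, dropping the last 3 lines of a 10-line stream with that ring buffer:

```shell
# a[NR%n] holds the most recent n lines; a line is printed only after
# n further lines have arrived, so the final n lines are never printed.
seq 10 | awk -v n=3 '{if(NR>n) print a[NR%n]; a[NR%n]=$0}'
```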
cat <filename> | head -n -10 # Everything except the last 10 lines of a file
cat <filename> | tail -n +11 # Everything except the first 10 lines of a file
If the footer starts with a consistent line that doesn't appear elsewhere, you can use sed:
sed '/FIRST_LINE_OF_FOOTER/q' filename
That prints the first line of the footer; if you want to avoid that:
sed -n '/FIRST_LINE_OF_FOOTER/q;p' filename
This could be more robust than counting lines if the size of the footer changes in the future. (Or it could be less robust if the first line changes.)
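A small sketch of both forms, where FOOTER is a stand-in for your boilerplate's first line:

```shell
# With -n and q before p, sed quits on the marker line
# without printing it, keeping only the body.
printf 'body 1\nbody 2\nFOOTER\nlegal text\n' | sed -n '/FOOTER/q;p'
```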
Another option, if your system's head command doesn't support head -n -10, is to precompute the number of lines you want to show. The following depends on bash-specific syntax:
lines=$(wc -l < filename) ; (( lines -= 10 )) ; head -$lines filename
Note that the head -NUMBER syntax is supported by some versions of head for backward compatibility; POSIX only permits the head -n NUMBER form. POSIX also only permits the argument to -n to be a positive decimal integer; head -n 0 isn't necessarily a no-op.
A POSIX-compliant solution is:
lines=$(wc -l < filename) ; lines=$(($lines - 10)) ; head -n $lines filename
If you need to deal with ancient pre-POSIX shells, you might consider this:
lines=`wc -l < filename` ; lines=`expr $lines - 10` ; head -n $lines filename
Any of these might do odd things if a file is 10 or fewer lines long.
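The POSIX variant, sketched on a generated 15-line file (the temp file is only for illustration):

```shell
# Count the lines, subtract the 10 to drop, and head the remainder.
tmp=$(mktemp)
seq 15 > "$tmp"
lines=$(wc -l < "$tmp")
head -n $((lines - 10)) "$tmp"   # lines 1..5
rm -f "$tmp"
```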
tac file.txt | tail +[n+1] | tac
This answer is similar to user9645's, but it avoids the tail -r command, which is not a valid option on many systems. See, e.g., https://ubuntuforums.org/showthread.php?t=1346596&s=4246c451162feff4e519ef2f5cb1a45f&p=8444785#post8444785 for an example.
Note that the +1 (in the brackets) was needed on the system I tested, but it may not be required on yours. So, to remove the last line, I had to put 2 in the brackets. This is probably related to the requirement that the last line end with a regular line-feed character (which, arguably, makes the last line a blank line). If it doesn't, the tac command will combine the last two lines, so removing the "last" line (the first line seen by tail) will actually remove the last two.
My answer should also be the fastest solution of those listed to date for systems lacking the improved version of head. So, I think it is both the most robust and the fastest of all the answers listed.
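Verifying the off-by-one on a 5-line stream: to drop the last 2 lines, use tail +3 (that is, n+1 with n=2):

```shell
# Reverse, skip the first 2 lines of the reversed stream, reverse back.
seq 5 | tac | tail -n +3 | tac
```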
head -n $(($(wc -l < Windows_Terminal.json) - 10)) Windows_Terminal.json
This prints everything except the last 10 lines. It will work on Linux and on macOS; keep in mind that Mac's head does not support a negative value, so this is quite handy.
N.b.: replace Windows_Terminal.json with your file name.
It is simple. You have to add + to the number of lines that you want to skip.
This example gives you all the lines except the first 9:
tail -n +10 inputfile
(Yes, not the first 10, because it counts differently; if you want to skip 10, just type
tail -n +11 inputfile)

Count the number of occurrences in a string. Linux

Okay, so what I am trying to figure out is how to count the number of periods in a string and then cut everything up to that point, minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods - 2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just grepping for the number of occurrences, like this:
grep -c "." infile
The reason I don't want to do that is that I want to avoid creating another text file, as I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by a helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
