I am concatenating a large number of files into a single one with the following command:
$ cat num_*.dat > dataset.dat
However, due to the structure of the files, I'd like to omit the first two and last two lines of each file when concatenating. Those lines contain file information which is not important for my needs.
I know of the existence of head and tail, but I don't know how to combine them in a single UNIX command to solve my issue.
The head command has some odd parameter usage here: a negative count, as in head -n -2, means "everything except the last two lines" (a GNU extension).
You can use the following to list all of the lines except the last two.
$ cat num_*.dat | head -n-2 > dataset.dat
Next, take that and run the following tail command on it, writing to a new file (appending back to dataset.dat itself would just add its own lines onto the end):
$ tail -n +3 dataset.dat > dataset2.dat
I believe the following will work as one command.
$ cat num_*.dat | head -n-2 | tail -n+3 > dataset.dat
I tested on a file that had lines like the following:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
This one will get you started:
cat test.txt | head -n-2 | tail -n+3
From the file above it prints:
Line 3
Line 4
Line 5
The challenge is that when you use cat num_*.dat (or whatever the pattern is), cat concatenates all of the files first and the command runs only once, so you end up with one large file where only the first two lines of the first catted file and the last two lines of the last catted file are removed.
Final Answer - Need to Write a Bash Script
I wrote a bash script that will do this for you.
This one will iterate through each file in your directory and run the command.
Notice that it appends (>>) to the dataset.dat file.
for file in num_*.dat; do
if [ -f "$file" ]; then
head -n -2 "$file" | tail -n +3 >> dataset.dat
echo "$file"
fi
done
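As a sanity check, here is the loop above run against two small generated sample files (the /tmp path and file contents are just for the demo; head -n -2 with a negative count is a GNU extension):

```shell
# Make two throwaway 7-line files: 2 header lines, 3 data lines, 2 footer lines
mkdir -p /tmp/trim_demo
cd /tmp/trim_demo
printf '%s\n' h1 h2 a b c f1 f2 > num_01.dat
printf '%s\n' h1 h2 d e f f1 f2 > num_02.dat
rm -f dataset.dat

# Strip the first two and last two lines of each file, appending the rest
for file in num_*.dat; do
    if [ -f "$file" ]; then
        head -n -2 "$file" | tail -n +3 >> dataset.dat
    fi
done

cat dataset.dat    # a b c d e f, one per line
```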
I had two files that looked like the following:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
and
2 line 1
2 line 2
2 line 3
2 line 4
2 line 5
2 line 6
2 line 7
The final output was:
line 3
line 4
line 5
2 line 3
2 line 4
2 line 5
for i in num_*.dat; do # loop through all files concerned
tail -n +3 "$i" | head -n -2 >> dataset.dat
done
Related
I've a simple text file, named samples.log. In this file I've several lines. Suppose I have a total of 10 lines. My purpose is to replace the first 5 lines of the file with the last 5 lines of the same file. For example:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
Become:
line 6
line 7
line 8
line 9
line 10
In other words, I simply want to delete the first 5 lines of the file and shift the last 5 up. I'm working on Linux. What is the simplest way to do this? Is there a command?
I'm working on a C program, but I think it is better to execute the Linux command from inside the program instead of doing this operation in C, which I think would be quite difficult.
Simply
tail -n +6 samples.log
will do the job. tail -n +NUM file will print the file starting with line NUM
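For example, on a ten-line file generated with seq (the /tmp path is just for the demo):

```shell
seq 10 > /tmp/samples.log
tail -n +6 /tmp/samples.log    # prints lines 6 through 10
```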
You can use this command:
tail -n $(($(wc -l < samples.log) - 5)) samples.log
You calculate the total number of lines:
wc -l < samples.log
From that, you subtract 5:
$((x - 5))
And you use that last number of lines:
tail -n x samples.log
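Putting those pieces together on a seq-generated file (the /tmp path and the intermediate variable are just for the demo):

```shell
seq 10 > /tmp/samples.log                  # 10 numbered lines
total=$(wc -l < /tmp/samples.log)          # total line count: 10
tail -n "$((total - 5))" /tmp/samples.log  # last 10-5=5 lines: 6..10
```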
I know how to take e.g. the first 2 lines from a .txt file and append them to the end of a .txt file. But how should I add the last 2 lines of a .txt file before the 1st line of a .txt file?
I've tried :
tail -n 2 test1.txt >> head test1.txt # takes last 2 lines of text and adds them to the head
Looks awfully wrong but I can't find the answer anywhere, doing it with tail and head.
tail n 2 test1.txt >> head test1.txt
cat test1.txt
Someone please correct my code so I get my expected result.
Just run the two commands one after the other; the resulting stdout will be exactly the same as what you'd get by concatenating their output together, without needing an explicit/extra concatenation step:
tail -n 2 test1.txt
head -n 1 test1.txt
If you want to redirect their output together, put them in a brace group:
{
tail -n 2 test1.txt
head -n 1 test1.txt
} >out.txt
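For example, with a four-line sample file (demo path and contents assumed):

```shell
printf '%s\n' one two three four > /tmp/test1.txt
{
    tail -n 2 /tmp/test1.txt    # last two lines: three, four
    head -n 1 /tmp/test1.txt    # first line: one
} > /tmp/out.txt
cat /tmp/out.txt
```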
What about:
$ cat file1.txt
file 1 line 1
file 1 line 2
file 1 line 3
file 1 line 4
$ cat file2.txt
file 2 line 1
file 2 line 2
file 2 line 3
file 2 line 4
$ tail -n 2 file1.txt > output.txt
$ head -n 1 file2.txt >> output.txt
$ cat output.txt
file 1 line 3
file 1 line 4
file 2 line 1
I have the following lines in file1:
line 1text
line 2text
line 3text
line 4text
line 5text
line 6text
line 7text
With the command cat file1 | sort -R | head -4 I get the following in file2:
line 5text
line 1text
line 7text
line 2text
I would like to order the lines (not numerically, just the same order as file1) into the following file3:
line 1text
line 2text
line 5text
line 7text
The actual data doesn't have digits. Any easy way to do this? I was thinking of doing a grep and finding the first instance in a loop. But, I'm sure you experienced guys know an easier solution. Your positive input is highly appreciated.
You can decorate with line numbers, select four random lines, sort by line number and remove the line numbers:
$ nl -b a file1 | shuf -n 4 | sort -n -k 1,1 | cut -f 2-
line 2text
line 5text
line 6text
line 7text
The -b a option to nl makes sure that also empty lines are numbered.
Notice that this loads all of file1 into memory, as pointed out by ghoti. To avoid that (and as a generally smarter solution), we can use a different feature of (GNU) shuf: its -i option takes a number range and treats each number as a line. To get four random line numbers from an input file file1, we can use
shuf -n 4 -i 1-$(wc -l < file1)
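For instance, with a ten-line range (the output order is random, but you always get four distinct numbers between 1 and 10):

```shell
# Pick 4 distinct line numbers from 1..10 without reading any file
shuf -n 4 -i 1-10
```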
Now, we have to print exactly these lines. Sed can do that; we just turn the output of the previous command into a sed script and run sed with sed -n -f -. All together:
shuf -n 4 -i 1-$(wc -l < file1) | sort -n | sed 's/$/p/;$s/p/{&;q}/' |
sed -n -f - file1
sort -n sorts the line numbers numerically. This isn't strictly needed, but if we know that the highest line number comes last, we can quit sed afterwards instead of reading the rest of the file for nothing.
sed 's/$/p/;$s/p/{&;q}/' appends p to each line number; on the last line, the p is replaced with {p;q} so that sed stops processing the file there.
If the output from sort looks like
27
774
670
541
then the sed command turns it into
27p
774p
670p
541{p;q}
sed -n -f - file1 processes file1, using the output of above sed command as the instructions for sed. -n suppresses output for the lines we don't want.
The command can be parametrized and put into a shell function, taking the file name and the number of lines to print as arguments:
randlines () {
fname=$1
nlines=$2
shuf -n "$nlines" -i 1-$(wc -l < "$fname") | sort -n |
sed 's/$/p/;$s/p/{&;q}/' | sed -n -f - "$fname"
}
to be used like
randlines file1 4
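For a quick check (demo file path assumed): the selection is random, but the output always has the requested number of lines, each taken from the file in its original order.

```shell
# Same function as above, repeated so this snippet is self-contained
randlines () {
    fname=$1
    nlines=$2
    shuf -n "$nlines" -i 1-$(wc -l < "$fname") | sort -n |
        sed 's/$/p/;$s/p/{&;q}/' | sed -n -f - "$fname"
}

seq 9 > /tmp/file1       # lines 1..9, already in order
randlines /tmp/file1 4   # 4 random lines, original order preserved
```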
cat can add line numbers:
$ cat -n file
1 line one
2 line two
3 line three
4 line four
5 line five
6 line six
7 line seven
8 line eight
9 line nine
So you can use that to decorate, sort, undecorate:
$ cat -n file | sort -R | head -4 | sort -n
You can also use awk to decorate with a random number and line index (if your sort lacks -R like on OS X):
$ awk '{print rand() "\t" FNR "\t" $0}' file | sort -n | head -4
0.152208 4 line four
0.173531 8 line eight
0.193475 6 line six
0.237788 1 line one
Then sort with the line numbers and remove the decoration (one or two columns depending if you use cat or awk to decorate):
$ awk '{print rand() "\t" FNR "\t" $0}' file | sort -n | head -4 | cut -f2- | sort -n | cut -f2-
line one
line four
line six
line eight
Another solution could be to sort the whole file:
sort file1 -o file2
and then pick random lines from file2:
shuf -n 4 file2 -o file3
I want to add a new line to the top of a data file with sed, and write something to that line.
I tried this as suggested in How to add a blank line before the first line in a text file with awk :
sed '1i\
\' ./filename.txt
but it printed a backslash at the beginning of the first line of the file instead of creating a new line. The terminal also throws an error if I try to put it all on the same line ("1i\": extra characters after \ at the end of i command).
Input :
1 2 3 4
1 2 3 4
1 2 3 4
Expected output
14
1 2 3 4
1 2 3 4
1 2 3 4
$ sed '1i\14' file
14
1 2 3 4
1 2 3 4
1 2 3 4
but just use awk for clarity, simplicity, extensibility, robustness, portability, and every other desirable attribute of software:
$ awk 'NR==1{print "14"} {print}' file
14
1 2 3 4
1 2 3 4
1 2 3 4
Basically you are concatenating two files: a file containing one line, and the original file. As its name suggests, this is a task for cat:
cat - file <<< 'new line'
# or
echo 'new line' | cat - file
where - stands for stdin.
You can also use cat together with process substitution if your shell supports it:
cat <(echo 'new line') file
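Both variants can be checked on a tiny seq-generated file (demo path assumed; the <( ) form needs bash, ksh, or zsh):

```shell
seq 3 > /tmp/file.txt
# Variant 1: the new line arrives on stdin
echo 'new line' | cat - /tmp/file.txt
# Variant 2: process substitution
cat <(echo 'new line') /tmp/file.txt
```

Each variant prints "new line" followed by the three original lines.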
Btw, with sed it should be simply:
sed '1i\new line' file
I need to move the last 4 lines of a text file and move them to the second row in the text file.
I'm assuming that tail and sed are used but, I haven't much luck so far.
Here is a head and tail solution. Let us start with the same sample file as Glenn Jackman:
$ seq 10 >file
Apply these commands:
$ head -n1 file ; tail -n4 file; tail -n+2 file | head -n-4
1
7
8
9
10
2
3
4
5
6
Explanation:
head -n1 file
Print first line
tail -n4 file
Print last four lines
tail -n+2 file | head -n-4
Print the lines starting with line 2 and ending before the fourth-to-last line.
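To capture the rearranged output in a file, the three commands can go in one brace group with a single redirection; write to a new file, since the commands still read the original (paths assumed for the demo):

```shell
seq 10 > /tmp/file.txt
{
    head -n 1 /tmp/file.txt                # first line
    tail -n 4 /tmp/file.txt                # last four lines
    tail -n +2 /tmp/file.txt | head -n -4  # everything in between
} > /tmp/rearranged.txt
cat /tmp/rearranged.txt    # 1 7 8 9 10 2 3 4 5 6
```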
If I'm assuming correctly, ed can handle your task:
seq 10 > file
ed file <<'COMMANDS'
$-3,$m1
w
q
COMMANDS
cat file
1
7
8
9
10
2
3
4
5
6
lines 7,8,9,10 have been moved to the 2nd line
$-3,$m1 means: for the range of lines from "$-3" (3 lines before the last line) to "$" (the last line), move them ("m") below the first line ("1").
Note that the heredoc has been quoted so the shell does not try to interpret the strings $- and $m1 as variables
If you don't want to actually modify the file, but instead print to stdout:
ed -s file <<'COMMANDS'
$-3,$m1
%p
Q
COMMANDS
Here is an awk solution:
seq 10 > file
awk '{a[NR]=$0} END {for (i=1;i<=NR-4;i++) if (i==2) {for (j=NR-3;j<=NR;j++) print a[j];print a[i]} else print a[i]}' file
1
7
8
9
10
2
3
4
5
6
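The same idea is easier to follow spread over several lines; this sketch buffers the whole file just like the one-liner, then prints the pieces in the new order (demo path assumed):

```shell
seq 10 > /tmp/file.txt
awk '
    { a[NR] = $0 }                        # buffer every line
    END {
        print a[1]                        # the first line stays put
        for (j = NR - 3; j <= NR; j++)    # then the last four lines
            print a[j]
        for (i = 2; i <= NR - 4; i++)     # then the remaining middle lines
            print a[i]
    }
' /tmp/file.txt
```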