How to skip first line of a file and read the remaining lines as input of a C program? - linux

How to write the shell command to skip the first line in file a.csv and redirect the remaining lines as input to myProgram, which is my C program?
I wrote
./myProgram < a.csv | tail -n + 2
But this does not work, it seems like it will skip the first line of the output from myProgram.

Erm...
tail -n +2 a.csv | ./myProgram

If you want to skip the first line, the traditional solution is sed:
sed -e 1d a.csv | ./myProgram

If your shell is Bash, it supports process substitution: a mechanism that lets you treat the output of a command just like a file. So instead of what you wrote, you can use
./myProgram < <(tail -n +2 a.csv)
What your command did instead was to use the complete file a.csv as the input to myProgram, then pipe the output to tail -n + 2 (did you really use a space between + and 2?).
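A quick way to see the difference between the two orderings is to try them on a throwaway file. This is only a sketch: wc -l stands in for myProgram (an assumption for illustration; any program that reads standard input behaves the same way), and the CSV content is made up.

```shell
# Throwaway CSV: one header line plus three data lines.
csv=$(mktemp)
printf 'id,name\n1,a\n2,b\n3,c\n' > "$csv"

# wc -l stands in for ./myProgram: it simply counts the lines it receives.
# Filter first, then feed the program: the header is gone before it reads.
with_tail=$(tail -n +2 "$csv" | wc -l)
with_sed=$(sed -e 1d "$csv" | wc -l)

# The ordering from the question: the program sees all 4 lines, and tail
# trims the program's OUTPUT instead (a single line here, so nothing is left).
wrong_order=$(wc -l < "$csv" | tail -n +2)

echo "tail first: $with_tail, sed first: $with_sed, wrong order: '$wrong_order'"
rm -f "$csv"
```

Both filtered pipelines report three lines, while the ordering from the question prints nothing at all.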

Related

Print lines between line numbers from a line list and save every instance in separate file using GNU Parallel

I have a file, say "Line_File" with a list of line start & end numbers and file ID :
F_a 1 108
F_b 109 1210
F_c 131 1190
I have another file, "Data_File" from where I need to fetch all the lines between the line numbers fetched from the Line_File.
The command in sed:
sed -n '1,108p' Data_File > F_a.txt
does the job but I need to do this for all the values in columns 2 & 3 of Line_File and save it with the file name mentioned in the column 1 of the Line_File.
If $1, $2 and $3 are the three cols of Line_File then I am looking for a command something like
sed -n '$2,$3p' Data_File > $1.txt
I can run the same using Bash Loop but that will be very slow for a very large file, say 40GB.
I specifically want to do this because I am trying to use GNU Parallel to make it faster and line number based slicing will make the output non-overlapping. I am trying to execute command like this
cat Data_File | parallel -j24 --pipe --block 1000M --cat LC_ALL=C sed -n '$2,$3p' > $1.txt
But I am not able to use the column assignments $1, $2 and $3 properly.
I tried the following command:
awk '{system("sed -n \""$2","$3"p\" Data_File > $1"NR)}' Line_File
But it doesn't work. Any idea where I am going wrong?
P.S If my question is not clear then please point out what else I should be sharing.
You may use xargs with -P (parallel) option:
xargs -P 8 -L 1 bash -c 'sed -n "$2,$3p" Data_File > $1.txt' _ < Line_File
Explanation:
This xargs command takes Line_File as input by using <
-P 8 option allows it to run up to 8 processes in parallel
-L 1 makes xargs process one line at a time
bash -c ... forks bash for each line in input file
The _ before < is passed as $0, and the three columns of each input line are passed as $1, $2 and $3
sed -n runs sed command for each line by forming a command line
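To see what that xargs invocation will run before letting it loose on a large file, the inner command can be swapped for an echo. This is only a sketch: the three-line Line_File from the question is recreated in a temporary directory, -P is left out so the output order is predictable, and sh is used in place of bash since nothing here is Bash-specific.

```shell
workdir=$(mktemp -d)
printf 'F_a 1 108\nF_b 109 1210\nF_c 131 1190\n' > "$workdir/Line_File"

# -L 1 hands one input line at a time to sh -c; "_" fills $0, so the three
# columns arrive as $1, $2 and $3. echo prints the command instead of running it.
preview=$(xargs -L 1 sh -c 'echo "sed -n ${2},${3}p Data_File > ${1}.txt"' _ < "$workdir/Line_File")

printf '%s\n' "$preview"
rm -rf "$workdir"
```

Each input line should produce one sed command, for example sed -n 1,108p Data_File > F_a.txt for the first row.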
Or you may use gnu parallel like this:
parallel --colsep '[[:blank:]]' "sed -n '{2},{3}p' Data_File > {1}.txt" :::: Line_File
Check parallel examples from official doc
awk to the rescue!
this scans the data file only once
$ awk 'NR==FNR {k=$1; s[k]=$2; e[k]=$3; next}
{for(k in s) if(FNR>=s[k] && FNR<=e[k]) print > (k".txt")}' lines data
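The single-pass approach can be checked on a small stand-in for the data file. In this sketch, seq generates a 20-line file and the ranges are made up for illustration (two of them overlap, as F_b and F_c do in the question). The dir variable is an addition so the outputs land in a temporary directory rather than the current one.

```shell
workdir=$(mktemp -d)

seq 1 20 > "$workdir/data"                                  # line N contains the number N
printf 'F_a 1 5\nF_b 6 15\nF_c 8 12\n' > "$workdir/lines"   # hypothetical ranges; F_c overlaps F_b

# First file (NR==FNR): remember start/end per ID. Second file: print each
# line into every output whose range contains the current line number.
awk -v dir="$workdir" '
  NR==FNR { s[$1]=$2; e[$1]=$3; next }
  { for (k in s) if (FNR>=s[k] && FNR<=e[k]) print > (dir "/" k ".txt") }
' "$workdir/lines" "$workdir/data"

a_count=$(( $(wc -l < "$workdir/F_a.txt") ))
b_count=$(( $(wc -l < "$workdir/F_b.txt") ))
c_count=$(( $(wc -l < "$workdir/F_c.txt") ))
c_first=$(sed -n 1p "$workdir/F_c.txt")

echo "F_a: $a_count lines, F_b: $b_count lines, F_c: $c_count lines (starts at $c_first)"
rm -rf "$workdir"
```

Overlapping ranges are handled because every line is tested against every ID. With a very large number of IDs, some awk implementations may run into a limit on simultaneously open output files, so this is worth testing on a sample first.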
This might work for you (GNU parallel and sed):
parallel --dry-run -a lineFile -C' ' "sed -n '{2},{3}p' dataFile > {1}"
This sets the column separator to a space with -C ' ', which maps the first three fields of lineFile to {1}, {2} and {3}. The --dry-run option lets you check the commands parallel generates before running them for real; once they look correct, remove --dry-run.
You are unlikely to be CPU constrained; your disks are more likely to be the limiting factor. To avoid reading Data_File from disk over and over again, you should run as many jobs as possible in parallel. That way caching will help you:
cat Line_File |
parallel -j0 --colsep ' ' sed -n {2},{3}p Data_File \> {1}.txt

how to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
But this obviously doesn't work, because it only shows the third line of the last file. What I actually want is the third line of every file in the result2012 folder.
Does anyone know how to do that?
Also, sorry, one more question: is it possible to show which file each third line belongs to?
That is, can the filename be shown before each extracted third line?
I ask because when I use the head or tail command on several files, the filename is shown.
Thank you.
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
printf '%s: ' "$(basename "$f")"
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
printf '%s: ' "$f"
sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. Each file is read only up to its 3rd line.
Also, you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this returns only the third line of the first file, because q quits sed entirely rather than moving on to the next file.
FNR holds the line number within the current input file, so this command prints the 3rd line of every file:
awk 'FNR==3' MiseqData/result2012/12*
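For a self-contained check of the per-file approaches, the sketch below creates two small files in a temporary directory (the names a.txt and b.txt are made up) and prints the third line of each, prefixed with its file name, once with awk and once with the portable sed loop.

```shell
workdir=$(mktemp -d)
printf 'a1\na2\na3\na4\n' > "$workdir/a.txt"
printf 'b1\nb2\nb3\n'     > "$workdir/b.txt"

# awk: FNR restarts at 1 for every input file, so FNR == 3 matches once per file.
awk_out=$(awk 'FNR == 3 { print FILENAME ": " $0 }' "$workdir"/*.txt)

# Portable sed loop: "3q;d" deletes lines 1 and 2, then on line 3 quits,
# which prints that line on the way out.
loop_out=$(for f in "$workdir"/*.txt; do
  printf '%s: ' "$f"
  sed '3q;d' "$f"
done)

printf '%s\n' "$awk_out"
rm -rf "$workdir"
```

Both variants should produce the same two lines, one ending in a3 and one ending in b3.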

Get the line count from 2nd line of the file

How do I get the line count of a file from the 2nd line of the file, as the first line is header?
wc -l filename
Is there a way to set some condition into it?
Use the tail command:
tail -n +2 file | wc -l
-n +2 would print the file starting from line 2
You can use awk to count from 2nd line onwards:
awk 'NR>1{c++} END {print c}' file
Or simply use NR variable in the END block:
awk 'END {print NR-1}' file
Alternatively, use shell arithmetic to subtract 1 from the wc output:
echo $(( $(wc -l < file) -1 ))
Delete first line with GNU sed:
sed '1d' file | wc -l
There is no way to tweak the wc command itself. You should either process the result of the command or use another tool.
As suggested in other answers, if you are running Bash, a good way is to put the result of the command into an arithmetic expression like $(( $(command) - 1 )).
If you are looking for a portable solution, here is a Perl version:
perl -e '1 while <>; print $. - 1' < file
The variable $. holds the number of lines read since a file handle was last closed. The while loop reads all the lines from the file.
Alternately, you could just subtract 2.
echo $((`cat FILE | wc -l`-2))
Please try this one. It should solve your problem:
$ tail -n +2 filename | wc -l
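One detail separates these approaches: wc -l counts newline characters, while awk's NR counts records, so they disagree when the last line has no trailing newline. The sketch below shows this with a made-up three-line file whose final line is unterminated.

```shell
f=$(mktemp)
printf 'header\nrow1\nrow2' > "$f"     # note: no newline after row2

# tail | wc counts newline characters after the header: only row1 is terminated.
by_wc=$(( $(tail -n +2 "$f" | wc -l) ))

# awk counts records, so the unterminated last line is still counted.
by_awk=$(awk 'END { print NR - 1 }' "$f")

echo "wc: $by_wc, awk: $by_awk"   # prints "wc: 1, awk: 2"
rm -f "$f"
```

For files that always end with a newline, the two methods give the same count.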

Find line number in a text file - without opening the file

In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now, I launch vi, find the string, note its line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
Will give you almost what you expect
-n : tells grep to print the line number
-2 : print 2 additional lines (and the wanted string, of course)
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
You can use cat -n to number the lines, grep for the word, and then use awk to print just the line number:
cat -n FILE | grep WORD | awk '{print $1;}'
although grep already does what you mention if you give -C 2 (above/below 2 lines):
grep -C 2 WORD FILE
You can do it with grep -A and -B options, like this:
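grep -B 2 -A 2 "searchstring" yourFile | sed 3d
grep finds the line and shows two lines of context before and after it; sed then removes the third line of that output, which is the matching line itself (assuming a single match with two full lines of context before it).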
If you want to automate this, you can simply write a shell script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
# grep -n prints "N:line"; keep only the number of the first match
NUM1=$(grep -n "$VAL" file.txt | head -n 1 | cut -f1 -d ':')
echo "$NUM1" # show the line number of the matched keyword
MYNUMUP=$((NUM1 - 1)) # line number above the keyword
MYNUMDOWN=$((NUM1 + 1)) # line number below the keyword
sed -n "${MYNUMUP}p" file.txt # display the line above the keyword
sed -n "${MYNUMDOWN}p" file.txt # display the line below the keyword
The plus point of the script is you can change the keyword in VAL variable as you like and execute to get the needed output.
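To make the grep output format concrete, here is a small sketch on a made-up seven-line file in which the search string needle sits on line 4.

```shell
f=$(mktemp)
printf 'alpha\nbeta\ngamma\nneedle\ndelta\nepsilon\nzeta\n' > "$f"

# -n prefixes each line with its number; -C 2 adds two lines of context on
# each side. The matching line is marked "N:", context lines "N-".
context=$(grep -n -C 2 'needle' "$f")

# Just the line number of the first match.
lineno=$(grep -n 'needle' "$f" | head -n 1 | cut -d: -f1)

printf '%s\n' "$context"
echo "match on line $lineno"
rm -f "$f"
```

The context output runs from 2-beta to 6-epsilon, with the match shown as 4:needle, so the line number and the surrounding lines come from a single command.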

How to get ONLY Second line with SED

How can I get second line in a file using SED
#SRR005108.1 :3:1:643:216
GATTTCTGGCCCGCCGCTCGATAATACAGTAATTCC
+
IIIIII/III*IIIIIIIIII+IIIII;IIAIII%>
With the data that looks like above I want only to get
GATTTCTGGCCCGCCGCTCGATAATACAGTAATTCC
You don't really need sed, but if the purpose is to learn, you can use the -n option together with the p command.
-n suppresses sed's automatic printing of every line, so only lines explicitly printed with p (here, line 2) are output.
sed -n 2p somefile.txt
Edit: You can also improve the performance using the tip that manatwork mentions in his comment:
sed -n '2{p;q}' somefile.txt
You always want the second line of a file? No need for SED:
head -n 2 file | tail -n 1
This will print the second line of every file:
awk 'FNR==2'
and this one only the second line of the first file:
awk 'NR==2'
This might work for you:
sed '2q;d' file
cat your_file | head -2 | tail -1
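The sed variants above can be compared side by side on the record from the question. In this sketch the early-exit form is written as 2{p;q;}, with a semicolon before the closing brace, on the assumption that this spelling is accepted by non-GNU seds as well as GNU sed.

```shell
f=$(mktemp)
# The four-line record from the question.
cat > "$f" <<'EOF'
#SRR005108.1 :3:1:643:216
GATTTCTGGCCCGCCGCTCGATAATACAGTAATTCC
+
IIIIII/III*IIIIIIIIII+IIIII;IIAIII%>
EOF

via_p=$(sed -n '2p' "$f")          # prints line 2 but still reads to the end
via_pq=$(sed -n '2{p;q;}' "$f")    # prints line 2, then quits immediately
via_qd=$(sed '2q;d' "$f")          # deletes line 1, auto-prints line 2 on quit

echo "$via_pq"
rm -f "$f"
```

All three return the same sequence line; the two quitting forms only matter for speed on large files.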
