How to write a Linux script to run multiple files from a single executable?

I have an executable which works on a file and produces another file. I need to provide the first entry of the file as an argument to the executable. Suppose the executable name is "myexec" and I am trying to run it on a file "myfile.extension".
"myfile.extension" has a format like this:
7 4 9 1 4 11 9 2 33 4 7 1 22 4 55 ...
While running the executable, I have to type the following:
myexec 7 myfile.extension
and it produces a file named myfile.extension.7
My question is: how can I write a script that will do this for a bunch of files in a directory?

Here's a bash script that you can execute in the directory with the files. It assumes the first word in the file is the argument:
for f in *
do
    i=$(awk 'NR==1{print $1;exit}' "$f")
    myexec "$i" "$f"
done
Edit: using awk instead of cut | head, as suggested by brianadams in the comments.
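If the directory also contains files you don't want to process, a slightly narrower variant (a sketch, assuming every target file ends in .extension) loops over just those:

for f in *.extension
do
    i=$(awk 'NR==1{print $1;exit}' "$f")
    myexec "$i" "$f"    # writes "$f.$i", e.g. myfile.extension.7
done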

Related

abyss-pe: variables to assemble multiple genomes with one command

How do I rewrite the following to properly replace the variable with my genomeID? (I have this method working in the Spades and Masurca assemblers, so it's something about Abyss that doesn't like this approach, and I need a work-around.)
I am trying to run abyss on a cluster server but am running into trouble with how abyss-pe is reading my variable input:
my submit file loads a script for each genome listed in a .txt file
my script writes in the genome name throughout the script
the abyss assembly fumbles the variable replacement
Input.sub:
queue genomeID from genomelisttest.txt
Input.sh:
#!/bin/bash
genomeID=$1
cp /mnt/gluster/harrow2/trim_output/${genomeID}_trim.tar.gz ./
tar -xzf ${genomeID}_trim.tar.gz
rm ${genomeID}_trim.tar.gz
for k in `seq 86 10 126`; do
mkdir k$k
abyss-pe -C k$k name=${genomeID} k=$k lib='pe1 pe2' pe1='../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz' pe2='../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz'
done
Error that I get:
`../enome_trim/enome_L1_1.fq.gz': No such file or directory
Here, "enome" should have been replaced by the five-digit genomeID; the replacement happens properly in the earlier parts of the script, up to the point where abyss comes in.
pe1='../'"$genomeID"'_trim/'"$genomeID"'_L1_1.fq.gz ...'
I closed the single-quoted string before the variable and reopened it after it, wrapping the variable itself in double quotes so the shell expands it.
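Alternatively, the whole loop can be written with double quotes around the pe1/pe2 values, so the shell expands ${genomeID} before abyss-pe (and the make process it drives) ever sees the paths. A sketch, assuming the same directory layout as above:

#!/bin/bash
genomeID=$1
for k in $(seq 86 10 126); do
    mkdir -p "k$k"
    # Double quotes let ${genomeID} expand; single quotes passed it through literally.
    abyss-pe -C "k$k" name="${genomeID}" k="$k" lib='pe1 pe2' \
        pe1="../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz" \
        pe2="../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz"
done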

Add blank line before a certain phrase in a text file in Linux?

I'm using Kali Linux, trying to sort out some input from Nmap. Basically, I ran a scan from NMap, and need to extract specific pieces of information from it. I've got it to show everything I need using the following command:
cat discovery.txt | grep 'Nmap scan report for\|Service Info: OS:\|OS CPE:\|OS guesses:\|OS matches\|OS details'
Essentially, each section of information I need will start with "Nmap scan report for [IP ADDRESS]"
I'd like to add to my command to have it create a blank line before every appearance of the word "Nmap", to clearly separate each chunk of information.
Is there any command I can use to do this?
sed '/Nmap/i
' file
That's a literal newline after the i
A demo: add a newline before each line ending with a "0" or a "5"
seq 19 | sed '/0$\|5$/i
'
1
2
3
4

5
6
7
8
9

10
11
12
13
14

15
16
17
18
19
Sure, you can use Perl.
perl -pe 's/^Nmap/\nNmap/'
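Plugged into the original pipeline (a sketch, assuming the same discovery.txt), the Perl version looks like this:

grep 'Nmap scan report for\|Service Info: OS:\|OS CPE:\|OS guesses:\|OS matches\|OS details' discovery.txt | perl -pe 's/^Nmap/\nNmap/'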

Integrating several shell scripts into one script

I would like to integrate a few short scripts into one script where I can update an argument for the input file from the command line. I am going through 22 files and counting lines where $5!="1".
Here is a sample head of the input file:
CHROM POS N_ALLELES N_CHR {FREQ}
2 45895 2 162 0.993827 0.00617284
2 45953 2 162 0.993827 0.00617284
2 264985 2 162 1 0
2 272051 2 162 0.944444 0.0555556
Currently, I have the following 3 short scripts:
1) count lines (saved as wcYRI.sh): $5!="1"{sum++}END{print sum}
2) apply linecount (saved as check-annos.sh): awk -f wcYRI.sh ~/folder$1/file$1
3) apply linecount for 22 files, sum the output:
for i in {1..22}; do
    sh check-annos.sh $i
done | awk '{sum+=$1}END{print sum}'
It's relatively simple, but sometimes script 1 gets a little longer for data files that look like this:
Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene LJB2_SIFT LJB2_PolyPhen2_HDIV LJB2_PP2_HDIV_Pred LJB2_PolyPhen2_HVAR LJB2_PolyPhen2_HVAR_Pred LJB2_LRT LJB2_LRT_Pred LJB2_MutationTaster LJB2_MutationTaster_Pred LJB_MutationAssessor LJB_MutationAssessor_Pred LJB2_FATHMM LJB2_GERP++ LJB2_PhyloP LJB2_SiPhy
16 101593 101593 C T exonic POLR3K nonsynonymous SNV POLR3K:NM_016310:exon2:c.G164A:p.G55E 0.000000 0.997 D 0.913 D 0.000000 D 0.999989 D 2.205 medium 0.99 5.3 2.477000 17.524
...and I am using an awk file like this (performing an array match) as input -f to script 2 above:
NR==FNR{
arr[$1$2];next
}
$1$2 in arr && $0~/exonic/&&/nonsynonymous SNV/{nonsyn++};
$1$2 in arr && $0~/exonic/&&/synonymous SNV/ && $0!~/nonsynonymous/{syn++}
END{
print nonsyn,"nonsyn YRI","\t",syn,"YRI syn"
}
My goal is to integrate this process a bit more so I don't need to go into script 2 and change ~/folder$1/file$1 each time; I'd like to be able to pass ~/folder$1/file$1 as an argument at the command line. However, when I try to use something like this in a for loop at the command line, it doesn't accept $1 the way it does when $1 is built into a separate script called by the for-do-done loop (as in script 3: script 3 will take script 2, but I can't just enter the contents of script 2 explicitly into the for loop as arguments).
I am not actually so concerned about having a separate awk file to handle the line parsing; the main thing annoying me is that I am modifying script 2 for each folder/file set. I would like to do this from the command line, so that when I tell the script ~/folder$1/file$1 it cycles through the numbers 1-22, and I can keep one universal script for this process, since I have many folder/file combinations to look at.
Any advice is appreciated for shortening the pipeline in general, but specifically the command line argument problem is bugging me a lot!
If I understand the problem correctly, I see two ways to handle it. If the path format is consistent (i.e. the number always occurs twice, in the same positions), you could make the script accept the parts of the path as two different parameters. The script would look like this:
#!/bin/bash
folderPrefix="$1"
filePrefix="$2"
for num in {1..22}; do
    awk -f wcYRI.sh "$folderPrefix$num/$filePrefix$num"
done |
awk '{sum+=$1}END{print sum}'
... and then you'd run it with ./scriptname ~/folder file. Alternately, if you need to be able to define the folder/file path format more flexibly, you could do something like this:
#!/bin/bash
for num in {1..22}; do
eval "awk -f wcYRI.sh $1"
done |
awk '{sum+=$1}END{print sum}'
... and then run it with ./scriptname '~/folder$num/file$num'. Note that the single-quotes are needed here so that the $var references don't get expanded until eval forces them to be.
BTW, the file wcYRI.sh is an awk script, not a shell script, so I'd recommend changing its file extension to prevent confusion. Actually, the preferred way to do this (for both shell and awk scripts) is to add a shebang line as the first line in the script (see my examples above; for an awk script it would be #!/usr/bin/awk -f), then make the script executable, and then run it with just ./scriptname and let the shebang take care of specifying the interpreter (sh, bash, awk -f, whatever).
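For example, script 1 could become a self-contained executable awk script (a sketch; the name wcYRI.awk is only a suggested rename):

#!/usr/bin/awk -f
# Count lines whose 5th field is not exactly "1".
$5 != "1" { sum++ }
END { print sum }

After chmod +x wcYRI.awk it can be run directly as ./wcYRI.awk ~/folder1/file1, and the wrapper scripts above can call it in place of awk -f wcYRI.sh.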

IO redirection in Linux

I'm executing a C program and writing the output to a file in the following way.
./test >> foo.csv
12245933 1
46909645052845 2
46909767324372 3
However, when I run it without writing to a file, the output is the following.
./test
12245933 1
46909645052845 2
134517460 3
Note that the values I'm dumping are uint64.
What is wrong? What mistake am I making?
Am I using the wrong I/O redirection?

how to redirect `cat` to simulate user input in linux

I have started a project. The instructions say that we can test our program with this command line:
cat test.txt > test.py
But I get no output.
As I understood it, this command is supposed to produce output.
test.txt file looks like :
1
3
4
2
5
6
7
1
1
8
9
3
4
5
1
-1
And the test.py file looks like :
for i in range(16):
    var = raw_input("type something : ")
    print var
I was expecting this command line to redirect the content of the test.txt file to the test.py script while it was running.
I have already read the documentation about the cat command.
Could you help me, please?
In other words, how is the cat command supposed to simulate the user? I think I have to change something in my Python file.
Thanks in advance,
Mff
The problem here is that you want cat test.txt | python test.py (or, more simply, python test.py < test.txt) rather than >. The pipe | sends the output of one command (cat test.txt) to the input of the other, whereas > sends the output to a file, which probably means you have overwritten test.py with the contents of test.txt.
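A minimal sketch of a working invocation, assuming test.py and test.txt sit in the current directory and a Python 2 interpreter is available (the script uses raw_input):

python test.py < test.txt        # each raw_input() call reads the next line of test.txt
# or, equivalently:
cat test.txt | python test.py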
