How to concatenate strings in GNU parallel?

I am trying to write a small function that reads user names from the fastalist file and creates name.txt for each of them in parallel. But the concatenated filenames confuse me.
As shown below, the first name is '1pazA' but the output is '.txtA'. It looks as if the first few characters were replaced. The second output, however, is correct.
# cmd
cat BuildFeatures/example/fastalist | parallel -j 5 echo {}.txt
# out
.txtA
T0968s1.txt
# fastalist file content
1pazA
T0968s1
I expect to get the right spliced string.
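One possible explanation, offered as an assumption because the raw bytes of the file are not shown: the first line of fastalist ends with a carriage return (a Windows-style line ending). The terminal would then print 1pazA, jump back to the start of the line, and overwrite the first four characters with .txt, which gives '.txtA'. A minimal sketch that strips carriage returns before building the names (make_names is a hypothetical helper, and a plain loop stands in for parallel):

```shell
#!/usr/bin/env bash
# Strip carriage returns from every line of the file given as $1,
# then append ".txt" to each remaining name.
make_names() {
    tr -d '\r' < "$1" | while IFS= read -r name; do
        printf '%s.txt\n' "$name"
    done
}

# The same cleanup in front of GNU parallel would look like:
#   tr -d '\r' < BuildFeatures/example/fastalist | parallel -j 5 echo {}.txt
```

If the output is still wrong after this, the cause lies elsewhere and the file should be inspected byte by byte, for example with od -c.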

Related

How can I use the lines of a file as variables?

I am using a Unix-based program. I want to automate the process so that I do not have to copy and paste the data one by one. For this, I need to pass the data in a file to the code line by line, as a variable.
The program converts xyz coordinates to local coordinates. How can I run the coordinates in the xyz_coordinates file I created, one by one, in the code below? In the program I use, the conversion code works like this:
echo 4208830.039709186 2334850.551667509 4171267.377406844 -6.753E-01 4.493E-01 2.849E-01 | xyz2env.py
and this is the file I am trying to run:
2679689.926729193 -727950.9964290063 5722789.538975053 7.873E-02 3.466E-01 6.410E-01
2679689.927123377 -727950.9971557076 5722789.540522 7.912E-02 3.458E-01 6.425E-01
2679689.930567728 -727950.9979971027 5722789.550832021 8.257E-02 3.450E-01 6.528E-01
2679689.931029495 -727950.9992263148 5722789.549927638 8.303E-02 3.438E-01 6.519E-01
2679689.929031829 -727950.9981009626 5722789.546359798 8.103E-02 3.449E-01 6.484E-01
........
and it goes on like this. Also, there are blank lines between the data lines. Will this be a problem?
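You can use xargs to invoke the command with a fixed number of arguments at a time (6 in your case); it also has the advantage of skipping empty lines automatically. Note that this passes the values as command-line arguments rather than on standard input.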
< file.txt xargs -n 6 xyz2env.py
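If xyz2env.py only reads from standard input, as the echo example in the question suggests, a loop that pipes each non-empty line into the program may fit better. The sketch below is an illustration under that assumption; convert is a hypothetical stand-in for xyz2env.py and run_all is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for xyz2env.py: reads one line on stdin
# and reports how many fields it received and the first field.
convert() {
    awk '{ print NF " fields: " $1 }'
}

# Pipe every line of file $1 that contains non-blank characters to the converter.
run_all() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *[![:space:]]*) printf '%s\n' "$line" | convert ;;  # line has content
        esac                                                   # blank lines are skipped
    done < "$1"
}

# Usage (replace convert with the real program inside run_all):
#   run_all xyz_coordinates
```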

grouping input files with incremental names to divide a computation into smaller ones

I created a script to run a function that takes a specific file and compares it to several others. The files that I have to scan for the matches are more than 900 and incrementally named like this:
A001-abc.txt A001-efg.txt [..] A002-efg.txt A002-hjk.txt [..] A120-xwz.txt (no whitespaces)
The script looks like this:
#!/bin/bash
for f in *.txt; do
bedtools function /mydir/myfile.txt *.txt> output.txt
done
Now, the computation is extremely intensive and can't be completed. How can I group the files according to their incremental name and perform the computation on files A001-abc A001-def then go to A002-abc, A002-def.., to A123-xyz and so on?
I was thinking that by finding a way to specify the name with an incremental variable and then grouping them, I could divide the computation in many smaller ones. Is it reasonable and doable?
Many thanks!
The problem might be that you're not using $f. Try using this line in the for-loop instead. Also, use >> so the output appends instead of overwriting the output file.
bedtools function /mydir/myfile.txt "$f" >> output.txt
But if you still want to break it up, you might try a double loop like this:
for n in $(seq --format="%03.f" 1 120); do
    for f in "A$n"*; do
        bedtools function /mydir/myfile.txt "$f" >> "${n}output.txt"
    done
done
The values of $n, taken from the output of the seq command, are 001 002 .. 120.
So the inner loop loops through A001* A002* .. A120*.
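As a variation on the double loop, the groups can be derived from the file names themselves, so gaps in the numbering do not matter. This is only a sketch: list_groups, for_each_group and process_group are hypothetical names, and process_group is a placeholder for the real bedtools call.

```shell
#!/usr/bin/env bash
# Print each distinct prefix (the part before the first "-")
# of the *-*.txt files in directory $1.
list_groups() {
    for f in "$1"/*-*.txt; do
        [ -e "$f" ] || continue        # no match: the glob stays literal
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%%-*}"    # keep text before the first "-"
    done | sort -u
}

# Placeholder for the real work: $1 is the group prefix,
# the remaining arguments are the files of that group.
process_group() {
    printf '%s: %d files\n' "$1" "$(( $# - 1 ))"
}

# Call process_group once per prefix with that group's files.
for_each_group() {
    dir=$1
    list_groups "$dir" | while IFS= read -r prefix; do
        process_group "$prefix" "$dir/$prefix"-*.txt
    done
}
```

Each group can then write to its own output file, as in the double loop above.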

how to extract the first parameter from a line containing a particular string pattern

I have a file named mail_status.txt. The content of the file is as follows.
1~auth_flag~
2~download_flag~
3~copy_flag~
4~auth_flag~
5~auth_flag~
6~copy_flag~
I want to perform some operation on this file so that at the end I should be getting three variables and their respective values should be as follows:
auth_flag_ids="1,4,5"
download_flag_ids="2"
copy_flag_ids="3,6"
I am quite new to this language. Please let me know if some more details are required on this.
Thanks
If you want to generate bash variables based on the file content,
please try the following:
# read the file and extract information line by line
declare -A hash # declare hash as an associative array
while IFS= read -r line; do
key="${line#*~}" # convert "1~auth_flag~" to "auth_flag~"
key="${key%~*}_ids" # convert "auth_flag~" to "auth_flag_ids"
hash[$key]+="${line%%~*}," # append the value to the hash
done < "mail_status.txt"
# iterate over the hash to create variables
for r in "${!hash[@]}"; do # r is assigned to "auth_flag_ids", "download_flag_ids" and "copy_flag_ids" in turn
printf -v "$r" "%s" "${hash[$r]%,}" # create a variable named "$r" and assign it to the hash value by trimming the trailing comma off
done
# check the result
printf "%s=\"%s\"\n" "auth_flag_ids" "$auth_flag_ids"
printf "%s=\"%s\"\n" "download_flag_ids" "$download_flag_ids"
printf "%s=\"%s\"\n" "copy_flag_ids" "$copy_flag_ids"
First it reads the lines of the file and extracts the variable name
and the value line by line. They are stored in an associative array hash.
Next it iterates over the keys of hash to create variables whose names are
"auth_flag_ids", "download_flag_ids" and "copy_flag_ids".
printf -v var creates a variable var. This mechanism is useful for assigning
to a variable whose name is only known at run time.
I'm not going to explain in detail the bash-specific notations
such as ${parameter#word}, ${parameter%%word} or ${!name[@]}.
You can easily find the references and well-explained documents including
the bash man page.
Hope this helps.
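For readers who want to see those notations in isolation, here is a small sketch applied to one sample line from the question. The array name demo and its contents are made up for illustration.

```shell
#!/usr/bin/env bash
line='1~auth_flag~'

id="${line%%~*}"     # drop the longest suffix matching "~*"   -> 1
rest="${line#*~}"    # drop the shortest prefix matching "*~"  -> auth_flag~
flag="${rest%~*}"    # drop the shortest suffix matching "~*"  -> auth_flag
echo "$id $rest $flag"

# ${!demo[@]} expands to the keys of an associative array;
# printf -v assigns to the variable whose name is held in $name.
declare -A demo=([x_ids]="1,4," [y_ids]="2,")
for name in "${!demo[@]}"; do
    printf -v "$name" '%s' "${demo[$name]%,}"   # trim the trailing comma
done
echo "$x_ids $y_ids"
```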

how to create a txt file with columns being the descending sub-directories in Linux?

My data follow the structure:
../data/study_ID/FF_Number/Exam_Number/date,
Where the data dir contains 176 participants' sub-directories. The ID number represents the participant's ID, and each of the following sub-directories represents some experimental number.
I want to create a txt file with one line per participants and the following columns: study ID, FF_number, Exam_Number and date.
However it gets a bit more complicated as I want to divide the participants into chunks of ~ 15-20 ppt per chunk for the following analysis.
Any suggestions?
Cheers.
Hmm, nobody?
You should redirect the output of the "find" command, consider the switches -type d and -maxdepth, and probably parse the result with sed, replacing "/" with spaces. Piping through "cut", "column -t", "sort" and "uniq" may also be useful. Do the names, apart from FF and ID, contain spaces or special characters, e.g. related to the names of participants?
It should be possible to get a TXT with "one liner" and a few pipes.
You should try, and post first results of your work on this :)
EDIT: Alright, I created a test structure for myself with several thousand directories and subdirectories numbered by participant, by exam number etc., which looks like this (maybe it's not identical to what you have, but don't worry). Studies are numbered from 5 to 150, FF from 45 to 75, and dates from 2012_01_00 to 2012_01_30, which makes a really huge number of directories in total.
/Users/pwadas/bzz/data
/Users/pwadas/bzz/data/study_005
/Users/pwadas/bzz/data/study_005/05_Num
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_00
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_01
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_02
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_03
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_04
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_05
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_06
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_07
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_08
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_09
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_10
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_11
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_12
Now, I want ( quote ) "txt file with one line per participants and the following columns: study ID, FF_number, Exam_Number and date."
So I use the following one-liner:
find /Users/pwadas/bzz/data -type d | head -n 5000 |cut -d'/' -f5-7 | uniq |while read line; do echo -n "$line: " && ls -d /Users/pwadas/bzz/$line/*Exam/* | perl -0pe 's/.*2012/2012/g;s/\n/ /g' && echo ; done > out.txt
and here is the output (the first few lines of out.txt). The lines are very long, so I cut the output to the first 80-90 characters:
dtpwmbp:data pwadas$ cat out.txt |cut -c1-90
data:
data/study_005:
data/study_005/05_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/06_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/07_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/08_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
dtpwmbp:data pwadas$
I hope this will help you a little and that you'll be able to modify it according to your needs and patterns; that seems to be all I can do :) You should analyze the one-liner, especially the "cut" command and the perl regex part, which removes newlines and the full directory name from the "ls" output. This is probably far from optimal, but beautifying is not the point here, I guess :)
So, good luck :)
PS. The "head" command limits the output to the first N lines; you'll probably want to leave out the "| head .. |" part.
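A more direct sketch of the table and the chunking asked for in the question, under two assumptions that are not confirmed by the question: every date directory sits exactly four levels below the data directory, and each participant has a single such directory, so that one line corresponds to one participant. The helper names list_exams and chunk_table are hypothetical, and -mindepth/-maxdepth are find extensions available in GNU and BSD find.

```shell
#!/usr/bin/env bash
# Print one tab-separated line per date directory found under $1:
# study_ID, FF_Number, Exam_Number, date.
list_exams() {
    ( cd "$1" && find . -mindepth 4 -maxdepth 4 -type d ) |
        sed 's|^\./||' |   # drop the leading "./"
        tr '/' '\t' |      # path separators become column separators
        sort
}

# Split table file $1 into pieces of at most $2 lines each;
# the pieces are named ${3}aa, ${3}ab, and so on.
chunk_table() {
    split -l "$2" "$1" "$3"
}

# Usage:
#   list_exams ../data > participants.txt
#   chunk_table participants.txt 16 chunk_
```

With 176 participants and 16 lines per chunk this would give 11 chunk files.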

Filename manipulation in cygwin

I am running cygwin on Windows 7. I am using a signal processing tool and basically performing alignments. I had about 1200 input files. Each file is of the format given below.
input_file_format = "AC_XXXXXX.abc"
The first step required building some kind of indexes for all the input files, this was done with the tool's build-index command and now each file had 6 indexes associated with it. Therefore now I have about 1200*6 = 7200 index files. The indexes are of the form given below.
indexes_format = "AC_XXXXXX.abc.1",
"AC_XXXXXX.abc.2",
"AC_XXXXXX.abc.3",
"AC_XXXXXX.abc.4",
"AC_XXXXXX.abc.rev.1",
"AC_XXXXXX.abc.rev.2"
Now, I need to use these indexes to perform the alignment. All the 6 indexes of each file are called together and the final operation is done as follows.
signal-processing-tool ..\path-to-indexes\AC_XXXXXX.abc ..\Query file
Where AC_XXXXXX.abc is the index base name associated with that particular input file. All 6 index files are matched by AC_XXXXXX.abc*.
My problem is that I need to use only the first 14 characters of the index file names for the final operation.
When I use the code below, the alignment is not executed.
for file in indexes/*; do ./tool $file|cut -b1-14 Project/query_file; done
I'd appreciate help with this!
First of all, keep in mind that $file will always start with "indexes/", so taking the first 14 characters would always include that folder name at the beginning.
To use the first 14 characters of a variable, use ${file:0:14}, where 0 is the starting index and 14 is the length of the desired substring.
Alternatively, if you want to use cut, you need to run it in a command substitution:
for file in indexes/*; do ./tool "$(echo "$file" | cut -c 1-14)" Project/query_file; done
I changed the argument for cut to -c, which selects characters instead of bytes.
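To avoid counting the "indexes/" prefix, the directory part can be stripped before taking the substring. A minimal sketch with a hypothetical helper index_base; the length is a parameter because "AC_XXXXXX.abc" as written in the question is 13 characters long, so the right value depends on the real file names.

```shell
#!/usr/bin/env bash
# Print the first $2 characters of the file name in path $1,
# ignoring any leading directories.
index_base() {
    local name=${1##*/}            # strip everything up to the last "/"
    printf '%s\n' "${name:0:$2}"   # substring: offset 0, length $2
}

# Usage inside the loop from the question (tool and paths as given there):
#   for file in indexes/*; do
#       ./tool "indexes/$(index_base "$file" 13)" Project/query_file
#   done
```

Since all six index files of one input share the same base name, this loop would call the tool six times per input; piping the base names through sort -u first would avoid the repeats.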
