In CMake, how do I turn the multi-line output of a command into a list?

I want to do something like this:
execute_process(
COMMAND bash -c "git --git-dir ${CMAKE_SOURCE_DIR}/.git ls-files"
OUTPUT_VARIABLE TRACKED_FILES)
add_custom_target(all_file_project SOURCES ${TRACKED_FILES})
And the command itself seems to work as expected but the generated variable "TRACKED_FILES" contains only one logical entry (one multi line string) rather than a list of files.
Can I somehow turn a string containing multiple lines separated by a newline ("\n") into a list in CMake?

One option (as the title of my question suggests) is to actively split the string manually rather than interpreting a variable as list in the first place:
string(REPLACE "\n" ";" ADDITIONAL_PROJECT_FILES_LIST "${ADDITIONAL_PROJECT_FILES}")
This works for me but it would be very nice to have something more abstract and less platform specific (e.g. I don't know whether this works on all OSes including Windows)
Something like execute_process(COMMAND find -type f OUTPUT_LIST_VARIABLE MY_LIST)
Or at least set(MY_LIST FROM_MULTILINE MY_MULTILINE_STRING)

Related

concatenate two strings and one variable using bash

I need to generate filename from three parts, two strings, and one variable.
for f in `cat files.csv`; do echo fastq/$f\_1.fastq.gze; done
files.csv has the following lines:
Sample_11
Sample_12
I need to generate the following:
fastq/Sample_11_1.fastq.gze
fastq/Sample_12_1.fastq.gze
My problem is that I got the below files:
_1.fastq.gze_11
_1.fastq.gze_12
the string after the variable deletes the string before it.
I appreciate any help
Regards
By the way, your idiom for f in `cat files.csv` should be avoided. Refer to: Dangerous Backticks
while read f
do
echo "fastq/${f}_1.fastq.gze"
done < files.csv
You can make it a one-liner with xargs and printf.
xargs printf 'fastq/%s_1.fastq.gze\n' <files.csv
The function of printf is to apply the first argument (the format string) to each argument in turn.
xargs says to run this command on as many files as it can fit onto the command line (splitting it up into multiple invocations if the input file is too large to fit all the arguments onto a single command line, subject to the ARG_MAX constant in your kernel).
Your best bet, generally, is to wrap the variable name in braces. So, in this case:
echo fastq/${f}_1.fastq.gze
See this answer for some details about the general concept, as well.
Edit: An additional thought looking at the now-provided output makes me think that this isn't a coding problem at all, but rather a conflict between line-endings and the terminal/console program.
Specifically, if the CSV file ends its lines with just a carriage return (ASCII/Unicode 13), the end of Sample_11 might "rewind" the line to the start and overwrite.
In that case, based loosely on this article, I'd recommend replacing cat (if you understandably don't want to re-architect the actual script with something like while) with something that will strip the carriage returns, such as:
for f in $(tr -cd '\011\012\040-\176' < files.csv)
do
echo fastq/${f}_1.fastq.gze
done
As the cited article explains, Octal 11 is a tab, 12 a line feed, and 40-176 are typeable characters (Unicode will require more thinking). If there aren't any line feeds in the file, for some reason, you probably want to replace that with tr '\015' '\012', which will convert the carriage returns to line feeds.
Of course, at that point, better is to find whatever produces the file and ask them to put reasonable line-endings into their file...
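For illustration, the suggestions above (brace or otherwise delimit the variable, strip the carriage returns, avoid the backticks) can be combined in one loop. This is only a sketch: the function name make_fastq_names is made up, and it assumes the stray characters are the carriage returns of CRLF line endings.

```shell
# make_fastq_names (hypothetical helper): read sample names on stdin and
# print one fastq path per name.
make_fastq_names() {
    cr=$(printf '\r')                 # a literal carriage return
    while IFS= read -r f || [ -n "$f" ]; do
        f=${f%"$cr"}                  # drop a trailing CR left by CRLF endings
        [ -n "$f" ] || continue       # skip empty lines
        printf 'fastq/%s_1.fastq.gze\n' "$f"
    done
}

# Simulated CRLF input standing in for files.csv
printf 'Sample_11\r\nSample_12\r\n' | make_fastq_names
```

With the real data it would be called as make_fastq_names < files.csv.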

find string and replace

Hi I have a file like this
L_00001_mRNA_interferase_MazF
ATGGATTATCCAAAACAAAAGGATATTGTCTGGATTGATTTTGACCCTTCTAAAGGCAAA
GAGATAAGAAAGCGGAGACCTGCGTTAGTAGTTAGTAAAGATGAATTTAATGAACGTACA
GGTTTCTGTTTAGTTTGCCCCATCACATCTACTAAAAGGAACTTTGCAACGTATATTGAA
ATAACAGACCCACAGAAAGTAGAAGGGGACGTAGTTACCCATCAATTGCGAGCGGTTGAT
TACACCACAAGAAATATCGAAAAAATTGAACAATGTGATATGTTGACGTGGATTGATGTA
GTAGAAGTAATCGGAATGTTTATTTAA
L_00002_hypothetical_protein
ATGGAAACGGTAGTTAGAAAGATAGGGAATTCAGTAGGAACTATTTTTCCGAAAAGTATT
TCACCACAAGTTGGAGAAAAGTTCACTATTCTTAAAGTTGGGGAAGCGTATATATTGAAA
CCTAAGAGAGAAGATATTTTTAAAAATGCTGAAGATTGGGTAGGGTTTAGAGAAGCTTTG
ACTAATGAAGATAAAGAATGGGACGAGATGAAACTTGAGGGAGGAGAACGCTAG
L_00003_hypothetical_protein
ATGACAACGTTTGGAGAAATTCATAGCAATGCAGAAGGTTATAAAAACGATTTTAATGAG
TTGAATAAATTAGTATTACGTGTAGCTGAAGAAAAAGCAAAAGGAGAGCCATTAGTAACG
TGGTTTCGGTTGCGGAATCGTAGGATTGCACAAGTATTAGACCCAATGAAAGAAGAAGTA
GAAAGTAAATCAAAGTACGAAAAAAGAAGAGTAGCAGCAATTAGTAAAAGCTTTTTTCTA
CTTAAAAAAGCTTTTAACTTTATTGAAGCAGAACAATTTGAAAAAGCAGAAAAATTAATT
I would like to substitute the header of each sequence with a string.
I have a conversion file like
L_00001_mRNA_interferase_MazF galM,GALM,aldose1-epimerase[EC:5.1.3.3]
L_00002_hypothetical_protein E3.2.1.85,lacG,6-phospho-beta-galactosidase[EC:3.2.1.85]
L_00003_hypothetical_protein PTS-Lac-EIIB,lacE,PTSsystem,lactose-specificIIBcomponent[EC:2.7.1.69]
Your question is unclear as to what platform you're on (Windows, Linux, Mac, ...), what languages you're constrained to, and the exact details of your input files.
On the assumption that you're on Linux, or otherwise have sed and awk available and a command shell, it could be as simple as (where $ indicates a Bourne-like shell prompt):
$ awk '{print "s/^" $1 "/" $2 "/"}' conversions.txt > conversions.sed
$ sed -f conversions.sed sequences.txt > relabeled.txt
This assumes that your first file (with the headings you want changed) is called sequences.txt and your second file (the “conversion file”) is called conversions.txt. It is further assumed that the “conversion file” contains one record per line with exactly two fields — the original and substitute headers — separated by whitespace (i.e. neither the original header nor the new header contain any spaces) and no blank lines.
In this solution, the first (awk) line converts the conversions.txt file into a sed script, conversions.sed; the second (sed) line then runs this script on the sequences.txt file, producing the relabeled.txt file, which may (or may not) be what you're looking for.
Depending on the exact nature of your input files, which isn't clear from your question, this may need a bit of tweaking.
Hope this helps.
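If the headers may contain characters that are special to sed (a / or & in the new header, or regex characters in the old one), one possible alternative is to do the lookup in awk alone. This is a sketch under the same assumptions about the two files as above (two whitespace-separated fields per conversion line, headers alone on their line); the function name relabel is made up.

```shell
# relabel (hypothetical helper): $1 = conversion file, $2 = sequence file.
relabel() {
    awk '
        NR == FNR   { map[$1] = $2; next }       # first file: old header -> new header
        ($0 in map) { print map[$0]; next }      # header line: print its replacement
                    { print }                    # anything else: copy unchanged
    ' "$1" "$2"
}

# Usage: relabel conversions.txt sequences.txt > relabeled.txt
```

Because the headers are compared as plain strings rather than turned into sed expressions, nothing in them needs escaping.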

Combining part of bash parameters into a string

Alright, so I'm trying to combine some but not all of my script's parameters into one string. I'm trying to write a script that changes spaces in a file name to underscores, and when the option -r is given, it recursively does it to every file in the folder.
Assuming the file is saved as removespaces.sh, if you run removespaces.sh file with spaces.doc it doesn't really have to care about parameters, I can just use $*
But when I'm trying to do it for an entire folder, I now have -r as $1, so I can't just (be lazy and) use $*. How could I create a string that's equal to $2 through the end?
A string of $2 to the end of the parameters:
"${*:2}"
This differs from "${@:2}" in that it concatenates all the arguments into one string, with one space between each. In general, it is possible that neither form is what you want (if, for example, you have files with more than one consecutive space in their name).
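A quick way to see the difference between the two forms, as a bash sketch (both function names are made up for the demonstration):

```shell
# Requires bash: ${*:2} and ${@:2} are not POSIX sh.

# underscore_name (hypothetical): ignore $1 (e.g. -r), join the remaining
# arguments with single spaces, then replace every space with an underscore.
underscore_name() {
    local name="${*:2}"
    printf '%s\n' "${name// /_}"
}

# count_rest (hypothetical): count the separate words that "${@:2}" yields.
count_rest() {
    local rest=("${@:2}")
    printf '%s\n' "${#rest[@]}"
}

underscore_name -r file with spaces.doc   # prints file_with_spaces.doc
count_rest -r file with spaces.doc        # prints 3
```

Note that the unquoted file name reaches the script as three arguments, which is why a run of several spaces in the real name cannot be recovered this way.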

Copy a section within two keywords into a target file

I have thousands of files in a directory, and each file contains a number of defined variables, each starting with the keyword DEFINE and ending with a semicolon (;). I want to copy all the occurrences of the data between these delimiters (inclusive) into a target file.
Example: Below is the content of the text file:
/* This code is for lookup */
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
END.
Now, from the above content, I just want to copy the section starting with DEFINE and ending with ; into a target file, i.e. the output should be:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
This needs to be done for thousands of scripts and multiple occurrences. Please help out.
Thanks a lot, the provided code works, but only to a limited extent: it works when the whole statement is on a single line, but the data is not necessarily on a single line. It can be spread over multiple lines like below:
/* This code is for lookup */
DEFINE variable as a1 expr= if branchno > 55
then
extract (n123f1 using brach, code)
else
branchno = null
;
END.
The code is also written in the above fashion. I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be an ending semicolon; this is the pattern.
It sounds like you want grep(1):
grep '^DEFINE.*;$' input > output
Try using grep. Let's say you have files with extension .txt in present directory,
grep -ho 'DEFINE.*;' *.txt > outfile
Output:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
Short description:
-o prints only the matching part of a line rather than the whole line, in case the line also contains something else that you want to omit.
-h suppresses the file names in front of the matching results.
Read the man page of grep by typing man grep in your terminal.
EDIT
If you want capability to search in multiple lines, you can use pcregrep with -M option
pcregrep -M 'DEFINE.*?(\n|.)*?;' *.txt > outfile
Works fine on my system. Check man pcregrep for more details
Reference : SO Question
One can make a simple solution using sed with version :
sed -n -e '/^DEFINE/{:a p;/;$/!{n;ba}}' your-file
Option -n prevents sed from printing every line; then each time a line begins with DEFINE, print the line (command p) then enter a loop: until you find a line ending with ;, grab the next line and loop to the print command. When exiting the loop, you do nothing.
It looks a bit dirty; it seems that the version sed15 has a shorter (and more straightforward) way to achieve this in one line:
sed -n -e '/^DEFINE/,/;$/p' your-file
Indeed, only for this version of sed, both patterns are treated; for other versions of sed like mine under cygwin, the range patterns must be on separate lines to work properly.
One last thing to remember: it does not treat inclusive patterned ranges, i.e. it stops printing after the first encountered end-pattern even if multiple start patterns have been matched. Prefer something with awk if this is a feature you are looking for.
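Along the lines of the awk suggestion, a minimal sketch (the function name extract_defines is made up): a flag is switched on at a line starting with DEFINE and off again after a line ending in a semicolon, and lines are printed while the flag is set. It assumes DEFINE always starts a line and the closing ; is the last non-blank character on its line.

```shell
# extract_defines (hypothetical helper): print every block that runs from a
# line starting with DEFINE to the next line ending in a semicolon.
extract_defines() {
    awk '
        /^DEFINE/              { inside = 1 }   # start of a definition
        inside                 { print }        # print while inside one
        inside && /;[ \t\r]*$/ { inside = 0 }   # closing semicolon reached
    ' "$@"
}

# Usage: extract_defines *.txt > outfile
```

Single-line definitions work too, since the start and end tests are both applied to the same line.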

How do you pass on filenames to other programs correctly in bash scripts?

What idiom should one use in Bash scripts (no Perl, Python, and such please) to build up a command line for another program out of the script's arguments while handling filenames correctly?
By correctly, I mean handling filenames with spaces or odd characters without inadvertently causing the other program to handle them as separate arguments (or, in the case of < or > — which are, after all, valid if unfortunate filename characters if properly escaped — doing something even worse).
Here's a made-up example of what I mean, in a form that doesn't handle filenames correctly: Let's assume this script (foo) builds up a command line for a command (bar, assumed to be in the path) by taking all of foo's input arguments and moving anything that looks like a flag to the front, and then invoking bar:
#!/bin/bash
# This is clearly wrong
FILES=
FLAGS=
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags string
FLAGS="$FLAGS $ARG"
else
# Looks like a file, add it to the files string
FILES="$FILES $ARG"
fi
done
# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD="bar $FLAGS $FILES"
echo "Issuing: $CMD"
$CMD
(Note that this is just an example; there are lots of other times one needs to do this and that to a bunch of args and then pass them on to other programs.)
In a naive scenario with simple filenames, that works great. But if we assume a directory containing the files
one
two
three and a half
four < five
then of course the command foo * fails miserably in its task:
foo: Handling four < five
foo: Handling one
foo: Handling three and a half
foo: Handling two
Issuing: bar four < five one three and a half two
If we actually allow foo to issue that command, well, the results won't be what we're expecting.
Previously I've tried to handle this through the simple expedient of ensuring that there are quotes around each filename, but I've (very) quickly learned that that is not the correct approach. :-)
So what is? Constraints:
1. I want to keep the idiom as simple as possible (not least so I can remember it).
2. I'm looking for a general-purpose idiom, hence my making up the bar program and the contrived example above instead of using a real scenario where people might easily (and reasonably) go down the route of trying to use features in the target program.
3. I want to stick to Bash script, I don't want to call out to Perl, Python, etc.
4. I'm fine with relying on (other) standard *nix utilities, like xargs, sed, or tr provided we don't get too obtuse (see #1 above). (Apologies to Perl, Python, etc. programmers who think #3 and #4 combine to draw an arbitrary distinction.)
5. If it matters, the target program might also be a Bash script, or might not. I wouldn't expect it to matter...
6. I don't just want to handle spaces, I want to handle weird characters correctly as well.
7. I'm not bothered if it doesn't handle filenames with embedded nul characters (literally character code 0). If someone's managed to create one in their filesystem, I'm not worried about handling it, they've tried really hard to mess things up.
Thanks in advance, folks.
Edit: Ignacio Vazquez-Abrams pointed me to Bash FAQ entry #50, which after some reading and experimentation seems to indicate that one way is to use Bash arrays:
#!/bin/bash
# This appears to work, using Bash arrays
# Start with blank arrays
FILES=()
FLAGS=()
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS+=("$ARG")
else
# Looks like a file, add it to the files array
FILES+=("$ARG")
fi
done
# Call bar with the flags and files
echo "Issuing (but properly delimited, not exactly as this appears): bar ${FLAGS[@]} ${FILES[@]}"
bar "${FLAGS[@]}" "${FILES[@]}"
Is that correct and reasonable? Or am I relying on something environmental above that will bite me later? It seems to work and it ticks all the other boxes for me (simple, easy to remember, etc.). It does appear to rely on a relatively recent Bash feature (FAQ entry #50 mentions v3.1, but I wasn't sure whether that was arrays in general or some of the syntax they were using with it), but I think it's likely I'll only be dealing with versions that have it.
(If the above is correct and you want to un-delete your answer, Ignacio, I'll accept it provided I haven't accepted any others yet, although I stand by my statement about link-only answers.)
Why do you want to "build up" a command? Add the files and flags to arrays using proper quoting and issue the command directly using the quoted arrays as arguments.
Selected lines from your script (omitting unchanged ones):
if [[ ${ARG:0:1} == - ]]; then # using a Bash idiom
FLAGS+=("$ARG") # add an element to an array
FILES+=("$ARG")
echo "Issuing: bar \"${FLAGS[@]}\" \"${FILES[@]}\""
bar "${FLAGS[@]}" "${FILES[@]}"
For a quick demo of using arrays in this manner:
$ a=(aaa 'bbb ccc' ddd); for arg in "${a[@]}"; do echo "..${arg}.."; done
Output:
..aaa..
..bbb ccc..
..ddd..
Please see BashFAQ/050 regarding putting commands in variables. The reason that your script doesn't work is because there's no way to quote the arguments within a quoted string. If you were to put quotes there, they would be considered part of the string itself instead of as delimiters. With the arguments left unquoted, word splitting is done and arguments that include spaces are seen as more than one argument. Arguments with "<", ">" or "|" are not a problem in any case since redirection and piping is performed before variable expansion so they are seen as characters in a string.
By putting the arguments (filenames) in an array, spaces, newlines, etc., are preserved. By quoting the array variable when it's passed as an argument, they are preserved on the way to the consuming program.
Some additional notes:
Use lowercase (or mixed case) variable names to reduce the chance that they will collide with the shell's builtin variables.
If you use single square brackets for conditionals in any modern shell, the archaic "x" idiom is no longer necessary if you quote the variables (see my answer here). However, in Bash, use double brackets. They provide additional features (see my answer here).
Use getopts as Let_Me_Be suggested. Your script, though I know it's only an example, will not be able to handle switches that take arguments.
This for ARG in "$@" can be shortened to this for ARG (but I prefer the readability of the more explicit version).
See BashFAQ #50 (and also maybe #35 on option parsing). For the scenario you describe, where you're building a command dynamically, the best option is to use arrays rather than simple strings, as they won't lose track of where the word boundaries are. The general rules are: to create an array, instead of VAR="foo bar baz", use VAR=("foo" "bar" "baz"); to use the array, instead of $VAR, use "${VAR[@]}". Here's a working version of your example script using this method:
#!/bin/bash
# Working version: arrays keep the word boundaries intact
FILES=()
FLAGS=()
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ "x${ARG:0:1}" = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS=("${FLAGS[@]}" "$ARG") # FLAGS+=("$ARG") would also work in bash 3.1+, as Dennis pointed out
else
# Looks like a file, add it to the files array
FILES=("${FILES[@]}" "$ARG")
fi
done
# Build the command as an array (one element per word) and run it
CMD=("bar" "${FLAGS[@]}" "${FILES[@]}")
echo "Issuing: ${CMD[*]}"
"${CMD[@]}"
Note that in the echo command I used "${VAR[*]}" instead of the [@] form because there's no need/point to preserving word breaks here. If you wanted to print/record the command in unambiguous form, this would be a lot messier.
Also, this gives you no way to build up redirections or other special shell options in the built command -- if you add >outfile to the FILES array, it'll be treated as just another command argument, not a shell redirection. If you need to programmatically build these, be prepared for headaches.
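For the unambiguous form, one option in bash is the %q format of the printf builtin, which quotes each argument in a way the shell can read back. A sketch, with a made-up helper name print_cmd and the imaginary bar command from the question:

```shell
# Requires bash (printf %q and arrays).

# print_cmd (hypothetical): print the arguments on one line, each quoted
# with %q so that the line could be pasted back into a shell.
print_cmd() {
    printf '%q ' "$@"
    printf '\n'
}

CMD=("bar" "-v" "three and a half" "four < five")
print_cmd "${CMD[@]}"   # only prints the command, does not run it
```

The exact escaping style is up to bash, but evaluating the printed line should yield the same words that went in.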
getopts should be able to handle spaces in arguments correctly ("file name.txt"). Weird characters should work as well, assuming they are correctly escaped (ls -b).
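As a sketch of what that could look like when combined with the arrays from the other answers: the option letters -v and -o and the function name parse_args are made up for illustration, and bar is still the imaginary program from the question. One caveat: getopts stops at the first non-option argument, so unlike the loop in the question it will not pick up flags that come after a filename.

```shell
# Requires bash (arrays).

# parse_args (hypothetical): split the arguments into the global arrays
# flags and files. -v is a plain switch, -o takes an argument.
parse_args() {
    local opt OPTIND=1
    flags=()
    files=()
    while getopts 'vo:' opt; do
        case $opt in
            v) flags+=(-v) ;;
            o) flags+=(-o "$OPTARG") ;;   # the option's argument stays one word
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))                 # drop the options that were parsed
    files=("$@")                          # everything left is a file name
}

parse_args -v -o "out file.txt" "three and a half" "four < five"
printf '<%s>\n' "${flags[@]}" "${files[@]}"
# The real call would then be: bar "${flags[@]}" "${files[@]}"
```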
