"Cat" into multiple files using brace expansion - linux

I am quite new to bash and trying to type some text into multiple files with a single command using brace expansion.
I tried: cat > file_{1..100} to write into 100 files some text that I will type in the terminal. I get the following error:
bash: file_{1..100}: ambiguous redirect
I also tried: cat > "file_{1..100}" but that creates a single file named: file_{1..100}.
I tried: cat > `file_{1..100}` but that gives the error:
file_1: command not found
How can I achieve this using brace expansion? Maybe there are other ways using other utilities and/or pipelines. But I want to know if that is possible using only simple brace expansion or not.

You can't do this with cat alone. It only writes its output to its standard output, and that single file descriptor can only be associated with a single file.
You can however do it with tee file_{1..100}.
You may wish to consider using tee file_{01..100} instead, so that the filenames are zero-padded to all have the same width: file_001, file_002, ... This has the advantage that lexicographic order will agree with numerical order, and so ls, *, etc, will process them in numerical order. Without this, you have the situation that file_2 comes after file_10 in lexicographic order.

A redirection can have only one target; it cannot write to multiple files.
If you want to redirect output to multiple files, use tee:
cat | tee file_{1..100}
Don't forget to check man tee. For example, if you want to append to the files, add the -a option (tee -a file_{1..100}).
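As a rough sketch of the tee approach from the answers above, the following writes the same text into 100 zero-padded files and then appends a second line to each with -a. The scratch directory and the sample text are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the example leaves no clutter (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Brace expansion happens before tee runs, so tee receives 100 file names
# and copies its standard input into every one of them.
# tee also echoes its input to its own stdout; discard that copy.
printf 'some text\n' | tee file_{01..100} > /dev/null

# With -a, tee appends instead of truncating.
printf 'more text\n' | tee -a file_{01..100} > /dev/null

# Count the files with a glob rather than by parsing ls.
files=(file_*)
echo "${#files[@]} files written"
```

To type the text interactively instead, run tee file_{01..100} > /dev/null on its own, enter the text, and finish with Ctrl-D.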

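This writes the text into file1 through file4, but only in zsh (with its default MULTIOS option); in bash it fails with the same "ambiguous redirect" error as in the question:
echo "hello you just knew me by kruz" > file{1..4}
To remove the files afterwards, use:
rm file*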

Related

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
In cases where you can control the name of the temporary text file, and the file is not used by other code, it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability, see the Stack Exchange question on how portable ... /dev/stdout is.
POSIX 7 says these device files are extensions.
Base Definitions,
Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example,  /dev/stdin, /dev/stdout,  and  /dev/stderr)
Using /dev/tty, which POSIX does require, would force the output onto the current terminal. That makes it impossible to pipe the output of the whole command into a different program (or a log file), or to run the command when there is no connected terminal (a cron job or other automation tool).
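A minimal sketch of the /dev/stdout idea follows. The produce function is a hypothetical stand-in for whatever tool writes the temporary file; it is not part of the original question, and the approach assumes a system that provides /dev/stdout (as Linux does).

```shell
# Hypothetical stand-in for a tool that insists on writing to a named file.
produce() {
    printf '{"status": "ok"}\n' > "$1"
}

# Handing it /dev/stdout as the "file" sends the data straight to standard
# output, so there is no temporary file to display and then delete.
# The output is captured through a pipe here to show it remains pipeable.
result=$(produce /dev/stdout | tr -d ' ')
printf '%s\n' "$result"
```

Because nothing is written to disk, the cat and rm steps both disappear from the command line.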
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient as it would require removing characters from the beginning of a file each time you read a line. Current filesystems are pretty good at truncating lines at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.

balancing the bash calculations

We have a tool for cutting adaptors https://github.com/vsbuffalo/scythe/blob/master/README.md and we wanted it to be used on all the files in the raw folder and make an output of each file separately as OUT+File Name.
Something is wrong with the script I wrote: it doesn't process each file separately, and the whole thing doesn't work properly. It generates an empty file named OUT+files.
The expected operation looks like this:
take file1, use scythe on it, write output as OUTfile1
take file2 etc.
#!/bin/bash
FILES=/home/dave/raw/*
for f in $FILES
do
echo "Processing the $f file..."
/home/deve/scythe/scythe -a /home/dev/scythe/illumina_adapters.fa -o "OUT"+$f $f
done
Additionally, I noticed (testing for a single file) that the script uses only one core out of 130 available. Is there any way to improve it?
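There is no string concatenation operator in shell. Use juxtaposition instead; it's "OUT$f", not "OUT"+$f. Note that $f here holds the full path (/home/dave/raw/...), so to get OUT followed by just the file name, prefix the base name instead: "OUT$(basename "$f")".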
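The loop from the question could then look roughly like the sketch below. The out_name helper is a hypothetical name introduced here, and the scythe call is only printed (a dry run) so that the example runs without the tool installed; the "..." stands for the adapter file path from the question.

```shell
# Derive the output name from the base name, so /some/dir/sample1
# becomes OUTsample1 instead of OUT/some/dir/sample1.
out_name() {
    printf 'OUT%s\n' "$(basename -- "$1")"
}

raw_dir=/home/dave/raw    # input directory from the question

for f in "$raw_dir"/*; do
    [ -f "$f" ] || continue    # skips the literal pattern if nothing matched
    echo "Processing the $f file..."
    # Dry run: print the command instead of executing it. Drop the leading
    # "echo" and fill in the adapter file path to run scythe for real.
    echo scythe -a ... -o "$(out_name "$f")" "$f"
done
```

Quoting "$f" also keeps file names with spaces intact. As for the single core: one possible way to use more of them, not covered by the answer, is to start several scythe processes at once, for example with xargs -P or GNU parallel.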

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position for a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong with that I would also really appreciate it. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz) but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
I don't have much experience with bcftools, but in general, if you want to use awk to manipulate a gzipped file, you can pipe the decompressed data into it so that the file is only decompressed as it is read. You can also pipe the result directly through gzip so that it too is compressed, e.g.:
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd: -c writes to standard output and -d decompresses.
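To make the streaming pipeline concrete, here is a small sketch on made-up data. The three-column file and the file names are assumptions for illustration only; a real VCF has more columns, but the chromosome and position are likewise in the first two.

```shell
# Build a tiny stand-in for the real file (hypothetical data, VCF-like columns).
workdir=$(mktemp -d)
printf '#CHROM\tPOS\tID\nchr9\t55252805\trs1\nchr1\t100\trs2\n' |
    gzip -c > "$workdir/tiny.vcf.gz"

# Stream-decompress, keep header lines plus records in the region of
# interest, and recompress. The uncompressed data only ever exists
# inside the pipe, never on disk.
gzip -cd "$workdir/tiny.vcf.gz" |
    awk -F '\t' '/^#/ || ($1 == "chr9" && $2 >= 55252802 && $2 <= 55252810)' |
    gzip -c > "$workdir/subset.vcf.gz"

kept=$(gzip -cd "$workdir/subset.vcf.gz" | wc -l)
echo "lines kept: $kept"
```

Note that this still reads the whole compressed file from start to finish; it saves disk space, not time.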
As a side note, if you are trying to perform operations on just a part of a large file, you may also find the excellent tool less useful. It can be used to view your large file, loading only the needed parts. The -S option is particularly useful for wide formats with many columns, as it stops line wrapping, as is -N for showing line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

Pick the specific file in the folder

I want to pick files with a specific name format from the list of files in a directory. Please see the example below.
I have the following list of files (6 files).
Set-1
1) MAG_L_NT_AA_SUM_2017_01_20.dat
2) MAG_L_NT_AA_2017_01_20.dat
Set-2
1) MAG_L_NT_BB_SUM_2017_01_20.dat
2) MAG_L_NT_BB_2017_01_20.dat
Set-3
1) MAG_L_NT_CC_SUM_2017_01_20.dat
2) MAG_L_NT_CC_2017_01_20.dat
From the above three sets I need only 3 files.
1) MAG_L_NT_AA_2017_01_20.dat
2) MAG_L_NT_BB_2017_01_20.dat
3) MAG_L_NT_CC_2017_01_20.dat
Note: The solution can span multiple lines of commands, because I have to create a script for the above requirement. Thanks
Probably the easiest and least complex solution to your problem is to combine find (a tool for searching for files in a directory hierarchy) and grep (a tool for printing lines that match a pattern). You can also read those tools' manuals by typing man find and man grep.
Before going straight to the solution, we need to understand how we will approach your problem. To match a pattern against the name of a file, we will use the find command with the -name option:
-name pattern
Base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters ('*', '?', and '[]')
match a '.' at the start of the base name (this is a change in
findutils-4.2.2; see section STANDARDS CONFORMANCE below). To ignore a
directory and the files under it, use -prune; see an example in the
description of -path. Braces are not recognised as being special,
despite the fact that some shells including Bash imbue braces with a
special meaning in shell patterns. The filename matching is performed
with the use of the fnmatch(3) library function. Don't forget to
enclose the pattern in quotes in order to protect it from expansion by
the shell.
For instance, if we want to search for files whose names contain the string 'abc' in a directory called 'words_directory', we will enter the following:
$ find words_directory -name "*abc*"
And if we want to start the search from every entry inside that directory (find descends into subdirectories in both cases):
$ find words_directory/* -name "*abc*"
So first, we need to find all files whose names begin with "MAG_L_NT_" and end with ".dat". To find all matching names in /your/specified/path/, which may contain many subdirectories holding files that match this pattern:
$ find /your/specified/path/* -name "MAG_L_NT_*.dat"
This prints all the file names found, but we still get names containing the string "SUM". This is where grep comes in. To exclude names containing the unwanted string, we will use the -v option:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is
specified by POSIX .)
To use grep to filter the first command's output, we will use a pipe (|):
The standard shell syntax for pipelines is to list multiple commands,
separated by vertical bars ("pipes" in common Unix verbiage). For
example, to list files in the current directory (ls), retain only the
lines of ls output containing the string "key" (grep), and view the
result in a scrolling page (less), a user types the following into the
command line of a terminal:
ls -l | grep key | less
"ls -l" produces a process, the output (stdout) of which is piped to
the input (stdin) of the process for "grep key"; and likewise for the
process for "less". Each process takes input from the previous process
and produces output for the next process via standard streams. Each
"|" tells the shell to connect the standard output of the command on
the left to the standard input of the command on the right by an
inter-process communication mechanism called an (anonymous) pipe,
implemented in the operating system. Pipes are unidirectional; data
flows through the pipeline from left to right.
process1 | process2 | process3
Now that you are acquainted with the commands and options needed to achieve your goal, you are ready for the solution:
$ find /your/specified/path/* -name "MAG_L_NT_*.dat" | grep -v "SUM"
The find command outputs all names which begin with "MAG_L_NT_" and end with ".dat". grep -v takes that output as its input and removes all lines containing the string "SUM".
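As an alternative sketch, find can do the exclusion itself with a negated -name test, which avoids the pipe. The scratch directory below is an assumption used only to recreate the six file names from the question.

```shell
# Recreate the six file names from the question in a scratch directory.
workdir=$(mktemp -d)
for set in AA BB CC; do
    touch "$workdir/MAG_L_NT_${set}_SUM_2017_01_20.dat" \
          "$workdir/MAG_L_NT_${set}_2017_01_20.dat"
done

# "!" negates the test that follows, so names containing SUM are dropped.
matches=$(find "$workdir" -type f -name 'MAG_L_NT_*.dat' ! -name '*SUM*' | sort)
printf '%s\n' "$matches"
```

One difference worth knowing: -name looks only at the base name, whereas grep -v "SUM" filters on the whole path and would therefore also discard files that sit inside a directory whose name contains SUM.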

storing output of ls command consisting of files with spaces in their names

I want to store output of ls command in my bash script in a variable and use each file name in a loop, but for example one file in the directory has name "Hello world", when I do variable=$(ls) "Hello" and "world" end up as two separate entries, and when I try to do
for i in $variable
do
mv $i ~
done
it reports that the files "Hello" and "world" don't exist.
Is there any way I can access all files in the current directory and run some command on them, even if the files have space(s) in their names?
If you must, dirfiles=(/path/of/interest/*).
And accept the admonition against parsing the output of ls!
I understand you are new to this and I'd like to help. But the way you've stated your question makes it hard for me (us?) to provide an answer that would be of much help to you.
Based on what I've read so far, you don't seem to have a basic understanding of how parameter expansions work in the shell. The following two links will be useful to you:
Matching Pathnames, Parameters
Now, if your task at hand is to operate on files meeting certain criteria, then find(1) will likely do the job.
Say it with me: don't parse the output of ls! For more information, see this post on Unix.SE.
A better way of doing this is:
for i in *
do
mv -- "$i" ~
done
or simply
mv -- * ~
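If the list of names does need to be kept in a variable, a bash array filled from a glob is one way to do it, as in the following sketch. The two scratch directories and the sample file names are assumptions for illustration; the original moves files to ~ instead.

```shell
#!/usr/bin/env bash
# Scratch source and destination directories (illustrative only).
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/Hello world" "$src/plain.txt"

# The glob expands to one array element per file, spaces included.
files=("$src"/*)

for f in "${files[@]}"; do
    # Quoting "$f" keeps each name as a single argument.
    mv -- "$f" "$dest"/
done

moved=("$dest"/*)
echo "moved ${#moved[@]} files"
```

The quotes around "${files[@]}" and "$f" are what prevent the word splitting seen in the question.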

Resources