Adapt command to create csv file from storage content, including date(time) & file size as well - linux

According to the thread:
Linux: fast creating of formatted output file (csv) from find command
there is a suggested bash command using awk (which I don't understand):
find /mnt/sda2/ | awk 'BEGIN{FS=OFS="/"}!/.cache/ {$2=$3=""; new=sprintf("%s",$0);gsub(/^\/\/\//,"",new); printf "05;%s;/%s\n",$NF,new }' > $p1"Seagate-4TB-S2-BTRFS-1TB-Dateien-Verzeichnisse.csv"
With this command, I am able to create a csv file containing "05;file name;full path and file name" of the directory and file content of my device mounted on /mnt/sda2. Thanks again to -> tink
How must I adapt the above command so that it also includes the date (and time) and the file size?
Thank you in advance,
-Linuxfluesterer
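
One possible adaptation, as a sketch only: GNU find's -printf can emit the modification time and the size directly, so the awk step is only needed if you still want to strip the leading path components. The date format and the placement of the new fields are assumptions; adjust them to whatever your CSV consumer expects.

# GNU find: %f = file name, %p = full path, %TY-%Tm-%Td %TH:%TM = modification date/time, %s = size in bytes
find /mnt/sda2/ ! -path '*/.cache/*' \
    -printf '05;%f;%p;%TY-%Tm-%Td %TH:%TM;%s\n' \
    > "$p1"Seagate-4TB-S2-BTRFS-1TB-Dateien-Verzeichnisse.csv

If the /mnt/sda2 prefix still needs to be removed from the path field, the same kind of awk/gsub step as in the original command (or a simple sed substitution) could be piped in afterwards.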

Related

Query about the execution steps of the shell command `ls > list`

root@VM-0-11-debian:~/linux/2023/01# ls
root@VM-0-11-debian:~/linux/2023/01# ls > list
root@VM-0-11-debian:~/linux/2023/01# ls
list
root@VM-0-11-debian:~/linux/2023/01# cat list
list
I know that > redirects stdout to a file. It creates the file if it is not present, otherwise it replaces the contents.
I would like to ask whether the shell command ls > list works as I describe below:
1) As the file named list does not exist yet, a file named list is created first.
2) The ls command lists the directory content (list). The listed content (list) goes to standard output.
3) The content of standard output (list) is written to the file named list, replacing whatever was there.
That is my personal understanding of the process; I hope you can give me some guidance. Thank you.
The file redirection operator > is handled by your shell and any file to which you write will be created/truncated before the binary is started. That's why you can see the file name list in the content of the file: the file has already been created before the ls process was started.
So yes, your understanding is correct.
This is why it is not possible to do something like sort txt > txt – the file txt will be truncated before sort reads it. You will end up with an empty file.
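
A small demonstration of that ordering, plus the usual workaround of letting sort write the output file itself (a sketch; POSIX sort documents that the -o output file may be the same as an input file):

printf 'b\na\n' > txt
sort txt > txt      # the shell truncates txt before sort even starts
cat txt             # prints nothing: the file is now empty

printf 'b\na\n' > txt
sort -o txt txt     # with -o, sort may safely write to one of its own input files
cat txt             # a, b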

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position for a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong with that, I would also really appreciate it. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
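
For comparison, a corrected invocation might look like the sketch below. It assumes the existing 722g.990.SNP.INDEL.chrAll.vcf.gz is the bgzipped input, uses a hypothetical output name dogs.region.vcf.gz, and first builds the index that the -r region option requires.

bcftools index 722g.990.SNP.INDEL.chrAll.vcf.gz    # -r/--regions needs a .csi or .tbi index
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 \
    -O z -o dogs.region.vcf.gz 722g.990.SNP.INDEL.chrAll.vcf.gz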
I don't have that much experience with bcftools, but generically, if you want to use awk to manipulate a gzipped file, you can pipe into awk so that the file is only unzipped as needed. You can also pipe the result directly through gzip so it too stays compressed, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is effectively an alias for gzip -cd; -c writes the output to standard output, -d means decompress.
As a side note, if you are trying to perform operations on just part of a large file, you may also find the excellent tool less useful: it can view your large file while loading only the parts that are needed. The -S option is particularly useful for wide formats with many columns, as it stops line wrapping, and -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.
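
For example, to grab a small piece for testing your scripts, or to pull out the region mentioned above without ever unzipping the whole file, something like the following should work (a sketch; it assumes a standard VCF layout where column 1 is the chromosome and column 2 the position, and that the chromosome is named chr9):

# keep just the first 10000 lines (header plus some records) as an uncompressed test file
zcat 722g.990.SNP.INDEL.chrAll.vcf.gz | head -n 10000 > sample.vcf

# keep the header lines and the records in the region of interest, recompressing on the way out
zcat 722g.990.SNP.INDEL.chrAll.vcf.gz \
    | awk '/^#/ || ($1 == "chr9" && $2 >= 55252802 && $2 <= 55252810)' \
    | gzip -c > region.vcf.gz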

Filtering text files in cmd?

Is there any way that one can filter a text file in Windows' CMD as with awk in shell script?
I have a somewhat large file and I only need the last column from each row. This would be extremely easy with awk, but I have no means of using that now.
Try this out:
Get-Content .\test.csv | %{ $_.Split(',')[-1]; }
or for more reference
check out this site
[1]: http://windows-powershell-scripts.blogspot.in/2009/06/awk-equivalent-in-windows-powershell.html
This will return the last term after the last comma on each line of a .csv file, for example:
@echo off
type "file.csv" | repl ".*,(.*)" "$1" >"newfile.txt"
This uses a helper batch file called repl.bat (by dbenham) - download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat
Place repl.bat in the same folder as the batch file or in a folder that is on the path.
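
For reference, the awk one-liner being emulated here would be something like this on a Unix-like system (a sketch; -F, sets the comma as the field separator and $NF is the last field on each line):

awk -F, '{ print $NF }' file.csv > newfile.txt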

Piping with multiple commands

Assume you have a file called “heading” as follows
echo "Permissions^V<TAB>^V<TAB>Size^V<TAB>^V<TAB>File Name" > heading
echo "-------------------------------------------------------" >> heading
Write a (single) set of commands that will create a report as follows:
make a list of the names, permissions and size of all the files in your current directory,
matching (roughly) the format of the heading you just created,
put the list of files directly following the heading, and
save it all into a file called “file.list”.
All this is to be done without destroying the heading file.
I need to be able to do this all in a pipeline without altering the heading file. I can't seem to do it without destroying the file. Can somebody please make a pipe for me?
You can use a command group:
{ cat heading; ls -l | sed 's/:/^V<tab>^V<tab>/g'; } > file.list
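
If you want only the permissions, size and name columns, roughly lined up with the heading, an awk step inside the same command group is another option (a sketch; the field numbers assume the usual ls -l layout, the \t\t stands in for the two ^V<TAB> tabs, and $NF breaks for file names containing spaces):

{ cat heading; ls -l | awk 'NR > 1 { printf "%s\t\t%s\t\t%s\n", $1, $5, $NF }'; } > file.list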

"Unable to open image" error when using ImageMagick's Filename References

I'm using ImageMagick to do some image processing from the commandline, and would like to operate on a list of files as specified in foo.txt. From the instructions here: http://www.imagemagick.org/script/command-line-processing.php I see that I can use Filename References from a file prefixed with #. When I run something like:
montage #foo.txt output.jpg
everything works as expected, as long as foo.txt is in the current directory. However, when I try to access bar.txt in a different directory by running:
montage /some_directory/#bar.txt output2.jpg
I get:
montage: unable to open image /some_directory/#bar.txt: No such file or directory @ blob.c/OpenBlob/2480.
I believe the issue is my syntax, but I'm not sure what to change it to. Any help would be appreciated.
Quite an old entry but it seems relatively obvious that you need to put the # before the full path:
montage #/some_directory/bar.txt output2.jpg
As of ImageMagick 6.5.4-7 2014-02-10, paths are not supported with # syntax. The # file must be in the current directory and identified by name only.
I haven't tried directing IM to pull the list of files from a file, but I do specify multiple files on the command line like this:
gm -sOutputFile=dest.ext -f file1.ppm file2.ppm file3.ppm
Can you pull the contents of that file into a variable, and then let the shell expand that variable?
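
Following that suggestion, one way to do it in a POSIX shell would be (a sketch; it assumes bar.txt lists one file name per line and that none of the names contain spaces, since the expansion is deliberately left unquoted so the shell splits it into separate arguments):

files=$(cat /some_directory/bar.txt)
montage $files output2.jpg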
