I am writing a picture editing program and am using the below snippet to choose the files:
$var = FileOpenDialog("",#DesktopDir,"Images (*.jpg;*.bmp;*.png)",1+4)
$var = StringReplace($var, "|", #CRLF)
When I select multiple files, all the file names are stored in $var separated by the | symbol. I replace that symbol with a newline character, but I still need to run the program for each of the filenames and I can't figure out how to separate the individual filenames out of the variable. So my program stops if I select multiple files.
$var = FileOpenDialog("", #DesktopDir, "Images (*.jpg;*.bmp;*.png)", 1+4)
$files = StringSplit($var, "|", 2)
For $i = 0 To UBound($files)-1
$file = $files[$i]
ConsoleWrite($file & #CRLF) ; Do something with file
Next
For me the results look like this:
C:\Users\Manadar\Desktop
skin1.png
skin2.png
So it's:
Directory of files
File1
File2
File3
etc.
I have two lists of files with their md5sum checksums, and the lists use different paths for the same files.
Example of content in the first checksum file (server.list):
2c03ff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R2_001.fastq.gz/
6e6bcd84f264233cf7c428c0cfdc0c03 tmp/fastq1_L002_R1_001.fastq.gz
Example of content in the second checksum file (downloaded.list):
2c03ff18a643a1437ec0cf051b8b7b9d /home/projects/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R2_001.fastq.gz
6e6bcd84f264233cf7c428c0cfdc0c03 /home/projects/fastq1_L002_R1_001.fastq.gz
When I run the following command, I get this output:
awk -F"/" 'FNR==NR{filearray[$1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' downloaded.list server.list
fastq1_L001_R1_001.fastq.gz has a different md5sum
fastq1_L001_R2_001.fastq.gz has a different md5sum
fastq1_L002_R2_001.fastq.gz has a different md5sum
Why am I getting this message when the first column is the same in both files? Can someone enlighten me on this issue?
Edit:
If I remove the path and leave only the file name, it works just fine.
Edit 2:
As pointed out, there is another possible form of file path, one which does not start with /. In this case, I cannot use / as the field separator.
Assumptions:
filename (sans path) and md5sum have to match
filenames may not be listed in the same order
filenames may not exist in both files
Sample data:
$ head downloaded.list server.list
==> downloaded.list <==
2c03ff18a643a1437ec0cf051b8b7b9d /home/projects/fastq1_L001_R1_001.fastq.gz # match
YYYYf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R5_911.fastq.gz # different md5sum
c430f587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R2_001.fastq.gz # match
MNOPf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R8_abc.fastq.gz # filename does not exist in other file
ABCDf587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R9_004.fastq.gz # different filename but matching md5sum (vs last line of other file)
==> server.list <==
2c03ff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L001_R1_001.fastq.gz # match
c430f587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R2_001.fastq.gz # match
XXXXf587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R5_911.fastq.gz # different md5sum
TUVWff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L999_R6_922.fastq.gz # filename does not exist in other file
ABCDf587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R7_933.fastq.gz # different filename but matching md5sum (vs last line of other file)
One awk idea that addresses the whitespace issues as well as verifying that filenames match:
awk '                                   # stick with default field delimiter of white space but ...
{ md5sum = $1
  n = split($2, arr, "/")               # split 2nd field on "/" delimiter
  fname = arr[n]
  if (FNR == NR)
      filearray[fname] = md5sum
  else {
      if (fname in filearray && filearray[fname] == $1)
          next
      printf "%s has a different md5sum\n", fname
  }
}
' downloaded.list server.list
This generates:
fastq1_L001_R5_911.fastq.gz has a different md5sum
fastq1_L999_R6_922.fastq.gz has a different md5sum
fastq1_L001_R7_933.fastq.gz has a different md5sum
The whitespace on $1 used as an array key is causing problems. Removing it:
awk -F"/" '{gsub(/ /, "", $1)}; FNR==NR{filearray[$1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' downloaded.list server.list
I am trying to use this script to create a list of file names (including their path), with each path separated by a comma, so ideally the output would look like: file1.txt,file2.txt,file3.txt ...etc. How do I go about this?
#!/bin/bash
LEFT=/home/ndevon/USER/SRA/PE/*_1.fastq.gz
for f in $LEFT; do
cat "${f}," >> /home/ndevon/USER/left_list.txt
done
What you want is probably
echo /home/ndevon/USER/SRA/PE/*_1.fastq.gz | tr ' ' ,
which translates spaces to commas. This works as long as your file names don't contain spaces.
Try this:
# read the filenames into an array
files=( /home/ndevon/USER/SRA/PE/*_1.fastq.gz )
# print the filenames comma-separated
IFS=,
echo "${files[*]}" > output_file
I have values in file A like
301310
304790
500011
600462
607348
614269
I want to search all these values in a text file B which has lines like this
1.35|10|5|11|1p36.31|GPR153|P|G protein-coupled receptor 153||614269|REc|||| | |4(Gpr153)|
1.36|3|24|06|1p36.31|HES2|P|Hairy/enhancer of split, Drosophila, homolog of, 2||609970|REc|||| | ||
1.37|3|24|06|1p36.31|HES3|P|Hairy/enhancer of split, Drosophila, homolog of, 3||609971|REc|||| | ||
1.38|3|24|06|1p36.33|HES4|P|Hairy/enhancer of split, Drosophila, homolog of, 4||608060|REc|||| | ||
1.39|3|24|06|1p36.32|HES5|P|Hairy/enhancer of split, Drosophila, homolog of, 5||607348|REc|||| | ||
I want to print all lines which contain any of the terms from file A into an output file.
I tried the following command but it is not working:
grep -w -F -f fileA fileB > fileC
When you posted your question, you had your fileA like this:
301310
304790
500011
I edited the post for formatting and removed the empty lines, figuring it was just an editing typo on your part. If your input file actually does have empty lines in it, you'll want to remove them, as grep will interpret each one as "match an empty pattern", which matches every line. It would equate to:
grep "" fileB
So, just wipe the empty lines from fileA and your posted grep command should work just fine.
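For example, a sketch of one way to do that (assuming bash and standard grep; fileA.clean is just an illustrative name):
# drop empty and whitespace-only lines from the pattern file
grep -v '^[[:space:]]*$' fileA > fileA.clean
grep -w -F -f fileA.clean fileB > fileC
# or in one go, using process substitution
grep -w -F -f <(grep -v '^[[:space:]]*$' fileA) fileB > fileC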
If your grep isn't working, then maybe a perlish solution?
Although I will point out - given your data examples, no matches are found, which could be part of the problem.
#!/usr/bin/perl
use strict;
use warnings;

open ( my $file1, '<', "fileA" ) or die $!;
my @search_for = <$file1>;
close ( $file1 );

chomp @search_for;    # strip trailing newlines so the patterns can match mid-line
my $search_regex = join ( "|", map {quotemeta} @search_for );
$search_regex = qr/$search_regex/;

open ( my $file2, '<', "fileB" ) or die $!;
while ( <$file2> ) {
    print if m/$search_regex/;
}
close ( $file2 );
If you want to ensure you don't have blank lines:
my $search_regex = join ( "|", grep { m/^\w+$/ } map {quotemeta} #search_for );
$search_regex = qr/$search_regex/;
The following command worked
grep -w -F -f fileA fileB > fileC
Initially it did not work because there was a blank line in fileA; that's why it was printing everything from fileB. Once I removed the blank line from fileA, I got my required lines.
What is the best way to remove all lines from a text file starting at the first empty line in Bash? External tools (awk, sed...) can be used!
Example
1: ABC
2: DEF
3:
4: GHI
Lines 3 and 4 should be removed and the remaining content should be saved in a new file.
With GNU sed:
sed '/^$/Q' "input_file.txt" > "output_file.txt"
With AWK:
$ awk '/^$/{exit} 1' test.txt > output.txt
Contents of output.txt
$ cat output.txt
ABC
DEF
Walkthrough: For lines that match ^$ (start-of-line, end-of-line), exit (the whole script). For all lines, print the whole line -- of course, we won't get to this part after a line has made us exit.
Bet there are some more clever ways to do this, but here's one using bash's read builtin. The question asks us to keep the lines before the blank in one file and send the lines after the blank to another file. You could send some of standard out one place and some another if you are willing to use exec and reroute stdout mid-script, but I'm going to take a simpler approach and use a command line argument to tell the script where the post-blank data should go:
#!/bin/bash
# script takes as argument the name of the file to send data to once a blank
# line is found
found_blank=0
while read -r stuff; do
    if [ -z "$stuff" ] ; then
        found_blank=1
        continue                # skip the blank line itself
    fi
    if [ "$found_blank" -eq 1 ] ; then
        echo "$stuff" >> "$1"
    else
        echo "$stuff"
    fi
done
run it like this:
$ ./delete_from_empty.sh rest_of_stuff < demo
output is:
ABC
DEF
and 'rest_of_stuff' has
GHI
if you want the before-blank lines to go somewhere else besides stdout, simply redirect:
$ ./delete_from_empty.sh after_blank < input_file > before_blank
and you'll end up with two new files: after_blank and before_blank.
Perl version
perl -e '
    open $fh,  ">", "stuff";
    open $efh, ">", "rest_of_stuff";
    while (<>) {
        if ($_ !~ /\w+/) {
            $fh = $efh;
        }
        print $fh $_;
    }
' demo
This creates two output files and iterates over the demo data. When it hits a blank line, it flips the output from one file to the other.
Creates
stuff:
ABC
DEF
rest_of_stuff:
<blank line>
GHI
Another awk would be:
awk -vRS= '1;{exit}' file
By setting the record separator RS to an empty string, we define the records as paragraphs separated by sequences of empty lines. It is now easy to adapt this to select the nth block, passing n in with -v (here the second block):
awk -v RS= -v n=2 '(FNR==n){print;exit}' file
There is a problem with this method when processing files with DOS line endings (CRLF): there will be no empty lines, since every line will still contain a CR. But this problem applies to all of the presented methods.
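One way around the CRLF issue (a sketch, assuming the usual tr utility is available) is to strip the carriage returns before any of the blank-line based commands:
tr -d '\r' < file | awk -v RS= '1;{exit}' > output.txt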
I've looked everywhere and I'm out of luck.
I am trying to count the files in my current directory and all subdirectories so that when I run the shell script count_files.sh it will produce output similar to:
$
2 sh
4 html
1 css
2 noexts
$
where noexts are either files without any period as an extension (ex: fileName ) or files with a period but no extension (ex: fileName. ).
This pipeline:
find * | awk -F . '{print $NF}'
gives me a comprehensive list of all the files, and I've figured out how to remove files without any period (ex: fileName ) using sed '/\//d'
MY ISSUE is that I cannot remove from the output of the above pipeline the files that have a period but nothing (NULL) after it (ex: fileName. ), since the period is consumed as the delimiter '.'
How can I use sed like above to remove a null character from a pipe input?
I understand this could be a quick fix, but I've been googling like a madman with no luck. Thanks in advance.
Chip
To filter filenames that end with ., since filenames are the whole input line in find's output, you could use
sed '/\.$/d'
Where \. matches a literal dot and $ matches the end of the line.
However, I think I'd do the whole thing in awk, since sorting does not appear to be necessary.
EDIT: Found a nicer way to do it with awk and find's -printf action.
find . -type f -printf '%f\n' | awk -F. '!/\./ || $NF == "" { ++count["noext"]; next } { ++count[$NF] } END { for(k in count) { print count[k] " " k } }'
Here we pass -printf '%f\n' to find to make it print only the file name without the preceding directory, which makes it much easier to work with for our purposes -- this way there's no need to worry about periods in directory names (such as /etc/somethingorother.d). The field separator is '.', and the awk code is:
!/\./ || $NF == "" { # if the line (the filename) does not contain
# a period or there's nothing after the last .
++count["noext"] # increment the "noext" counter
# note that this will be collated with files that
# have ".noext" as filename extension. see below.
next # go to the next line
}
{ # in all other lines
++count[$NF] # increment the counter for the file extension
}
END { # in the very end:
for(k in count) { # print the counters.
print count[k] " " k
}
}
Note that this way, if there is a file "foo.noext", it will be counted among the files without a filename extension. If this is a worry, use a special counter for files without an extension -- either apart from the array or with a key that cannot be a filename extension (such as one that includes a . or the empty string).
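For instance, a minimal sketch of the separate-counter variant (same find pipeline as above, only the awk body changes):
find . -type f -printf '%f\n' | awk -F. '
  !/\./ || $NF == "" { ++noext; next }   # plain variable instead of an array entry
  { ++count[$NF] }
  END {
    if (noext) print noext " noexts"
    for (k in count) print count[k] " " k
  }'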