How to read file from another file - linux
This script lists the unit-*-slides.txt files from a directory into a filelist.txt file, then for each file in that list it reads the file and writes the count of lines matching "st^" to a file. But it is not counting in order (e.g. 1, 2, 3, 4, ...); it is counting like 10, 1, 2, 3, 4, ...
How can I read the files in order?
#!/bin/sh
#
outputdir=filelist
mk=$(mkdir $outputdir)
$mk
dest=$outputdir
cfile=filelist.txt
ofile="combine-slide.txt"
output=file-list.txt
path=/home/user/Desktop/script
ls $path/unit-*-slides.txt | sort -n -t '-' -k 2 > $dest/$cfile
echo "Generating files list..."
echo "Done"
#Combining
while IFS= read file
do
    if [ -f "$file" ]; then
        tabs=$(cat unit-*-slides.txt | grep "st^" | split -l 200)
    fi
done < "$dest/$cfile"
echo "Combining Done........!"
Try with sort -n:

tabs=$(cat $( ls unit-*-slides.txt | sort -n ) | grep "st^" | split -l 200)

sort -n means numeric sort, so the output of ls is ordered by number.
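Note that with no sort key, sort -n compares the numeric value at the start of each whole line, which is zero for names like unit-10-slides.txt, so the ordering may fall back to plain lexicographic comparison. A field-based key, as in the question's own sort -n -t '-' -k 2, is more robust. A minimal sketch using the question's filename pattern:

```shell
# Hypothetical file list with a two-digit unit number: lexicographic
# order would put unit-10 before unit-2, but sorting numerically on
# the second '-'-separated field restores the order 1, 2, 10.
printf '%s\n' unit-10-slides.txt unit-2-slides.txt unit-1-slides.txt |
    sort -t '-' -k 2,2n
```

GNU sort also has -V (version sort), which orders embedded numbers naturally without having to name a field.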
Related
Bash script to automatically email variable filename and corresponding modified datetime stamp
In short, I am trying to find the duplicate files (which have been downloaded from an FTP site by another script) and email the business about all the duplicate files and their corresponding modified dates. The file /path/test_mail.txt contains one file name per line (two filenames in this case), e.g.:

abc.xlsx
def.xlsx

In the code below, I am trying to find the modified timestamp for the first filename, pipe it together with the respective filename, and send an email; similarly the loop runs for the second one. This is using stat:

for val in '/path/test_mail.txt'; do
    { stat path/$val | grep 'Modify: ' | cut -d' ' -f2,3,4 | awk -F"." '{print $1}' ; } | $val
done | mail -s "Duplicate file found ${DATE}" abc#xyz.com

I also tried another way, using ls -ltr:

for val in '/path/tj_mail.txt'; do
    { ls -ltr /path/$val | cut -d' ' -f6,7,8 | find $val / -path $val
done | mail -s "Duplicate file found ${DATE}" abc#xyz.com

I was expecting the email body to be approximately like:

Duplicate Filename - xyz.xlsx Uploaded time - 2020-02-17 11:18:10
Duplicate Filename - abc.xlsx Uploaded time - 2020-02-17 11:18:10

The question below is optional, but it would be great if you can help me! I am also using another script to find the duplicate filenames in the directory; it works perfectly fine. But I am wondering if I could fit the code above into one single script file, so that it would be crisp and easy!
{
DATE=`date +"%Y-%m-%d"`
dirname=/path
tempfile=myTempfileName
find $dirname -type f > $tempfile
cat $tempfile | sed 's_.*/__' | sort | uniq -d | while read fileName
do
    grep "$fileName" $tempfile
done
} | tee '/path/tj_var.txt' | awk -F"/" '{print $NF}' | tee '/path/tj_var.txt' | sort -u | tee '/path/tj_mail.txt' | mail -s "Duplicate file found ${DATE}" abc#xyz.com

This is my actual code:

path = /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation
printf "%s" "$(</marketsource/scripts/tj_mail.txt)" | while IFS= read -r filename; do
    mtime=$(stat -c %y "/path/$filename")
    printf 'Duplicate Filename - %s Uploaded time - %s\n' "$filename" "$mtime"
done | mail -s "Duplicate file found ${DATE}" tipalli#allegisgroup.com

mtime=$(stat -c %y "/path/$filename" 2>/dev/null || echo "unknown (stat failed)")

This is the error:

./tj_mail1.ksh: line 1: path: command not found
stat: cannot stat `/path/AirTimeActs_2020-02-08.xlsx': No such file or directory

A little more! My aim is to find any duplicate files. If there aren't any duplicate files and the find command output is empty, then enter the if condition, perform the 'mv' command, and exit the script entirely; if there are duplicate files, then skip the if condition, pipe the duplicate files through, and perform the mail and date stamp operation.
{
DATE=`date +"%Y-%m-%d"`
dirname=/marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation
tempfile=myTempfileName
find $dirname -type f > $tempfile
cat $tempfile | sed 's_.*/__' | sort | uniq -d | while read fileName
do
    grep "$fileName" $tempfile
done
}
if ["$fileName" == ""]; then
    mv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/*.xlsx /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/Archive
    mv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/*.csv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/Archive
    exit 1
fi | tee '/marketsource/scripts/tj_var.txt' | awk -F"/" '{print $NF}' | tee '/marketsource/scripts/tj_var.txt' | sort -u | tee '/marketsource/scripts/tj_mail.txt'

DATE=`date +"%Y-%m-%d"`
printf "%s\n" "$(</marketsource/scripts/tj_mail.txt)" | while IFS= read -r filename; do
    mtime=$(stat -c %y "/marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/$filename")
    printf 'Duplicate Filename - %s Uploaded time - %s\n\n' "$filename" "$mtime"
done | mail -s "Duplicate file found ${DATE}" ti#allegisgroup.com
I assume the file /path/test_mail.txt has been prepared by some other script (as a list of duplicate files) and the task is to add the modification time of the files listed in /path/test_mail.txt and format the output as shown in the question.

while IFS= read -r filename; do
    mtime=$(stat -c %y "/path/$filename")
    printf 'Duplicate Filename - %s Uploaded time - %s\n' "$filename" "$mtime"
done < "/path/test_mail.txt"

Instead of parsing the file /path/test_mail.txt you could also add this to a pipe, like this:

somehow_print_duplicate_file_names | while IFS= read -r filename; do
    mtime=$(stat -c %y "/path/$filename")
    printf 'Duplicate Filename - %s Uploaded time - %s\n' "$filename" "$mtime"
done | somehow_send_mail

You could add some error handling in case stat fails:

mtime=$(stat -c %y "/path/$filename" 2>/dev/null || echo "unknown (stat failed)")

or use stat's error message:

mtime=$(stat -c %y "/path/$filename" 2>&1)
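For reference, here is a self-contained version of that loop run against a scratch directory, so it can be tried anywhere. The directory and file names are made up for the demo, and GNU stat is assumed for the -c %y format:

```shell
# Create a scratch directory holding one fake "duplicate" file
# and a list file naming it, then run the loop from the answer.
dir=$(mktemp -d)
echo dummy > "$dir/abc.xlsx"
printf 'abc.xlsx\n' > "$dir/test_mail.txt"

while IFS= read -r filename; do
    # Fall back to a marker if stat fails (e.g. the file was removed)
    mtime=$(stat -c %y "$dir/$filename" 2>/dev/null || echo "unknown (stat failed)")
    printf 'Duplicate Filename - %s Uploaded time - %s\n' "$filename" "$mtime"
done < "$dir/test_mail.txt"

rm -rf "$dir"
```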
Fastest way to compare hundreds of thousands of files, and create output results file in bash
I have the following:

- Values file: values.txt
- Directory structure: ./dataset/label/author/files.txt
- Tens of thousands of files.txt's
- A file called targets.txt, which contains the location of every files.txt

Example targets.txt:

./dataset/tallperson/Jabba/awesome.txt
./dataset/fatperson/Detox/toxic.txt

I have a file called values.txt, which contains hundreds of thousands of lines of values. These values are things like "aef", "; i", "jfk", etc. - random 3-character lines. I also have tens of thousands of files, each of which also contains hundreds to thousands of lines of random 3-character strings. values.txt was created using the values of each files.txt; therefore, there is no value in any file.txt that isn't contained in values.txt. values.txt contains NO repeating values.

Example:

./dataset/weirdperson/Crooked/file1.txt

LOL
hel
lo
how
are
you
on
thi
s f
ine
day

./dataset/awesomeperson/Mild/file2.txt

I a
m v
ery
goo
d.
Tha
nks
LOL

values.txt

are
you
on
thi
s f
ine
day
goo
d.
Tha
hel
lo
how
I a
m v
ery
nks
LOL

The above is just example data; each file will contain hundreds of lines, and values.txt will contain hundreds of thousands of lines. My goal is to make one file where each line corresponds to a file. Each line will contain N values, where each value corresponds to a line in values.txt, and each value is separated by a comma. Each value is calculated simply by how many times each file contains the value of each line in values.txt. The result should look something like this, with line 1 being file1.txt and line 2 being file2.txt:

Result.txt

1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,

Now, the last thing is: after getting this result, I would like to add a label. The label is equivalent to the Nth parent directory of the file - for this example, let's say the 2nd parent directory, so the label would be "weirdperson" or "awesomeperson". As a result, the new Results.txt file would look like this.
Results.txt

1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson

I would like a way to accomplish all of this, but I need it to be fast, as I am working with a very large-scale dataset. This is my current code, but it's too slow. The bottleneck is line 2.

Script (each file is located at "./dataset/label/author/file.java"):

1 while IFS= read file_name; do
2     cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" "$file_name" | xargs printf "%d," >> Results.txt;
3     label=$(echo "$file_name" | cut -d '/' -f 3);
4     printf "$label\n" >> Results.txt;
5 done < targets.txt

To REPLICATE this problem, do the following:

mkdir -p dataset/{label1,label2}
touch file1.txt; chmod 777 file1.txt
touch file2.txt; chmod 777 file2.txt
echo "Enter anything here" > file1.txt
echo "Enter something here too" > file2.txt
mv file1.txt ./dataset/label1
mv file2.txt ./dataset/label2
find ./dataset/ -type f -name "*.txt" | while IFS= read file_name; do cat $file_name | sed -e "s/.\{3\}/&\n/g" | sort -u > $modified-file_name; done
find ./dataset/ -type f -name "modified-*.txt" | xargs -d '\n' -I {} echo {} >> targets.txt
xargs cat < targets.txt | sort -u > values.txt

With the above UNCHANGED, you should get a values.txt similar to the one below. If there are any lines with fewer or more than 3 characters for some reason, please delete the line.

any
e
Ent
er
eth
he
her
ing
ng
re
som
thi
too

You should get a targets.txt file:

./dataset/label2/modified-file2.txt
./dataset/label1/modified-file1.txt

From here, the goal is to check every file in targets.txt, count how many values each file has contained in values.txt, and output the results with the label to Results.txt. The following script works for this example, but I need it to be way faster for large-scale operations.
while IFS= read file_name; do
    cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d," >> Results.txt;
    label=$(echo "$file_name" | cut -d '/' -f 3);
    printf "$label\n" >> Results.txt;
done < targets.txt

Here's another example.

Example 2:

./dataset/weirdperson/Crooked/file1.txt

LOL
LOL
HAHA

./dataset/awesomeperson/Mild/file2.txt

LOL
LOL
LOL

values.txt

LOL
HAHA

Result.txt

2,1,weirdperson
3,0,awesomeperson
Here's a solution in Python, using its ordered dictionary datatype.

import os
from collections import OrderedDict

# read samples from values.txt into an ordered dict:
# each dict key is a line from the file
# (including the trailing newline, but that doesn't matter);
# each dict value is 0
with open('values.txt', 'r') as f:
    samplecount0 = OrderedDict((sample, 0) for sample in f.readlines())

# get list of filenames from targets.txt
with open('targets.txt', 'r') as f:
    targets = [t.rstrip('\n') for t in f.readlines()]

# for each target:
#   read its lines of samples,
#   increment the corresponding count in samplecount,
#   print out samplecount in a single line separated by commas;
# each line also has the 2nd-to-last directory component of the target's pathname
for target in targets:
    with open(target, 'r') as f:
        # copy samplecount0 to samplecount so we don't have to read values.txt again
        samplecount = samplecount0.copy()
        # for each sample in the target file, increment the samplecount dict entry
        for tsample in f.readlines():
            samplecount[tsample] += 1
    output = ','.join(str(v) for v in samplecount.values())
    output += ',' + os.path.basename(os.path.dirname(os.path.dirname(target)))
    print(output)

Output:

$ python3 doit.py
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
Try this:

<targets.txt xargs -n1 -P4 bash -c "
awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --

The -P4 lets you parallelize the jobs from targets.txt. The short awk script matches lines and prints 1 or 0 followed by a comma. Then sed is used to append the 3rd component of the folder path to the end of the line. The sed line looks strange, because I used the unprintable character $'\x01' as the separator for the s command.

Tested with:

mkdir -p ./dataset/weirdperson/Crooked
cat <<EOF >./dataset/weirdperson/Crooked/file1.txt
LOL
hel
lo
how
are
you
on
thi
s f
ine
day
EOF
mkdir -p ./dataset/awesomeperson/Mild/
cat <<EOF >./dataset/awesomeperson/Mild/file2.txt
I a
m v
ery
goo
d.
Tha
nks
LOL
EOF
cat <<EOF >values.txt
are
you
on
thi
s f
ine
day
goo
d.
Tha
hel
lo
how
I a
m v
ery
nks
LOL
EOF
cat <<EOF >targets.txt
./dataset/weirdperson/Crooked/file1.txt
./dataset/awesomeperson/Mild/file2.txt
EOF

measure_start() {
    declare -g ttic_start
    echo "==> Test $* <=="
    ttic_start=$(date +%s.%N)
}
measure_end() {
    local end
    end=$(date +%s.%N)
    local start
    start="$ttic_start"
    ttic_runtime=$(python -c "print(${end} - ${start})")
    echo "Runtime: $ttic_runtime"
    echo
}

measure_start original
while IFS= read file_name; do
    cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d,"
    label=$(echo "$file_name" | cut -d '/' -f 3);
    printf "$label\n"
done < targets.txt
measure_end

measure_start first try with bash
nl -w1 values.txt | sort -k2.2 > values_sorted.txt
< targets.txt xargs -n1 -P0 bash -c "
sort -t$'\t' \"\$1\" |
join -t$'\t' -12 -21 -eEMPTY -a1 -o1.1,2.1 values_sorted.txt - |
sort -s -n -k1.1 |
sed 's/.*\tEMPTY/0/;t;s/.*/1/' |
tr '\n' ',' |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end

measure_start second try with awk
<targets.txt xargs -n1 -P0 bash -c "
awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end

Outputs:

==> Test original <==
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
Runtime: 0.133769512177

==> Test first try with bash <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0322473049164

==> Test second try with awk <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0180222988129
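The heart of the awk variant is the two-file NR==FNR idiom: while awk reads the first file, NR (the global record number) equals FNR (the per-file record number), so those lines are stored as array keys and skipped; each line of the second file is then tested for membership. A minimal sketch with throwaway files:

```shell
# Store each line of the first file as a key in array a,
# then print 1 or 0 for each line of the second file
# depending on whether it appears in the first.
printf 'LOL\nHAHA\n' > keys.txt
printf 'LOL\nXYZ\n'  > probe.txt
awk 'NR==FNR{a[$0];next} {if ($0 in a) print 1; else print 0}' keys.txt probe.txt
rm -f keys.txt probe.txt
```

Here LOL is present in keys.txt, so the first probe line prints 1, while XYZ prints 0.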
Bash substring from position not printing
I am using the format ${string:start:length} to extract the file name from wget's .listing file, line by line. The format of the file is something I think we are all familiar with:

04-30-13  01:41AM              7033614 some_archive.zip
04-29-13  08:13PM       <DIR>          DIRECTORY NAME 1
04-29-13  05:41PM       <DIR>          DIRECTORY NAME 2

All file names start at pos:40, so setting :start to 39, with no :length, should (and does) return the file name for each line:

#!/bin/bash
cat .listing | while read line; do
    file="${line:40}"
    echo $file
done

Correctly returns:

some_archive.zip
DIRECTORY NAME 1
DIRECTORY NAME 2

However, if I get any more creative, it breaks:

#!/bin/bash
cat .listing | while read line; do
    file="${line:40}"
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done

Returns:

$ ./test.sh
 is a <DIR>ECTORY NAME 1
 is a <DIR>ECTORY NAME 2

What gives? I lose "the file " and the rest of the test looks like it prints on top of "the file DIRECTORY NAME 1" from pos:0. It's weird - what's it on account of?
The answer, as I am learning more and more with Linux as I progress, is non-printing control characters. Adding a pipe to egrep for only printing characters solved the problem:

#!/bin/bash
cat .listing | while read line; do
    file=$(echo ${line:39} | egrep -o '[[:print:]]+' | head -n1)
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done

Correctly returns:

$ ./test.sh
the file DIRECTORY NAME 1 is a <DIR>
the file DIRECTORY NAME 2 is a <DIR>

I wish there were a better way to visualize these control characters, but what the above does is basically take the string segment, pull out the first run of printable characters, and assign it to the variable. I assume there is a control character at the end of the line that returns the cursor to the beginning of the line, causing the rest of the echo to be printed there, overwriting the previous characters. Odd.
You can remove the \r control characters from the whole file by using the tr command on the first line of your script:

#!/bin/bash
cat .listing | tr -d '\015' | while read line; do
    file="${line:39}"
    dir=$(echo $line | egrep -o '<DIR>' | head -n1)
    if [ $dir ]; then
        echo "the file $file is a $dir"
    fi
done
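The effect is easy to reproduce without a .listing file. The sketch below simulates one CRLF-terminated listing line; with the carriage return removed by tr, the echo no longer overwrites itself. (To make such control characters visible, cat -A or od -c will show the \r explicitly.)

```shell
# Simulate one CRLF-terminated line, strip the \r with tr,
# then read and print it normally.
printf 'DIRECTORY NAME 1\r\n' | tr -d '\015' | {
    IFS= read -r line
    echo "the file is [$line]"
}
# prints: the file is [DIRECTORY NAME 1]
```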
In Linux, how can I merge the output of two console commands into a text file?
I'm obtaining the first line and last 10,000 lines of a csv as follows:

head workrace.csv -n 1
tail workrace.csv -n 10000

How can I merge the output into a single text file? I can pipe the above commands into two separate text files and then concatenate the files. Is there a way to do this without needing to use intermediary text files?
You can either run both commands in a subshell:

( head workrace.csv -n 1 ; tail workrace.csv -n 10000 ) > result.txt

or you can use the >> redirection operator to append contents to a file:

head workrace.csv -n 1 > result.txt
tail workrace.csv -n 10000 >> result.txt
Some other options not mentioned by choroba:

F=workrace.csv
{ head -n 1 $F; tail -n 10000 $F; } > result.txt   # no subshell
awk 'NR==1 || NR>k-10000' k="$( wc -l < $F )" $F > result.txt

exec > result.txt   # truncate result.txt and direct output of remaining commands to it
head -n 1 $F
tail -n 10000 $F
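A runnable sketch of the no-subshell group-command form, using a small generated file in place of workrace.csv:

```shell
# Build a 20-line sample file, then merge its first line with
# its last 5 lines into a single output file via a { ...; } group.
seq 20 | sed 's/^/row /' > sample.csv
{ head -n 1 sample.csv; tail -n 5 sample.csv; } > result.txt
wc -l < result.txt   # 6 lines: the header plus the last five rows
rm -f sample.csv result.txt
```

Unlike the ( ... ) subshell form, the { ...; } group runs in the current shell, so any variable assignments inside it survive; the redirection applies to the group as a whole either way.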
File name printed twice when using wc
For printing the number of lines in all ".txt" files of the current folder, I am using the following script:

for f in *.txt; do l="$(wc -l "$f")"; echo "$f" has "$l" lines; done

But in the output I am getting:

lol.txt has 2 lol.txt lines

Why is lol.txt printed twice (especially after the 2)? I guess there is some sort of stream flush required, but I don't know how to achieve that in this case. So what changes should I make in the script to get the output as:

lol.txt has 2 lines
You can remove the filename with 'cut':

for f in *.txt; do l="$(wc -l "$f" | cut -f1 -d' ')"; echo "$f" has "$l" lines; done
The filename is printed twice, because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:

ls *.txt | while read f; do wc -l "$f"; done

or, if you really want the verbose output, try:

ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:

for f in *.txt; do
    l=$(wc -l < "$f")
    echo "$f" has "$l" lines
done
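The difference between the two invocations is easy to see on a throwaway file: when wc opens the file itself it echoes the name after the count, but fed via stdin it has no name to print. A minimal sketch:

```shell
# wc prints the filename only when it opens the file itself.
printf 'a\nb\n' > demo.txt
wc -l demo.txt     # count followed by the filename
wc -l < demo.txt   # count only
rm -f demo.txt
```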