How do I output the information from this script to a csv file? Anything helps.
Below is a script I wrote that uses the stat and du commands to output information on files in subdirectories for archiving purposes. The client wants this information (the file name, the size of the file, and its created, accessed, modified, and changed dates) to decide which files to archive and which will stay put. Is there a way to send this output to a csv file or an Excel file?
echo "Please type in the directory"
read directory
for entry in "$directory"/*
do
modDate=$(stat -c %y "$entry")
modDate1=$(stat -c %x "$entry")
modDate2=$(stat -c %z "$entry")
modDate3=$(du -sh "$entry"|awk '{print $1}')
modDate4=$(stat -c %w "$entry")
modDate5=$(stat -c %n "$entry")
modDate=${modDate%% *}
modDate1=${modDate1%% *}
modDate2=${modDate2%% *}
modDate4=${modDate4%% *}
echo File Name:$modDate5 Size:$modDate3 Created:$modDate4 Accessed:$modDate1 Modified:$modDate Changed:$modDate2
done
An example of the script's output on the terminal is shown below; this output is what I would like to transfer to a csv file. Thank you.
EXECUTING SCRIPT
root#xizsdap1x:/home/o722823a# ./find.sh
Please type in the directory
READS
/sasdata/userdata01
OUTPUT
File Name:/sasdata/userdata01/db1 Size:153G Created:2018-04-10 Accessed:2022-12-14 Modified:2018-05-31 Changed:2018-05-31
File Name:/sasdata/userdata01/db2 Size:1.9T Created:2018-04-10 Accessed:2022-12-14 Modified:2020-04-21 Changed:2022-02-01
File Name:/sasdata/userdata01/db3 Size:819G Created:2018-04-10 Accessed:2022-12-14 Modified:2017-10-27 Changed:2018-11-13
File Name:/sasdata/userdata01/db4 Size:702G Created:2018-04-10 Accessed:2022-12-14 Modified:2018-04-03 Changed:2018-05-26
File Name:/sasdata/userdata01/dbqa Size:2.7T Created:2020-08-05 Accessed:2022-12-14 Modified:2021-05-22 Changed:2022-09-06
File Name:/sasdata/userdata01/proj22 Size:712G Created:2018-08-07 Accessed:2022-12-14 Modified:2018-04-03 Changed:2020-02-28
File Name:/sasdata/userdata01/proj26 Size:627G Created:2022-05-25 Accessed:2022-12-14 Modified:2022-06-03 Changed:2022-06-03
File Name:/sasdata/userdata01/proj39 Size:1.3T Created:2019-09-29 Accessed:2022-12-14 Modified:2019-12-11 Changed:2019-12-11
The following awk procedure can be used to create a .csv output from your data, taking advantage of the rigidly defined structure of the example data given.
For testing purposes, I have saved your data in a file named data.txt:
File Name:/sasdata/userdata01/db1 Size:153G Created:2018-04-10 Accessed:2022-12-14 Modified:2018-05-31 Changed:2018-05-31
File Name:/sasdata/userdata01/db2 Size:1.9T Created:2018-04-10 Accessed:2022-12-14 Modified:2020-04-21 Changed:2022-02-01
File Name:/sasdata/userdata01/db3 Size:819G Created:2018-04-10 Accessed:2022-12-14 Modified:2017-10-27 Changed:2018-11-13
File Name:/sasdata/userdata01/db4 Size:702G Created:2018-04-10 Accessed:2022-12-14 Modified:2018-04-03 Changed:2018-05-26
File Name:/sasdata/userdata01/dbqa Size:2.7T Created:2020-08-05 Accessed:2022-12-14 Modified:2021-05-22 Changed:2022-09-06
File Name:/sasdata/userdata01/proj22 Size:712G Created:2018-08-07 Accessed:2022-12-14 Modified:2018-04-03 Changed:2020-02-28
File Name:/sasdata/userdata01/proj26 Size:627G Created:2022-05-25 Accessed:2022-12-14 Modified:2022-06-03 Changed:2022-06-03
File Name:/sasdata/userdata01/proj39 Size:1.3T Created:2019-09-29 Accessed:2022-12-14 Modified:2019-12-11 Changed:2019-12-11
Processing the data with the following awk procedure:
awk 'BEGIN{FS="[: ]"} NR==1{print $1" "$2","$4","$6","$8","$10","$12} {print $3","$5","$7","$9","$11","$13} ' data.txt
results in the following output:
File Name,Size,Created,Accessed,Modified,Changed
/sasdata/userdata01/db1,153G,2018-04-10,2022-12-14,2018-05-31,2018-05-31
/sasdata/userdata01/db2,1.9T,2018-04-10,2022-12-14,2020-04-21,2022-02-01
/sasdata/userdata01/db3,819G,2018-04-10,2022-12-14,2017-10-27,2018-11-13
/sasdata/userdata01/db4,702G,2018-04-10,2022-12-14,2018-04-03,2018-05-26
/sasdata/userdata01/dbqa,2.7T,2020-08-05,2022-12-14,2021-05-22,2022-09-06
/sasdata/userdata01/proj22,712G,2018-08-07,2022-12-14,2018-04-03,2020-02-28
/sasdata/userdata01/proj26,627G,2022-05-25,2022-12-14,2022-06-03,2022-06-03
/sasdata/userdata01/proj39,1.3T,2019-09-29,2022-12-14,2019-12-11,2019-12-11
which can be redirected to a file in the usual way (by appending > data.csv to the awk command above).
Explanation
awk processes files line-by-line, separating each line (a record) into fields that, by default, are white-space-separated 'words' in the line. However, the field separator can be reset to split each line at other characters and, in modern versions of awk, can split at more than one character type.
The first part of the awk procedure uses a BEGIN block to reset the field separator to be either a space or a colon FS="[: ]" (the square brackets apply OR to all the characters within).
Next, a block with the condition NR==1 acts on just the first line of the input to extract and output the header values for the csv file. As the line has been split into fields at spaces and colons, "File Name" is fields 1 and 2 ($1 and $2), while "Size" is field 4 and so on.
The next block, having no preceding condition pattern, is applied to every row. It extracts and prints the values relating to each header field by using the appropriate field numbers.
The commas needed to separate the output are printed between the values (awk can automatically insert a defined output field separator but this is not appropriate here as we have to use both fields 1 and 2 of the input to make output field 1).
Caveat
The solution given is specific to your example data and should be considered 'fragile': it will likely break if the structure of the data varies from that shown (for example, if a file name contains a space or a colon). It will work in modern versions of awk provided that data structure is always used.
Using the command
If the data is stored in a file, the file can be fed into the procedure as in the example above.
Alternatively, the output of whatever procedure produced the data can be piped directly into awk like this:
your procedure | awk '...'
Tested in GNU Awk 5.1.0.
Edit
Alternative procedure for awk versions that don't process multiple field separators
This alternative uses the default field separator (white space) to extract fields, with a substitution being used to remove non-value parts of each field.
sub(/^(.)*:/,"",$i) takes field i and replaces all characters from the start, up to and including the colon, with an empty string, leaving the value part.
In this example, the header row has been stated explicitly rather than extracting header values from the file.
awk 'BEGIN {header="File Name, Size, Created, Accessed, Modified, Changed"; print header;} {row=""; for(i=2;i<NF;i++){sub(/^(.)*:/,"",$i); row=row $i", "} sub(/^(.)*:/,"",$NF); row=row $NF; print row}' data.txt
This alternative might help if the awk version can't recognise two delimiters at once.
According to the awk manual, however, different awk implementations may interpret ^ in a regex differently.
Output using GNU Awk 5.1.0:
File Name, Size, Created, Accessed, Modified, Changed
/sasdata/userdata01/db1, 153G, 2018-04-10, 2022-12-14, 2018-05-31, 2018-05-31
/sasdata/userdata01/db2, 1.9T, 2018-04-10, 2022-12-14, 2020-04-21, 2022-02-01
/sasdata/userdata01/db3, 819G, 2018-04-10, 2022-12-14, 2017-10-27, 2018-11-13
/sasdata/userdata01/db4, 702G, 2018-04-10, 2022-12-14, 2018-04-03, 2018-05-26
/sasdata/userdata01/dbqa, 2.7T, 2020-08-05, 2022-12-14, 2021-05-22, 2022-09-06
/sasdata/userdata01/proj22, 712G, 2018-08-07, 2022-12-14, 2018-04-03, 2020-02-28
/sasdata/userdata01/proj26, 627G, 2022-05-25, 2022-12-14, 2022-06-03, 2022-06-03
/sasdata/userdata01/proj39, 1.3T, 2019-09-29, 2022-12-14, 2019-12-11, 2019-12-11
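As a less fragile alternative to post-processing the printed text, the loop from the question could write CSV rows itself. The following is only a sketch under some assumptions: it relies on GNU stat and du (as the original script does), write_csv is a hypothetical function name, and the file name is wrapped in double quotes so that names containing commas or spaces still form a single CSV field.

```shell
# Sketch: print one CSV row per entry of a directory (assumes GNU stat and du).
# write_csv is a hypothetical helper name, not part of the original script.
write_csv() {
    directory=$1
    printf '%s\n' 'File Name,Size,Created,Accessed,Modified,Changed'
    for entry in "$directory"/*; do
        [ -e "$entry" ] || continue              # skip the unexpanded pattern if the directory is empty
        size=$(du -sh -- "$entry" | awk '{print $1}')
        created=$(stat -c %w -- "$entry")        # "-" when the filesystem does not record a birth time
        accessed=$(stat -c %x -- "$entry")
        modified=$(stat -c %y -- "$entry")
        changed=$(stat -c %z -- "$entry")
        name=$(printf '%s' "$entry" | sed 's/"/""/g')   # double embedded quotes, CSV style
        # ${var%% *} keeps only the date part, as in the original script
        printf '"%s",%s,%s,%s,%s,%s\n' "$name" "$size" \
            "${created%% *}" "${accessed%% *}" "${modified%% *}" "${changed%% *}"
    done
}
```

It could be called as write_csv /sasdata/userdata01 > report.csv, and the resulting file should open directly in Excel or any other spreadsheet program.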
I have 336 txt files, each with 4 columns. I need help finding strings that are common (matched) in column 2 (Gene) across all the txt files and extracting that information into a new txt file.
For example: how many times is “kdpDE beta” present? If it is present, print ‘1’ in the next column of the output txt file; if “kdpDE beta” is absent, print ‘0’.
Thank you for your help.
File_1.txt
Name Gene Family Class
KB2908 kdpE beta aminoglycoside lactamase
KB2908 ugd peptide transferase
File_2.txt
Name Gene Family Class
KB2909 kdpE beta aminoglycoside lactamase
KB2909 ugd peptide transferase
KB2909 PmrF macrolide phosphotransferase
You can use grep with wc to get a count of a certain string within a file. You can loop through it with a script to do this for every file in a directory. The following will loop through the directory, count the number of times <search term> appears in each file, and output it to a file called output.txt.
for FILE in *; do
[ "$FILE" = output.txt ] && continue # skip the results file itself
echo "$FILE" >> output.txt
grep -o -i '<search term>' "$FILE" | wc -l >> output.txt
echo >> output.txt
done
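If what is needed is the 1/0 presence flag from the question rather than a count, something along these lines might do. This is a sketch, not a tested solution for the real data: gene_presence is a hypothetical function name, the term is matched as a fixed string anywhere in a line (not strictly in column 2), and the output is the file name, a tab, then 1 or 0.

```shell
# Sketch: print "<file><TAB>1" if the term occurs in the file, otherwise "<file><TAB>0".
# gene_presence is a hypothetical helper name; matching is done on the whole line.
gene_presence() {
    term=$1
    shift
    for f in "$@"; do
        if grep -qF -- "$term" "$f"; then    # -q: no output, only the exit status is used
            printf '%s\t1\n' "$f"
        else
            printf '%s\t0\n' "$f"
        fi
    done
}
```

For the sample files it could be run as gene_presence 'kdpE beta' File_*.txt > presence.txt; keeping the results file name outside the File_*.txt pattern avoids scanning the output itself.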
I have 500 wave files in a folder ABC which are named like
F1001
F1002
F1003
...
F1100
F2001
F2002
...
F2100
M3001
M3002
...
M3100
M4001
M4002
...
M4100
all with extension .wav.
Also I have a text file which contains 3 digit numbers like
001
003
098
034 .... (200 in total).
I want to select wave files from the folder ABC whose names end with these 3 digits.
Expecting MATLAB or bash script solutions.
I read this question:
"Copy or move files to another directory based on partial names in a text file", but I don't know how to adapt it to my case.
For MATLAB:
1) Get all the file names in the folder using the function dir (or rdir).
2) Using a for loop, go through every file name and add the last 3 digits of each to an array (array A). You will need str2num() here.
3) Parse all the 3-digit numbers from the text file into an array (array B).
4) Using ismember(B, A), find which elements of B are contained in A.
5) Load the corresponding files.
find . -name "*.wav" | grep -f <(awk '{print $0 ".wav"}' file)
grep -f uses the patterns stored in a file, one per line, and looks for them in the find output. Because you want the three digits to be at the end of the name, the awk command inside the process substitution supplies a modified copy of the list with ".wav" appended to each line. So for the line 001, "0001.wav" will match but a file named 0010.wav will not.
See: process substitution syntax and grep --help.
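To actually copy the selected files rather than just list them, a plain shell loop over the number list is one option. The sketch below assumes one 3-digit suffix per line in the list file; select_wavs and its three arguments (list file, source folder, destination folder) are hypothetical names chosen for illustration.

```shell
# Sketch: copy every <src>/*NNN.wav whose last three digits NNN appear in the list file.
# select_wavs and its argument order are hypothetical.
select_wavs() {
    list=$1
    src=$2
    dest=$3
    mkdir -p "$dest"
    while IFS= read -r suffix || [ -n "$suffix" ]; do
        suffix=$(printf '%s' "$suffix" | tr -d '\r ')   # drop stray carriage returns and spaces
        [ -n "$suffix" ] || continue                    # ignore blank lines
        for f in "$src"/*"$suffix".wav; do
            if [ -e "$f" ]; then                        # the pattern stays literal when nothing matches
                cp -- "$f" "$dest"/
            fi
        done
    done < "$list"
}
```

For example, select_wavs numbers.txt ABC selected would copy F1001.wav, F2001.wav and so on for the line 001. Replacing cp with mv would move the files instead.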
function wavelist()
    wavefiles = dir('*.wav');   % wave files in the current folder
    myfolder = '/home/adspr/Desktop/exps_sree/waves/selectedfiles';   % output folder for the selected files
    for i = 1:numel(wavefiles)   % for each wave file
        filename = wavefiles(i).name;
        [~, name, ~] = fileparts(filename);   % file name without extension
        a = name(end-2:end);   % last 3 characters of the file name
        fileID = fopen('nameslist', 'r');
        while ~feof(fileID)
            b = fgetl(fileID);   % read each line of the list file
            if strcmp(a, b)   % compare the suffix with the list entry
                movefile(filename, myfolder);   % move to the output folder
                break;   % the file has been moved; stop scanning the list
            end
        end
        fclose(fileID);
    end
end
I don't think this has a simple answer, which is why I asked here. Anyway, my problem is solved, so I am posting this as an answer.
Thank you all.
I have 2 files at the moment, file A and file B. Certain lines in file A contain a substring of some line in file B. I would like to replace these substrings with the corresponding string in file B.
Example of file A:
#Name_1
foobar
info_for_foobar
evenmoreinfo_for_foobar
#Name_2
foobar2
info_for_foobar2
evenmoreinfo_for_foobar2
Example of file B:
#Name_1_Date_Place
#Name_2_Date_Place
The desired output I would like:
#Name_1_Date_Place
foobar
info_for_foobar
evenmoreinfo_for_foobar
#Name_2_Date_Place
foobar2
info_for_foobar2
evenmoreinfo_for_foobar2
What I have so far:
I was able to get the order of the names in File B corresponding to those in File A. So I was thinking to use a while loop here which goes through every line of file B, and then finds and replaces the corresponding substring in file A with that line, however I'm not sure how to put this into a bash script.
The code I have so far, which does not give the desired output, is:
grep '#' fileA.txt > fileAname.txt
while read line
do
replace="$(grep '$line' fileB.txt)"
sed -i 's/'"$line"'/'"$replace"'/g' fileA.txt
done < fileAname.txt
Does anybody have an idea?
Thanks in advance!
You can do it with this script:
awk 'NR==FNR {str[i]=$0;i++;next;} $1~"^#" {for (i in str) {if(match(str[i],$1)){print str[i];next;}}} {print $0}' B.txt A.txt
This awk should work for you:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) $0=i} 1' fileB fileA
Output:
#Name_1_Date_Place
foobar
info_for_foobar
evenmoreinfo_for_foobar
#Name_2_Date_Place
foobar2
info_for_foobar2
evenmoreinfo_for_foobar2
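Both one-liners above match on substrings, so a header like #Name_1 could also pick up a longer name such as #Name_10_Date_Place. A stricter variant might only touch lines starting with # and require the short name to be a prefix of the full one. The sketch below assumes, based on the sample data, that the full names extend the short ones with an underscore; expand_headers is a hypothetical wrapper name.

```shell
# Sketch: replace "#Name_N" headers in file A with the matching full header from file B.
# expand_headers is a hypothetical name; it assumes full headers look like "<short>_<rest>".
expand_headers() {
    awk '
        FNR == NR { full[++n] = $0; next }    # first file (B): remember every full header
        /^#/ {                                # second file (A): only header lines are candidates
            for (k = 1; k <= n; k++)
                if (index(full[k], $0 "_") == 1) { print full[k]; next }
        }
        { print }                             # everything else is passed through unchanged
    ' "$1" "$2"
}
```

It would be called as expand_headers fileB fileA > fileA.new, which leaves the original file A untouched.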
Grep is unable to search for the contents of one file in another file, and I don't know what is wrong.
I have one file called mine with contents like this:
sadiadas
HTTP:STC:ACTIVEX:MCAFEE-FREESCN
HTTP:STC:IMG:ANI-BLOCK-STR2
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:DL:EOT-IO
HTTP:STC:IE:CLIP-MEM
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:ORACLE:COREL-DRAW-BO
HTTP:STC:MS-FOREFRONT-RCE
HTTP:STC:DL:VISIO-UMLSTRING
HTTP:ORACLE:OUTSIDEIN-CORELDRAW
HTTP:STC:DL:MAL-M3U
HTTP:STC:JAVA:MIXERSEQ-OF
HTTP:STC:DL:MAL-WEBEX-WRF
HTTP:STC:DL:XLS-FORMULA-BIFF
HTTP:STC:JAVA:TYPE1-FONT
HTTP:STC:DL:XLS-FIELD-MC
HTTP:STC:IE:AUTH-REFLECTION
HTTP:STC:DL:MOZILLA-WAV-BOF
HTTP:XSS:PHPNUKE-BOOKMARKS1
HTTP:STC:DL:MAL-WIN-BRIEFCASE-2
HTTP:STC:ADOBE:FLASH-INT-OV
HTTP:STC:IE:MAL-GIF-DOS
APP:NOVELL:GWMGR-INFODISC
APP:SYMC:MESSAGING-SAVE.DO-CSRF
HTTP:STC:ADOBE:READER-MC-RCE
HTTP:STC:DL:SOPHOS-RAR-VMSF-RGB
HTTP:ORACLE:OUTSIDE-IN-PRDOX-BO
HTTP:STC:JAVA:IBM-RMI-PROXY-RCE
HTTP:STC:IE:REMOVECHILD-UAF
HTTP:STC:COREL-WP-BOF
SHELLCODE:MSF:PROPSPRAY
HTTP:VLC-ABC-FILE-BOF
HTTP:MISC:MS-XML-SIG-VAL-DOS
HTTP:STC:ADOBE:FLASH-PLAYER-BOF
HTTP:STC:ADOBE:FLASHPLR-FILE-MC
HTTP:STC:ADOBE:FLASH-AS3-INT-OV
HTTP:ORACLE:OUTSIDE-IN-MSACCESS
HTTP:STC:SCRIPT:APACHE-XML-DOS
HTTP:STC:JAVA:METHODHANDLE
HTTP:STC:ADOBE:CVE-2014-0506-UF
HTTP:STC:IE:CVE-2014-1789-MC
HTTP:STC:ACTIVEX:KVIEW-KCHARTXY
SHELLCODE:X86:LIN-SHELL-REV-80S
HTTP:STC:JAVA:JRE-PTR-CTRL-EXEC
HTTP:STC:ADOBE:CVE-2015-0091-CE
HTTP:DOS:MUL-PRODUCTS
HTTP:MISC:WAPP-SUSP-FILEUL1
SHELLCODE:X86:BASE64-NOOP-80C
SHELLCODE:X86:BASE64-NOOP-80S
SHELLCODE:X86:REVERS-CONECT-80C
SHELLCODE:X86:REVERS-CONECT-80S
SHELLCODE:X86:FLDZ-GET-EIP-80C
SHELLCODE:X86:FLDZ-GET-EIP-80S
SHELLCODE:X86:WIN32-ENUM-80C
SHELLCODE:X86:WIN32-ENUM-80S
and another file, called 2537_2550, that has some of the contents of the first file:
HTTP:STC:OUTLOOK:MAILTO-QUOT-CE
HTTP:STC:HSC:HCP-QUOTE-SCRIPT
HTTP:STC:HSC:MS-HSC-URL-VLN
HTTP:STC:TELNET-URL-OPTS
HTTP:STC:NOTES-INI
HTTP:STC:MOZILLA:SHELL
HTTP:STC:RESIZE-DOS
HTTP:STC:IE:SHELL-WEB-FOLDER
HTTP:STC:IE:IE-MHT-REDIRECT
HTTP:IIS:ASP-DOT-NET-BACKSLASH
APP:SECURECRT-CONF
HTTP:STC:IE:IE-FTP-CMD
HTTP:STC:IE:URL-HIDING-ENC
HTTP:STC:MOZILLA:IFRAME-SRC
HTTP:STC:JAVA:MAL-JNLP-FILE
HTTP:STC:MOZILLA:WRAPPED-JAVA
HTTP:STC:MOZILLA:ICONURL-JS
APP:REAL:PLAYER-FORMAT-STRING
HTTP:STC:IE:FULLMEM-RELOAD
HTTP:STC:DL:PPT-SCRIPT
HTTP:STC:MOZILLA:FIREUNICODE
HTTP:STC:IE:MULTI-ACTION
HTTP:STC:IE:CREATETEXTRANGE
HTTP:STC:IE:HTML-TAG-MC
HTTP:STC:IE:NESTED-OBJECT-TAG
SHELLCODE:JS:UNICODE-ENC
HTTP:STC:IE:UTF8-DECODE-OF
HTTP:STC:IE:VML-FILL-BOF
HTTP:STC:MOZILLA:FF-DEL-OBJ-REF
HTTP:STC:ADOBE:ACROBAT-URL-DF
HTTP:STC:CLSID:ACTIVEX:TREND-AX
HTTP:XSS:IE7-XSS
HTTP:STC:NAV-REDIR
HTTP:STC:ACTIVEX:AOL-AMPX
HTTP:STC:ACTIVEX:IENIPP
HTTP:STC:ACTIVEX:REAL-PLAYER
HTTP:STC:ACTIVEX:ORBIT-DWNLDR
HTTP:STC:SEARCH-LINK
HTTP:STC:ITUNES-HANDLER-OF
HTTP:STC:OPERA:FILE-URL-OF
HTTP:STC:ACTIVEX:EASYMAIL
HTTP:STC:ACTIVEX:IETAB-AX
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:IE:TOSTATIC-DISC
HTTP:STC:WHSC-RCE
HTTP:STC:IE:CROSS-DOMAIN-INFO
HTTP:STC:IE:UNISCRIBE-FNPS-MC
HTTP:STC:IE:CSS-OF
HTTP:STC:OBJ-FILE-BASE64
HTTP:STC:IE:ANIMATEMOTION
HTTP:STC:CHROME:GURL-XO-BYPASS
HTTP:STC:SAFARI:WEBKIT-1ST-LTR
HTTP:STC:IE:BOUNDELEMENTS
HTTP:STC:IE:IFRAME-MEM-CORR
HTTP:STC:STREAM:QT-HREFTRACK
HTTP:STC:MOZILLA:CONSTRUCTFRAME
HTTP:STC:MOZILLA:ARGMNT-FUNC-CE
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:IE:HTML-RELOAD-CORRUPT
HTTP:STC:IE:TABLE-SPAN-CORRUPT
HTTP:STC:IE:TABLE-LAYOUT
HTTP:STC:DL:MSHTML-DBLFREE
HTTP:STC:IE:EVENT-INVOKE
HTTP:STC:IE:DEREF-OBJ-ACCESS
HTTP:STC:IE:TOSTATIC-XSS
HTTP:STC:ON-BEFORE-UNLOAD
HTTP:STC:DL:MAL-WOFF
HTTP:STC:DL:EOT-IO
HTTP:STC:MOZILLA:FF-REMOTE-MC
HTTP:STC:DL:DIRECTX-SAMI
HTTP:STC:IE:ONREADYSTATE
HTTP:STC:DL:VML-GRADIENT
HTTP:STC:IE:TABLES-MEMCORRUPT
HTTP:STC:JAVA:DOCBASE-BOF
HTTP:STC:IE:CLIP-MEM
HTTP:STC:ACTIVEX:WMI-ADMIN
HTTP:STC:MOZILLA:DOC-WRITE-MC
HTTP:STC:IE:SELECT-ELEMENT
HTTP:STC:IE:XML-ELEMENT-RCE
SHELLCODE:X86:FNSTENV-80C
HTTP:STC:IE:OBJ-MGMT-MC
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ACTIVEX:ISSYMBOL
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:IE:VML-RCE
HTTP:STC:IE:HTML-TIME
HTTP:STC:IE:LAYOUT-GRID
HTTP:STC:IE:CELEMENT-RCE
HTTP:STC:IE:SELECT-EMPTY
HTTP:XSS:MS-IE-TOSTATICHTML
HTTP:STC:SAFARI:WEBKIT-FREE-CE
HTTP:IIS:ASP-PAGE-BOF
HTTP:STC:MOZILLA:FIREFOX-MC
HTTP:STC:MOZILLA:FF-XSL-TRANS
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:STC:MOZILLA:CLEARTEXTRUN
HTTP:STC:MOZILLA:FIREFOX-ENG-MC
HTTP:STC:MOZILLA:PARAM-OF
HTTP:ORACLE:COREL-DRAW-BO
HTTP:STC:MOZILLA:JIT-ESCAPE-MC
HTTP:STC:SAFARI:WEBKIT-SVG-MC
HTTP:STC:SAFARI:INNERHTML-MC
HTTP:STC:MOZILLA:NSCSSVALUE-OF
HTTP:NOVELL:GROUPWISE-IMG-BOF
I tried
grep -Ff mine 2537_2550
but grep did not find any matches.
Using exactly your input and your command I'm able to find the matching lines:
$ grep -Ff file1 file2
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:DL:EOT-IO
HTTP:STC:IE:CLIP-MEM
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:ORACLE:COREL-DRAW-BO
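You probably have some non-printable characters that prevent grep from finding the matches.
Try removing non-printable characters from both files with the following command:
tr -cd '\11\12\15\40-\176' < infile > outfile
Note that this set keeps carriage returns (\15); if the files came from Windows, leave \15 out of the set so that trailing carriage returns are removed as well.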
I used the input data you posted and it works.
It gives the following output:
$ grep -Ff pattern searchFile
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:DL:EOT-IO
HTTP:STC:IE:CLIP-MEM
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:ORACLE:COREL-DRAW-BO
There are probably some non-printable characters in your file.
Use cat -vte filename to look for them.
If your file was transferred by FTP from a server running a different OS, such as Windows, use dos2unix filename to convert it to Unix file format.
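If editing the files in place is not desirable, the cleanup can also be done on the fly. The following is a sketch assuming the culprits are carriage returns, trailing whitespace, or blank lines in the pattern file (with -F, an empty pattern matches every line); clean_grep is a hypothetical wrapper name.

```shell
# Sketch: run grep -Ff after normalising line endings in both files.
# clean_grep is a hypothetical name; $1 is the pattern file, $2 the file to search.
clean_grep() {
    patterns=$(mktemp)
    # strip CRs and trailing whitespace, and drop empty patterns
    tr -d '\r' < "$1" | sed 's/[[:space:]]*$//' | grep -v '^$' > "$patterns"
    tr -d '\r' < "$2" | grep -Ff "$patterns"
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

With the files from the question this would be clean_grep mine 2537_2550.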