Split and rename single file into multiple files using keywords present in file - linux

I am new to awk-like commands. I have a single text file holding SQL DDL statements in the format below:
DROP TABLE IF EXISTS $database.TABLE_A ;
...
...
DROP TABLE IF EXISTS $database.TABLE_B ;
...
...
I would like to split this single file into multiple files named:
TABLE_A.SQL
TABLE_B.SQL
TABLE_X.SQL
I am able to get the table names from a single file with the awk command below, but I am still struggling to split the file and name each piece TABLE_X.SQL.
awk 'FNR==1 {split($5,a,"."); print a[2]}' *.SQL
I am using Windows 10 DOS shell.

Finally, I was able to achieve the desired output with the shell script below, which can be run in the Windows bash shell:
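#!/bin/bash
# Split the single input file into F1.TXT, F2.TXT, ... (one piece per DROP statement)
awk '/DROP/{ if (x) close(x); x = "F" (++i) ".TXT" } x { print > x }' "$1"
# Create the output directory
mkdir -p ./_output
# Rename each piece after its table (5th field, after the dot) and change the extension to .SQL
for f in F*.TXT ; do
  newfilename=$(awk 'FNR==1 {split($5,a,"."); print a[2]}' "$f")
  echo "Processed $f ... new file is ${newfilename}.SQL ..."
  mv -- "$f" "./_output/${newfilename}.SQL"
done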
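The split and rename steps can also be combined into a single awk pass that writes straight to TABLE_X.SQL files. This is a sketch under stated assumptions: every table block is assumed to start with a line of the form DROP TABLE IF EXISTS $database.<TABLE> ; (as in the question), and the demo file all_ddl.txt and its CREATE lines are made up for illustration.

```shell
#!/bin/bash
# Sketch: split a DDL file into _output/<TABLE>.SQL in one awk pass.
# Assumption: each block starts with "DROP TABLE IF EXISTS $database.<TABLE> ;".
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical demo input (the quoted 'EOF' keeps $database literal)
cat > all_ddl.txt <<'EOF'
DROP TABLE IF EXISTS $database.TABLE_A ;
CREATE TABLE $database.TABLE_A (id INT);
DROP TABLE IF EXISTS $database.TABLE_B ;
CREATE TABLE $database.TABLE_B (id INT);
EOF

mkdir -p _output
awk '
  /^DROP TABLE/ {
    if (out) close(out)        # close the previous piece so few files stay open
    split($5, a, ".")          # "$database.TABLE_A" -> a[2] = "TABLE_A"
    out = "_output/" a[2] ".SQL"
  }
  out { print > out }          # lines before the first DROP are skipped
' all_ddl.txt
ls _output
```

Closing each output file before opening the next keeps awk from running out of file descriptors when the input defines many tables.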

Could you please try the following:
awk '/DROP/{if(file){close(file)};match($0,/TABLE_[^ ]*/);file=substr($0,RSTART,RLENGTH)".SQL"} {print > (file)}' Input_file

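A shorter variant treats dots and spaces as field separators and writes each line to a file named after its second-to-last field. It only works if every line has the form ... $database.<TABLE> ; since any other line goes to a file named after whatever its second-to-last field happens to be:
awk -F "[. ]" '{print >($(NF-1)".SQL")}' file.sql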

Related

Rename file while keeping the extension in Linux?

I have a directory that contains multiple files with different extensions (pdf, doc, txt...etc).
I'm trying to rename all files after the directory name while keeping each file's extension. The code below works fine if all files are PDFs; otherwise it changes the .txt extension to .pdf too.
How can I rename files while preserving the file extension?
mv "$file" "${dir}/${dir}-${count}.pdf"
I assume you're doing this in some kind of loop? If so, you could grab the file extension first with
ext="${file##*.}" # eg. ext="txt", ext="pdf"...
And replace pdf with $ext in your mv command. Tested with sh, bash, dash, ksh.
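A minimal sketch of the whole loop, assuming the new name should be the directory's base name plus a counter (the question does not say how $dir and $count are chosen, so that part is a guess). The demo directory and file names are hypothetical. The extension is taken from the base name rather than the full path, so a dot in a directory name cannot leak into it; files without any dot would need a separate check, since ${base##*.} then returns the whole name.

```shell
#!/bin/bash
# Sketch: rename each regular file in "$dir" to "<dirname>-<n>.<ext>",
# keeping each file's own extension. Demo directory and files are hypothetical.
dir=$(mktemp -d)
touch "$dir/letter.doc" "$dir/notes.txt" "$dir/report.pdf"
name=$(basename "$dir")
count=1
for file in "$dir"/*; do
  [ -f "$file" ] || continue
  base=${file##*/}          # file name without the directory part
  ext=${base##*.}           # text after the last dot, e.g. "txt"
  mv -- "$file" "$dir/$name-$count.$ext"
  count=$((count + 1))
done
ls "$dir"
```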
You can do this in bash.
Can you please provide more details on how you decide the values of the $dir and $count variables?
If you already know what you want to rename each file to, like below:
OLD NAME|NEW NAME|Path
test.1|newtest.1|Path
arty.2|xyz.2|Path
If you want to rename files to specific names, you can prepare a list like the one above (without the OLD NAME|NEW NAME|Path header row, or it will be read as data) and then traverse it with a while or for loop. Below is a simple bash snippet for the case where the files are spread over multiple directories:
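# lowercase names: reading into PATH would break command lookup for mv and awk
while IFS="|" read -r old new dir
do
  filename=$(echo "$new" | awk -F '.' '{print $1}')
  filetype=$(echo "$new" | awk -F '.' '{print $2}')
  # subshell, so every cd starts from the original working directory
  ( cd "$dir" && mv -- "$old" "$filename.$filetype" )
done < FILE_PATH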
If you want to perform the operation in a single directory, the snippet below will work:
cd /tmp/temp || exit 1
for i in *.*
do
  filename=$(echo "$i" | awk -F "." '{print $1}')   # set this to the new base name
  fileType=$(echo "$i" | awk -F "." '{print $2}')
  mv -- "$i" "$filename.$fileType"
done

Changing the file names and copying into different directory

I have about 1000 files. I want to rename them by cutting a few characters out of each file name, and copy them to another directory.
Example original file names:
vfcon062562~19.xml
vfcon058794~29.xml
vfcon072009~3.xml
vfcon071992~10.xml
vfcon071986~2.xml
vfcon071339~4.xml
vfcon069979~43.xml
The required output cuts the ~ and the characters after it, up to the extension.
Example output:
vfcon058794.xml
vfcon062562.xml
vfcon069979.xml
vfcon071339.xml
vfcon071986.xml
vfcon071992.xml
vfcon072009.xml
But I want to place them in a different directory.
If you are using bash or similar you can use the following simple loop:
for input in vfcon*xml
do
  mv -- "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"
done
Or in a single line:
for input in vfcon*xml; do mv -- "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"; done
This uses ~ as awk's field separator, prints the first field (everything before ~), and appends ".xml" to build the output file name. The result is prefixed with targetDir, which can be a full path. Since the question asks to copy rather than move, use cp instead of mv if you want to keep the original files.
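The same rename can be done without starting awk for every file, using bash parameter expansion to drop "~" and everything after it, and cp so the originals stay in place. A sketch, assuming bash; the src and targetDir directories and the three demo files are hypothetical. The "~" is quoted inside the pattern so the shell treats it literally.

```shell
#!/bin/bash
# Sketch: copy vfconNNNNNN~K.xml to "$targetDir/vfconNNNNNN.xml", keeping originals.
# Source/target directories and demo files are hypothetical.
src=$(mktemp -d)
targetDir=$(mktemp -d)
touch "$src/vfcon062562~19.xml" "$src/vfcon072009~3.xml" "$src/vfcon071992~10.xml"
for input in "$src"/vfcon*~*.xml; do
  base=${input##*/}                              # drop the directory part
  cp -- "$input" "$targetDir/${base%%"~"*}.xml"  # drop "~" and everything after it
done
ls "$targetDir"
```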
If you are using csh / tcsh then the syntax of the loop will be slightly different but the commands will be the same.
I like to make sure that my data set is correct prior to changing anything so I would put that into a variable first and then check over it.
files=$(ls vfcon*xml)
echo "$files" | less
Then, like @Stefan said, use a loop:
for i in $files
do
  mv -- "$i" "$(echo "$i" | sed 's/~[0-9]*//')"
done
To check your bash scripts for common mistakes, you can use http://www.shellcheck.net/

Rename part of file name based on exact match in contents of another file

I would like to rename a bunch of files by changing only one part of the file name and doing that based on an exact match in a list in another file. For example, if I have these file names:
sample_ACGTA.txt
sample_ACGTA.fq.abc
sample_ACGT.txt
sample_TTTTTC.tsv
sample_ACCCGGG.fq
sample_ACCCGGG.txt
otherfile.txt
and I want to find and replace based on these exact matches, which are found in another file called replacements.txt:
ACGT name1
TTTTTC longername12
ACCCGGG nam7
ACGTA another4
So the desired resulting file names would be:
sample_another4.txt
sample_another4.fq.abc
sample_name1.txt
sample_longername12.tsv
sample_nam7.fq
sample_nam7.txt
otherfile.txt
I do not want to change the contents. So far I have tried sed and mv based on my search results on this website. With sed I found out how to replace the contents of the file using my list:
while read from to; do
sed -i "s/$from/$to/" infile ;
done < replacements.txt
and with mv I have found a way to rename files if there is one simple replacement:
for files in sample_*; do
mv "$files" "${files/ACGTA/another4}"
done
But how can I put them together to do what I would like?
Thank you for your help!
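You can combine your for and while loops so that only mv is used. Matching "_CODE." instead of just "CODE" keeps ACGT from also matching inside ACGTA:
while read -r from to ; do
  for i in sample_* ; do
    if [ "$i" != "${i/_$from./_$to.}" ] ; then
      mv -- "$i" "${i/_$from./_$to.}"
    fi
  done
done < replacements.txt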
An alternative solution with GNU sed uses the e flag of the s command, which executes the result of the substitution as a shell command. (Use with caution! Try it without the trailing e first to print the commands that would be executed.)
Hence:
sed 's/\(\w\+\)\s\+\(\w\+\)/mv sample_\1\.txt sample_\2\.txt/e' replacements.txt
would parse your replacements.txt file and rename all your .txt files as desired.
We just have to add a loop to handle the other extensions:
for j in .txt .bak .tsv .fq .fq.abc ; do
sed "s/\(\w\+\)\s\+\(\w\+\)/mv 'sample_\1$j' 'sample_\2$j'/e" replacements.txt
done
(Note that you will get error messages when it tries to rename files that do not exist, for example when it runs mv sample_ACGT.fq sample_name1.fq although sample_ACGT.fq does not exist.)
You could use awk to generate commands:
% awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' replacements.txt
for files in sample_*; do mv $files ${files/ACGT/name1}; done
for files in sample_*; do mv $files ${files/TTTTTC/longername12}; done
for files in sample_*; do mv $files ${files/ACCCGGG/nam7}; done
for files in sample_*; do mv $files ${files/ACGTA/another4}; done
Then either copy/paste or pipe the output directly to your shell:
% awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' replacements.txt | bash
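If one match string is a prefix of another (ACGT and ACGTA), apply the longer one first by reverse-sorting the replacements. Setting LC_ALL=C makes sort compare plain bytes, so the order does not depend on your locale:
% LC_ALL=C sort -r replacements.txt | awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' | bash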
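Substring replacements such as ${files/ACGT/name1} can also hit the ACGT inside ACGTA, so ordering matters. An exact-match alternative is to load replacements.txt into a lookup table and compare the whole code at once. This sketch assumes bash 4+ (for associative arrays) and that the code is always the text between "sample_" and the first dot; the temporary working directory and demo files reproduce the question's example.

```shell
#!/bin/bash
# Sketch (bash 4+): rename sample_<CODE>.<ext> only when <CODE> exactly equals
# a left-hand entry in replacements.txt. Demo files reproduce the question.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'ACGT name1' 'TTTTTC longername12' 'ACCCGGG nam7' 'ACGTA another4' > replacements.txt
touch sample_ACGTA.txt sample_ACGTA.fq.abc sample_ACGT.txt sample_TTTTTC.tsv \
      sample_ACCCGGG.fq sample_ACCCGGG.txt otherfile.txt

declare -A map                       # code -> replacement name
while read -r from to; do
  map[$from]=$to
done < replacements.txt

for f in sample_*; do
  rest=${f#sample_}                  # e.g. "ACGTA.fq.abc"
  key=${rest%%.*}                    # text up to the first dot, e.g. "ACGTA"
  ext=${rest#"$key"}                 # everything from the first dot, e.g. ".fq.abc"
  if [[ -n $key && -n ${map[$key]+set} ]]; then
    mv -- "$f" "sample_${map[$key]}$ext"
  fi
done
ls
```

Because each file is looked up once by its exact code, the order of lines in replacements.txt no longer matters.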

passing awk variable to bash script

I am writing a bash/awk script to process hundreds of files in one directory. They all have the name suffix "localprefs". The purpose is to extract two values from each file (each enclosed in double quotes). I also want to use the same file name, but without the suffix.
Here is what I did so far:
#!/bin/bash
for file in * # Traverse all the files in current directory.
read -r name < <(awk ' $name=substr(FILENAME,1,length(FILENAME)-10a) END {print name}' $file) #get the file name without suffix and pass to bash. PROBLEM TO SOLVE!
echo $name # verify if passing works.
do
awk 'BEGIN { FS = "\""} {print $2}' $file #this one works fine to extract two values I want.
done
exit 0
I could use
awk '{print substr(FILENAME,1,length(FILENAME)-10)}' to extract the file name without the suffix, but I am stuck on how to pass that to bash as a variable, which I will use as the output file name. (I read through all the posts on this subject here, but maybe I am dumb: none of them works for me.)
If anyone can shed some light on this, especially on the line starting with "read", I would really appreciate it.
Many thanks.
Try this one:
#!/bin/bash
dir="/path/to/directory"
for file in "$dir"/*localprefs; do
name=${file%localprefs} ## Or if it has a .: name=${file%.localprefs}
name=${name##*/} ## To exclude the dir part.
echo "$name"
awk 'BEGIN { FS = "\""} {print $2}' "$file" ## I think you could also use cut: cut -f 2 -d '"' "$file"
done
exit 0
To take just the base name, you don't even need awk:
for file in * ; do
  name="${file%.*}"
  # ... process "$file" here, using "$name" for the output file name
done
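The question asks specifically how to get a value computed by awk into a bash variable, which is what the read ... < <(awk ...) line was attempting. Command substitution, name=$(awk ...), captures awk's standard output directly. A minimal sketch follows; the demo directory, the files alicelocalprefs and boblocalprefs, their key="value" contents, and the .out suffix for result files are all hypothetical. It reads ARGV[1] in a BEGIN block because FILENAME is not set before awk starts reading input.

```shell
#!/bin/bash
# Sketch: write the quoted values of each *localprefs file to "<name>.out",
# where <name> is the file name minus the 10-character "localprefs" suffix.
# Demo files and the ".out" suffix are hypothetical.
dir=$(mktemp -d)
printf 'key1="value1"\nkey2="value2"\n' > "$dir/alicelocalprefs"
printf 'key1="x"\nkey2="y"\n' > "$dir/boblocalprefs"

for file in "$dir"/*localprefs; do
  # Command substitution stores awk's output in a bash variable.
  name=$(awk 'BEGIN { print substr(ARGV[1], 1, length(ARGV[1]) - 10) }' "$file")
  # Second field when splitting on double quotes = the quoted value.
  awk 'BEGIN { FS = "\"" } { print $2 }' "$file" > "$name.out"
done
ls "$dir"
```

The parameter expansion ${file%localprefs} shown in the first answer yields the same name without starting an extra awk process per file; the awk version is mainly useful when the value genuinely has to come from awk.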

Linux: Merging multiple files, each on a new line

I am using cat *.txt to merge multiple txt files into one, but I need each file to be on a separate line.
What is the best way to merge files with each file appearing on a new line?
Just use awk (note that it also prints an empty line before the first file):
awk 'FNR==1{print ""}1' *.txt
If you have a paste that supports it (GNU paste does),
paste --delimiters='\n' --serial *.txt
does the job well.
You can iterate through each file with a for loop:
for filename in *.txt; do
# each time through the loop, ${filename} will hold the name
# of the next *.txt file. You can then arbitrarily process
# each file
cat "${filename}"
echo
# You can add redirection after the done (which ends the
# for loop). Any output within the for loop will be sent to
# the redirection specified here
done > output_file
for file in *.txt
do
cat "$file"
echo
done > newfile
I'm assuming you want a line break between files.
for file in *.txt
do
cat "$file" >> result
echo >> result
done
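The loops above always add a newline after each file, which produces a blank line whenever a file already ends with one. A sketch that adds a newline only when it is missing follows; the demo files a.txt and b.txt are hypothetical. The output is named merged.out rather than *.txt because the redirection creates the file before the loop starts, and a *.txt name would make the glob pick it up.

```shell
#!/bin/bash
# Sketch: concatenate *.txt so each file starts on a new line, adding a newline
# only when a file does not already end with one. Demo files are hypothetical.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'alpha' > a.txt          # no trailing newline
printf 'beta\n' > b.txt         # already ends with a newline

for f in *.txt; do
  cat -- "$f"
  # $(tail -c1 ...) is empty when the last byte is a newline, because
  # command substitution strips trailing newlines.
  if [ -n "$(tail -c1 -- "$f")" ]; then
    echo
  fi
done > merged.out
```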
