Changing file type for multiple files at once using awk command - linux

I am trying to change .dat files to .csv files using the awk command. An example file has 3 columns of numbers with spaces between each column:
23.00005 320.0054 0.0039734
xx.xxxxx xxx.xxxx x.xxxxxxx
The filenames follow the pattern filenameX.project.dat, where X is any number from 1 to a couple hundred. The folder has many other files that I do not want changed. I want to be able to convert all of these files at once instead of doing them one by one.
Here is my example command:
awk '{print $1","$2","$3}' filenameX.project.dat > filenameX.project.csv
How can I automate this into one command that converts every one of these .dat files to a .csv file?
I have tried the command below, and others like it, but none of them work.
awk '{print $1","$2","$3}' filename*.project.dat > filename*.project.csv

Something like this:
$ for i in filename*dat; do awk '{print $1","$2","$3}' "$i" >> $(echo "$i" | sed 's,\.dat$,.csv,'); done
It loops through all filename*dat files in the directory, runs the awk command on each one, and redirects the output to a file named with .csv instead of .dat at the end.
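If you prefer to avoid the sed call, bash parameter expansion can build the output name instead; a minimal variant of the same loop (using > rather than >> so a re-run overwrites instead of appending):
for i in filename*.project.dat; do
    awk '{print $1","$2","$3}' "$i" > "${i%.dat}.csv"
done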

You can do this all in awk like so:
awk 'BEGIN {OFS=","}
FNR==1 {fn=FILENAME; sub(/\.dat$/,".csv",fn)
printf "Copying %s to %s\n", FILENAME, fn}
{ for (i=1;i<=NF;i++) printf "%s%s", $i, i<NF ? OFS : RS > fn}' *.dat
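Since there can be a couple hundred input files, note that some awk implementations limit how many output files may be open at once; a minimal variant (assuming a POSIX-compatible awk) that closes each output file before starting the next one, and rebuilds each record with the comma OFS via $1=$1:
awk 'BEGIN {OFS=","}
     FNR==1 {if (fn) close(fn)        # close the previous output file
             fn=FILENAME; sub(/\.dat$/, ".csv", fn)}
     {$1=$1; print > fn}' filename*.project.dat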

Please make a backup first, as I am still not certain what you mean, but suspect it is:
rename -n -S .dat .csv filename*.project.dat
If it looks good, remove the -n and run again for real.
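If the rename utility is not available (its flags differ between the Perl and util-linux versions, so -S may not exist everywhere), a plain loop does the same renaming. Note this only changes the extension; it does not reformat the contents:
for f in filename*.project.dat; do
    mv -- "$f" "${f%.dat}.csv"
done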

Related

linux create directories and move corresponding files to the directories

I have a text file that lists the directory names and which files should be included in each.
My text file:
SRS000111 ERR1045156
SRS000112 ERR1045188
SRS000123 ERR1045204
SRS000134 ERR1045237 ERR1045238 ERR1045239
SRS000154 ERR1045255 ERR1045256
SRS000168 ERR1045260 ERR1045261 ERR1045262
... ... ...
SRS001567 ERR1547451 ERR1547676
Now I want to create all the directories using the first column of the text file, but I don't know how to write the for loop.
for filename in cat file.txt | awk -F, '{print $1}'; do mkdir ${filename}; done
but it gives an error.
Second, I have all the ERR files and I want to move them into the corresponding directories according to the text file. I have no idea how to do this part.
I recommend you read the file line by line, split the directory-name column from the file-name columns, and then make the directories and move the files.
This script does it:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
    dir=$(echo "$line" | awk '{print $1}')            # first column: directory name
    files=$(echo "$line" | awk '{$1=""; print $0}')   # remaining columns: file names
    mkdir -p "$dir"
    mv $files "$dir"/    # $files is deliberately unquoted so it splits into separate file names
done < myfile.txt
It's not too complicated, but if you have any questions about it, feel free to ask.
You have to make system calls from awk to run mkdir and mv on the files.
This awk would do it:
awk 'FNR>1{system("mkdir \"" $1 "\""); for(i=2; i<=NF; i++) system("mv \"" $i "\" " "\"" $1 "\"")}' file
FNR>1 is there because we don't want to create a directory for the first line, i.e. the header names in your file; drop FNR>1 if your file has no header row.
Note: run this command from the directory that contains all the files mentioned in your source/input file. It will create the directories there and move the files into those newly created directories.
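For completeness, a pure-bash sketch without awk, assuming the list is in file.txt as in the question and that none of the names contain whitespace; read splits the first column into dir and leaves the rest in files:
while read -r dir files; do
    [ -n "$dir" ] || continue    # skip blank lines
    mkdir -p "$dir"
    mv $files "$dir"/            # $files unquoted on purpose so it splits into individual names
done < file.txt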

Concatenation of huge number of selective files from a directory in Shell

I have more than 50000 files in a directory, such as file1.txt, file2.txt, ....., file50000.txt. I would like to concatenate some of the files, whose file numbers are listed in the following text file (need.txt).
need.txt
1
4
35
45
71
.
.
.
I tried the following. Though it works, I am looking for a simpler and shorter way.
n1=1
n2=$(wc -l < need.txt)
while [ $n1 -le $n2 ]
do
f1=$(awk -v n1="$n1" 'NR==n1 {print $1}' need.txt)
cat file$f1.txt >> out.txt
(( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to turn each line into the corresponding file name and then hands the file names to xargs, which passes them to cat.
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
Which adds each file to the ARGV array of files to process and then prints every line it sees.
It is possible to do this without any sed or awk command, directly using bash built-ins and cat (of course).
for i in $(cat need.txt); do cat file${i}.txt >> out.txt; done
And, as you wanted, it is quite simple.
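A small refinement of the same idea: reading need.txt line by line and redirecting the whole loop once avoids both word-splitting surprises and appending to a leftover out.txt from a previous run; a minimal sketch:
while read -r n; do
    cat "file$n.txt"
done < need.txt > out.txt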

extracting the column using AWK

I am trying to extract a column using AWK.
The source file is a .csv file, and below is the command I am using:
awk -F ',' '{print $1}' abc.csv > test1
The data in abc.csv looks like this:
xyz#yahoo.com,160,1,2,3
abc#ymail.com,1,2,3,160
But when the file is downloaded from the server and opened in Notepad, the data obtained in test1 looks like:
abc#ymail.comxyz#ymail.com
Notepad doesn't show newlines created on Unix. If you want to add carriage returns, try:
awk -F ',' '{print $1"\r"}' abc.csv > test1
Since you're using a Windows tool to read the output, you just need to tell awk to use Windows line endings as the Output Record Separator:
awk -v ORS='\r\n' -F',' '{print $1}' file
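Alternatively, extract the column as before and convert the result to Windows line endings afterwards; a minimal sketch (this relies on GNU sed for both -i and the \r escape; unix2dos would do the same job if it is installed):
awk -F',' '{print $1}' abc.csv > test1
sed -i 's/$/\r/' test1    # append a carriage return to the end of every line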

Print name of the file in front of every line of file

I have a lot of text files and I want to make a bash script in Linux to print the name of the file at the start of each of its lines. For example, I have the file lenovo.txt and I want every line in that file to start with lenovo.txt.
I tried to write a "for" loop for this but it didn't work.
for i in *.txt
do
awk '{print '$i' $0}' /var/SambaShare/$i > /var/SambaShare/new_$i
done
Thanks!
It doesn't work because you need to pass $i to awk with the -v option. But you can also use awk's built-in FILENAME variable:
ls *txt
file.txt file2.txt
cat *txt
A
B
C
A2
B2
C2
for i in *txt; do
awk '{print FILENAME,$0}' $i;
done
file.txt A
file.txt B
file.txt C
file2.txt A2
file2.txt B2
file2.txt C2
And to redirect into a new file:
for i in *txt; do
awk '{print FILENAME,$0}' $i > ${i%.txt}_new.txt;
done
As for your corrected version:
for i in *.txt
do
awk -v i=$i '{print i,$0}' $i > new_$i
done
Hope this helps.
Using grep you can make use of the --with-filename (alias -H) option and use an empty pattern that always matches:
for i in *.txt
do
grep -H "" $i > new_$i
done
Awk and Bash don't share the same variables as they are different languages with separate interpreters. You should pass Bash variables to Awk with the -v option.
You should also quote your file name variables to ensure they don't get expanded as separate arguments if they contain whitespace.
for i in *.txt
do
awk -v i="$i" '{print i,$0}' "$i" > "new_$i"
done
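Since awk accepts multiple file arguments and FILENAME changes as each one is read, the shell loop can also be dropped entirely; a minimal sketch (run it in a directory that does not already contain new_ files) that writes each input file's lines, prefixed with the file name, into a matching new_ file:
awk 'FNR==1 { if (out) close(out); out = "new_" FILENAME }
     { print FILENAME, $0 > out }' *.txt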

Awk split file give incomplete lines

My file is a csv file with comma delimited fields.
I tried to split the file into multiple files by the first field. I did the following:
cat myfile.csv | awk -F',' '{print $0 > "Mydata"$1".csv"}'
It does split the file, but the output files are corrupted: the last line of each file is incomplete. The breaking position seems random. Has anyone seen the same problem?
These types of problems are invariably because the input file was created on Windows and so has spurious control-Ms (carriage returns) at the ends of the lines. Run dos2unix on your input file to clean it up, then re-run your awk command, but rewrite it as:
awk -F',' '{print > ("Mydata" $1 ".csv") }' myfile.csv
to solve a couple of unrelated problems: the useless use of cat, and the unparenthesised expression after >, whose parsing differs between awk implementations.
Use this awk command to ignore \r characters before \n:
awk -F ',' -v RS='\r\n' '{print > ("Mydata" $1 ".csv") }' myfile.csv
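A multi-character RS like '\r\n' is a gawk extension; a more portable sketch simply strips any trailing carriage return before printing:
awk -F ',' '{ sub(/\r$/, ""); print > ("Mydata" $1 ".csv") }' myfile.csv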
Just don't forget to close your files:
awk -F ',' '{ f = "Mydata" $1 ".csv"; print >> f; close(f) }' myfile.csv
Note the >>: once a file has been closed, reopening it with > would truncate it and lose the lines already written to it.
Use a real CSV parser/generator instead. It's safe for unusual inputs, including those with multi-line values. Here's a one-liner in Ruby:
ruby -e 'require "csv";CSV.foreach(ARGV.shift){|r| File.open("Mydata#{r[0]}.csv","a"){|f| f.puts(CSV.generate_line(r))}}' file.csv
