Split and write files with awk (Bash, Linux)

INPUT_FILE.txt in c:\Pro\usr\folder1
ABCDEFGH123456
ABCDEFGH123456
ABCDEFGH123456
BBCDEFGH123456
BBCDEFGH123456
I used the awk command below in a .sh script, which runs from c:\Pro\usr\folder2, to split the file into multiple .txt files. Each output file is named after the first 8 characters of the line, followed by _kg.
awk '{ F=substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
This works well, but the files are written to the directory the script runs from. How can I route the created files to another location, such as c:\Pro\usr\folder3?
Thanks

The following awk code may help; it was written and tested with the shown samples in GNU awk.
awk -v outPath='c:\\Pro\\usr\\folder3' -v FPAT='^.{8}' '{outFile=($1"_kg.txt");outFile=outPath"\\"outFile;print >> (outFile);close(outFile)}' Input_file
Explanation: The awk variable outPath holds the destination path from the question. FPAT (a GNU awk variable holding a regex that describes what a field looks like, rather than what separates fields) is set so that the first field is the first 8 characters of the line. The main block builds outFile from the first field followed by _kg.txt, prefixes it with outPath, appends the whole line to that file, and then closes it to avoid a "too many open files" error. Because the file is closed after every line, the redirection has to be >> (append); with > each reopen would truncate the file and only the last line would survive.

Pass the destination folder as a variable to awk:
awk -v dest='c:\\Pro\\usr\\folder3\\' '{F=dest substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
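The doubled backslashes are required: awk applies escape-sequence processing to values assigned with -v, so \\ is what yields a single backslash in the variable.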
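As a minimal sketch of the same idea, the destination can also be handed to awk through the environment, which sidesteps the escape processing that -v applies. The function name split_by_prefix and the variable name DEST are made up for illustration, and the sketch assumes a shell where forward-slash paths work (which is usually the case for Bash on Windows as well).

```shell
# split_by_prefix INPUT DEST_DIR   (hypothetical helper, not from the answers above)
# Appends every line of INPUT to DEST_DIR/<first 8 characters>_kg.txt.
split_by_prefix() {
    local input dest
    input=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # DEST is read via ENVIRON, so awk leaves any backslashes in it untouched.
    DEST="$dest" awk '
        {
            out = ENVIRON["DEST"] "/" substr($0, 1, 8) "_kg.txt"
            print $0 >> out   # append, because the file is reopened for every line
            close(out)        # keep the number of open files small
        }' "$input"
}
```

It could be called as split_by_prefix input_file.txt /some/other/folder. Since the files are opened in append mode, running it twice on the same input duplicates the lines in the output files.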

Related

Remove the path prefix of the fullpath stored in a file

I have a file containing multiple full path
/home/pi/1.txt
/home/pi/2.txt
/home/pi/3.txt
and I want to get the basename of every file
1.txt
2.txt
3.txt
I only know that I can read each line and run the command
basename
Is it possible to achieve my goal more simply? Thank you.
The simplest way I came up with is this:
basename -a $(<foo.txt)
This works because the command substitution $(<foo.txt) expands to the contents of the file, which is then split into multiple arguments by word splitting. basename accepts multiple arguments with -a.
Note that this doesn't work if there are spaces in the pathnames in the file (because of that same word splitting).
Another in awk:
$ awk 'sub(/.*\//,"")||1' file
1.txt
2.txt
3.txt
Try:
awk -F"/" '{print $NF}' Input_file
This sets "/" as the field separator and prints the last field of each line.
A solution that uses only shell tools:
readarray -t arr <file.txt
printf '%s\n' "${arr[@]##*/}"
This assumes that each file name is on one line (spaces are fine). File names containing newlines will fail, as the file would need some other structure.
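For shells without readarray, the same parameter expansion can be applied line by line. This is only a sketch; the function name strip_dirs is made up for illustration.

```shell
# strip_dirs FILE   (hypothetical helper)
# Prints the last path component of every line in FILE.
strip_dirs() {
    local path
    # IFS= and -r keep leading blanks and backslashes intact;
    # the extra test handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        printf '%s\n' "${path##*/}"   # drop everything up to the last "/"
    done < "$1"
}
```

Unlike basename, this prints an empty line for a path that ends in "/", because the expansion removes everything up to and including the last slash.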

how to pass the filename as variable to a awk command from a shell script

In my shell script I have the following line:
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' test.csv)
which generates a list of the distinct values in column "col" (taken from "$1") of the file test.csv.
There may be several files in the same location, so I would need to loop over them with a for statement. For this I have to replace the file name test.csv with a variable, $i for example, which is the current item in the list of files.
To do this, I modified my line to
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' $j)
Unfortunately, I receive the error message:
awk: cannot open test.csv (No such file or directory)
Can anyone tell me why this error occurs and how I can solve it, please?
Thank you,
As you commented in your previous question, you are calling it with
abc$ ./test.sh 2
So you just need to add another parameter when you call it:
abc$ ./test.sh 2 "test.csv"
and the script can be like this:
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' "$2")
#                                                            ^^^^
Whenever you want to use other parameters, remember they are positional. Hence, the first one is $1, second is $2 and so on.
In case the file happens to be in another directory, you can replace ./test.sh 2 "test.csv" by something like ./test.sh 2 "/full/path/of/test.csv" or whatever relative path you may need.
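Since the question mentions looping over several files, here is a minimal sketch of that loop. The function name unique_column_values is made up for illustration; the awk program is the one from the question.

```shell
# unique_column_values COL FILE...   (hypothetical helper)
# Prints, for each FILE, the distinct values of ;-separated column COL.
unique_column_values() {
    local col f
    col=$1
    shift
    for f in "$@"; do
        # Skip files that cannot be read instead of letting awk fail.
        if [ ! -r "$f" ]; then
            printf 'skipping unreadable file: %s\n' "$f" >&2
            continue
        fi
        # One awk run per file, so "seen" starts empty for every file.
        awk -v col="$col" -F';' '!seen[$col]++ { print $col }' "$f"
    done
}
```

Inside a script this could be used as PO_list=$(unique_column_values "$1" ./*.csv). Quoting "$f" matters: it keeps file names with spaces in one piece.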

Trying to understand a simple Linux code

I am trying to figure out what the following command means in Linux:
awk 'match($0, "##SA") ==0 {print $0} ' $1 > ${G_DEST_DIR}/${G_DEST_FILENAME}
Does it remove the first line from the given parameter and place it under dest_dir?
This awk prints all lines from input file that don't match the pattern:
##SA
Output of this awk is redirected to file name represented by:
${G_DEST_DIR}/${G_DEST_FILENAME}
Note that $1 here is a shell variable (the script's first positional parameter), which holds the name of awk's input file.
The same awk command can be shortened to:
awk '!/##SA/' "$1" > "${G_DEST_DIR}/${G_DEST_FILENAME}"
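To see that the two forms behave the same way, they can be wrapped in two small functions and run against the same input. The function names filter_long and filter_short are made up for illustration.

```shell
# Long form from the question: match() returns 0 when the pattern is not found.
filter_long() {
    awk 'match($0, "##SA") == 0 { print $0 }' "$1"
}

# Short form from the answer: a negated regex with the default print action.
filter_short() {
    awk '!/##SA/' "$1"
}
```

Both print every line that does not contain ##SA anywhere, not just lines that start with it, so neither of them simply removes the first line.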

passing awk variable to bash script

I am writing a bash/awk script to process hundreds of files under one directory. They all have name suffix of "localprefs". The purpose is to extract two values from each file (they are quoted by ""). I also want to use the same file name, but without the name suffix.
Here is what I did so far:
#!/bin/bash
for file in * # Traverse all the files in current directory.
read -r name < <(awk ' $name=substr(FILENAME,1,length(FILENAME)-10a) END {print name}' $file) #get the file name without suffix and pass to bash. PROBLEM TO SOLVE!
echo $name # verify if passing works.
do
awk 'BEGIN { FS = "\""} {print $2}' $file #this one works fine to extract two values I want.
done
exit 0
I could use
awk '{print substr(FILENAME,1,length(FILENAME)-10)}' to extract the file name without the suffix, but I am stuck on how to pass it to bash as a variable that I will use as the output file name (I read through all the posts on this subject here, but none of them worked for me).
If anyone can shed light on this, especially on the line that starts with "read", it would be really appreciated.
Many thanks.
Try this one:
#!/bin/bash
dir="/path/to/directory"
for file in "$dir"/*localprefs; do
name=${file%localprefs} ## Or if it has a .: name=${file%.localprefs}
name=${name##*/} ## To exclude the dir part.
echo "$name"
awk 'BEGIN { FS = "\""} {print $2}' "$file" ## I think you could also use cut: cut -f 2 -d '"' "$file"
done
exit 0
To just take the base name, you don't even need awk:
for file in * ; do
name="${file%.*}" # strip the shortest trailing ".something"
# ... rest of the loop body
done
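Putting the pieces together, the stripped name can be used directly as the output file name, which is what the question was after. This is a sketch under assumptions: the function name extract_quoted, the separate output directory, and the .out extension are all made up for illustration.

```shell
# extract_quoted IN_DIR OUT_DIR   (hypothetical helper)
# For every *localprefs file in IN_DIR, writes the first quoted value of
# each line to OUT_DIR/<name without the suffix>.out
extract_quoted() {
    local in_dir out_dir file name
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for file in "$in_dir"/*localprefs; do
        [ -e "$file" ] || continue   # the glob matched nothing
        name=${file##*/}             # drop the directory part
        name=${name%localprefs}      # drop the suffix
        name=${name%.}               # drop a trailing dot, if there is one
        awk -F'"' '{ print $2 }' "$file" > "$out_dir/$name.out"
    done
}
```

No awk-to-bash variable passing is needed here, because the shell computes the name itself with parameter expansion.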

Unix command to remove everything after first column

I have a text file in which I have something like this-
10.2.57.44 56538154 3028
120.149.20.197 28909678 3166
10.90.158.161 869126135 6025
In that text file, I have around 1,000,000 rows exactly as above. I am working in SunOS environment. I needed a way to remove everything from that text file leaving only IP Address (first column in the above text file is IP Address). So after running some unix command, file should look like something below.
10.2.57.44
120.149.20.197
10.90.158.161
Can anyone please help me out with a Unix command that removes everything except the IP address (first column) and saves the result back to a file?
So the output should look something like this in some file:
10.2.57.44
120.149.20.197
10.90.158.161
If the delimiter is a space character, use
cut -d " " -f 1 filename
If the delimiter is a tab character, the -d option is not needed, since tab is the default delimiter for the cut command:
cut -f 1 filename
-d: Delimiter; the character immediately following the -d option is the field delimiter.
-f: Specifies the list of fields to print, as comma-separated field numbers or ranges.
nawk '{print $1}' file > newFile && mv newFile file
OR
cut -f1 file > newFile && mv newFile file
As you're using SunOS, you'll want to get familiar with nawk (not awk, which on SunOS is the old and cranky version of awk, while nawk = new awk ;-).
In either case, you're printing the first field in the file to newFile.
(n)awk is a complete programming language designed for the easy manipulation of text files. $1 means the first field on each line, $9 would mean the ninth field, etc., while $0 means the whole line. You can tell (n)awk what to use to separate the fields: it might be a tab character, a '|' character, or multiple spaces. By default, all versions of awk use whitespace, i.e. any run of spaces and/or tabs, to delimit the columns/fields on each line of a file.
For a very good intro to awk, see Grymoire's Awk page.
The && means: execute the next command only if the previous command finished without a problem. This way you don't accidentally erase your good data file because of some error.
IHTH
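The write-then-rename pattern above can be sketched as a small function that uses a unique temporary file instead of a fixed newFile name. The function name keep_first_column is made up for illustration, and the sketch assumes that mktemp is available and accepts a template argument.

```shell
# keep_first_column FILE   (hypothetical helper)
# Replaces FILE with a copy that contains only the first field of each line.
keep_first_column() {
    local file tmp
    file=$1
    # Create the temporary file next to the original so mv is a plain rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if awk '{ print $1 }' "$file" > "$tmp"; then
        mv "$tmp" "$file"      # only reached when awk succeeded
    else
        rm -f "$tmp"           # leave the original file untouched
        return 1
    fi
}
```

One side effect to be aware of: the replaced file gets the permissions of the temporary file, which mktemp creates readable and writable by the owner only.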
If you have vim, open the file with it. Then, in command mode, enter the substitution %s:<delimiter>.*$::g, where <delimiter> is a tab, a space, or whatever the delimiter is. Now save the file with :wq.
Using sed, run a command like this, writing to a new file rather than to the input file itself: sed -e 's/<delimiter>.*$//' file.txt > newfile.txt
How about a perl script ;)
#!/usr/bin/perl -w
use strict;
my $file = shift;
die "Missing file or can't read it" unless $file and -r $file;
sub edit_in_place
{
my $file = shift;
my $code = shift;
{
local @ARGV = ($file);
local $^I = '';
while (<>) {
&$code;
}
}
}
edit_in_place $file, sub {
my @columns = split /\s+/;
print "$columns[0]\n";
};
This will edit the file in place since you say it is a large one. You can also create a backup by modifying local $^I = ''; to local $^I = '.bak';
Try this
awk '{$1=$1; print $1}' temp.txt
Output
10.2.57.44
120.149.20.197
10.90.158.161
awk '{ print $1 }' file_name.txt > tmp_file_name.txt
mv tmp_file_name.txt file_name.txt
'> tmp_file_name.txt' means redirecting STDOUT of awk '{ print $1 }' file_name.txt to a file named tmp_file_name.txt
FYI :
$1 means the first column, based on the delimiter. The default delimiter is whitespace.
$2 means the second column, based on the delimiter.
..
..
$NF means the last column, based on the delimiter (NF is the number of fields on the line; NR is the line number, so $NR is not the last column).
If you want to change the delimiter, use awk with -F.
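A short sketch of these field variables with a custom delimiter. The function name first_and_last is made up for illustration.

```shell
# first_and_last SEP   (hypothetical helper)
# Reads lines on standard input and prints the first and the last field,
# using SEP as the field separator.
first_and_last() {
    awk -F"$1" '{ print $1, $NF }'
}
```

For example, printf 'a:b:c\n' | first_and_last ':' prints "a c". Passing a single space as SEP gives awk's default behaviour, where any run of spaces and tabs separates fields.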

Resources