Linux shell script to fetch the latest file of a particular name in a folder

I'm enhancing my Linux script. I have a folder with lots of files of different dates, and I want to fetch the latest file starting with a particular name.
For example, I have the below list of files in a folder, and I need the latest file whose name starts with Subnetwork_RAN:
Subnetwork_PCC_11Dec2022UTC0500
Subnetwork_RAN_12Dec2022UTC0500
Subnetwork_RAN_13Dec2022UTC0500
Subnetwork_PCC_13Dec2022UTC0500
The output should be the file name Subnetwork_RAN_13Dec2022UTC0500.
I tried to build a Linux shell script to get the latest file of a particular name.

This problem has a rather simple awk solution:
ls -tl | awk ' $9 ~ /Subnetwork_RAN/ {print $9; exit;}'
ls -tl outputs a long listing of the current directory, sorted by time (newest first).
This output is piped to awk which (line-by-line) looks for a filename containing the required string. The first time it finds one, it prints the filename and exits.
Note, this assumes (as in your example) that the filename contains no whitespace. If it does, you need to modify the print statement to print the substring of the whole line $0 from the start of your string to the end of the line.
If your string might also occur in more recent filenames, but not at the start, the regex condition can be modified to select only filenames where your string is at the start: $9 ~ /^Subnetwork_RAN/
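A sketch that sidesteps both caveats at once, assuming only that no filename contains a newline: drop -l, since ls prints one name per line when its output is piped, and anchor the match against the whole line:
ls -t | awk '/^Subnetwork_RAN/ {print; exit}'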

Supposing you have a file called test.txt containing the filenames you showed, in bash you can do this:
awk 'BEGIN {FS="_"} $0 ~ /Subnetwork_RAN/ {printf "%s ",$0; system("date +%s -d " $3)}' test.txt | sort -rn -k 2 | head -1 | cut -d " " -f 1
Output:
Subnetwork_RAN_13Dec2022UTC0500
Some explanation:
$0 ~ /Subnetwork_RAN/ matches all the lines containing the sub-string "Subnetwork_RAN"
The date command can recognize a date in a format like 13Dec2022UTC0500 and transform it into a timestamp (date +%s)
sort sorts numerically in reverse order based on the second field (timestamp output of awk system call)
head gives the first line, i.e., the most recent
cut takes only the first field given a field separator equal to " ". The first field is the full filename (printf call in awk)
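If the prefix needs to be a shell parameter, a hedged variant of the same pipeline (assuming GNU date, and that the prefix always sits at the start of the name; the variable name prefix is just for illustration):
prefix="Subnetwork_RAN"
awk -v p="$prefix" 'BEGIN {FS="_"} index($0, p) == 1 {printf "%s ", $0; system("date +%s -d " $3)}' test.txt | sort -rn -k 2 | head -1 | cut -d " " -f 1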

Related

Filtering large data file by date using command line

I have a csv file that contains a bunch of data, with one of the columns being a date. I am trying to extract all lines that have dates in a specific year and save them into a new file.
The format of the file is like this, with the date and time in the second column:
000000000,10/04/2021 02:10:15 AM,.....
So far I tried:
grep -E ^2020 data.csv >> temp.csv
But it just produced an empty temp list. Any ideas on how I can do this?
One potential solution is with awk:
awk -F"," '$2 ~ /\/2020 /' data.csv > temp.csv
Another potential option is with grep:
grep "\/2020 " data.csv > temp.csv
However, the grep solution may match "/2020 " anywhere on a line, not just in column 2.
Although an awk solution is the best fit here, e.g.
awk -F, 'index($2, "/2021 ")' file
grep can also be used here:
grep '^[^,]*,[^,]*/2021 ' file
Notes:
awk -F, 'index($2, "/2021 ")' splits the lines (records) into fields on commas (see -F,), and if the literal text /2021 followed by a space occurs in the second field ($2), the line is printed
the ^[^,]*,[^,]*/2021 pattern in the grep command matches
^ - start of string
[^,]* - zero or more non-comma chars
,[^,]* - a , and zero or more non-comma chars
/2021 - a literal substring.
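If the year should come from a shell variable instead of being hard-coded, a minimal sketch of the same awk idea (the variable name year is just for illustration):
year=2021
awk -F, -v y="/$year " 'index($2, y)' file > temp.csv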

How to get first word of every line and pipe it into dmenu script

I have a text file like this:
first state
second state
third state
Getting the first word from every line isn't difficult, but the problem comes when adding the extra \n required to separate every word (selection) in dmenu, per its syntax:
echo -e "first\nsecond\nthird" | dmenu
I haven't been able to figure out how to add the separating \n. I've tried this:
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
But it doesn't work. I also tried this:
lol=$(grep -o "^\S*" states.txt | perl -ne 'print "$_"')
But same deal. Not sure what I'm doing wrong.
Your problem is in the AWK script. You need to identify each input line as a record. This way, you can control how each record in the output is separated via the ORS variable (output record separator). By default this separator is the newline, which should be good enough for your purpose.
Now to print the first word of every input record (each line in the input stream in this case), you just need to print the first field:
awk '{print $1}' textfile | dmenu
If you need the output to include the explicit \n string (not the control character), then you can just overwrite the ORS variable to fit your needs:
awk 'BEGIN{ORS="\\n"}{print $1}' textfile | dmenu
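For reference, this is what the two variants should print for the three-line file above; the first produces real newlines, the second emits the literal two-character \n sequence on a single line:
$ awk '{print $1}' textfile
first
second
third
$ awk 'BEGIN{ORS="\\n"}{print $1}' textfile
first\nsecond\nthird\n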
This could be done more easily with a while loop. It is simple: while reads the file line by line, creating two variables from each line; the first, named first, gets the first field, and the other, rest, gets everything after it. Printing first on each iteration and piping the whole loop into dmenu (which reads its selections from stdin, one per line) gives:
while read -r first rest
do
    printf '%s\n' "$first"
done < "Input_file" | dmenu
Based on the text file example, the following should achieve what you require:
awk '{ printf "%s\\n",$1 }' textfile | dmenu
Print the first space-separated field of each line followed by \n (the backslash needs to be escaped to stop it being interpreted by awk)
In your code
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
you attempted to use ' inside your awk code; however, the code is what lies between the opening ' and the first following ', so the code becomes {for(i=1;i<=NF;i+=2)print $(i) and this does not work. You should use " for strings inside awk code.
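A corrected version of that assignment, as a minimal sketch: awk's default output record separator already puts each word on its own line, and quoting the variable preserves those newlines on the way into dmenu:
state=$(awk '{print $1}' text.txt)
echo "$state" | dmenu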
If you merely want to get the nth column, cut will be enough in most cases. Let states.txt content be
first state
second state
third state
then you can do:
cut -d ' ' -f 1 states.txt | dmenu
Explanation: treat space as delimiter (-d ' ') and get 1st column (-f 1)
(tested in cut (GNU coreutils) 8.30)

Bash script - delete duplicates

I need to extract names from a file and delete duplicates.
output.txt:
Server001-1
Server001-2
Server001-3
Server001-4
Server002-1
Server002-2
Server003-1
Server003-2
Server003-3
I need the output to be only the following:
Server001-1
Server002-1
Server003-1
So, only print the first server in every server group (Server00*) and delete the rest in that group.
Try it simply with awk:
awk -F"-" '!a[$1]++' Input_file
Explanation: -F"-" makes - the field separator, and a is an array whose index is the current line's 1st field. The condition !a[$1] is true only if the 1st field has no presence in array a yet, in which case the line is printed; the ++ then records that field's occurrence in a, so later lines with the same 1st field are not printed.
awk -F- 'dat[$1]=="" { dat[$1]=$0 } END { for (i in dat) {print dat[i]}}' filename
result:
Server001-1
Server002-1
Server003-1
Create an array keyed on the first piece of data (the line split on -), storing the complete line only when there is no entry for that key yet. This ensures that only the first entry per group is stored. Loop through the array and print. Note that for (i in dat) does not guarantee any particular order; if order matters, see the sketch below.
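If input order matters, a sketch of an order-preserving variant that prints each group's first line immediately instead of collecting everything and looping at the end:
awk -F- '!($1 in dat) {dat[$1]=$0; print}' filename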
Simple GNU datamash solution:
datamash -t'-' -g1 first 2 <file
-t'-' - field separator
-g1 - group lines by the 1st field
first 2 - get only the first value of the 2nd field for each group. Can also be changed to the min 2 operation
The output:
Server001-1
Server002-1
Server003-1
Since you've mentioned the string format as Server00*, you can simply use this:
grep -E "Server[0-9]+-1" file
Server[0-9]+ covers cases like Server1000, Server100000, etc. (Note that \d is PCRE syntax: grep -E does not understand it, so it would need grep -P, a GNU extension; [0-9] is the portable spelling.)
or even
grep '[-]1$' file
Output for both :
Server001-1
Server002-1
Server003-1
A simple way, using just one command line, to get a general unique result:
nin output.txt nul "^(\w+)-\d+" -u -w
Explanation:
nul is a non-existent Windows file, like /dev/null on Linux.
-u gets the unique result, -w outputs whole lines. Ignore case? Use -i.
"^(\w+)-\d+" is the same regex syntax as in C++/C#/Java/Scala, etc.
Save to a file? nin output.txt nul "^(\w+)-\d+" -u -w > result.txt
Save to a file with summary info? nin output.txt nul "^(\w+)-\d+" -u -w -I > result.txt
For future automation with nin.exe: the result count is the return value %ERRORLEVEL%
nin.exe / nin.gcc* is a single portable executable tool to get the difference or intersection of keys/lines between 2 files, or between a pipe and a file. See the tools directory of my open project: https://github.com/qualiu/msr.
And you can also see colorful built-in usage/examples: https://qualiu.github.io/msr/usage-by-running/nin-Windows.html

How to search a specific column in a file for a string in shell?

I am learning shell scripting. I wrote a script that indexes all the files and folders in the current working directory in the following tab-separated format:
Path Filename File/Directory Last Modified date Size Textfile(Y/N)
for example, two lines in my file 'index.txt' are,
home/Desktop/C C d Thu Jan 16 01:23:57 PST 2014 4 KB N
home/Desktop/C/100novels.txt 100novels.txt f Thu Mar 14 06:04:06 PST 2013 0 KB Y
Now, I want to write another script that searches the index file and takes the file name as a command line parameter, along with optional parameters such as -dir (to search only for directories), -date DATE (for last modified date), -t (for text files only), etc. How do I go about this?
I know the grep command searches a file for a string, but if I entered a file name, it would search the entire file for that name instead of searching only the filename column. How do I do this? Is there any other command to achieve this?
"i want to search in a speific column for a particular string... how to do that ?"
For example, to search for directories:
awk -F '\t' '$3 == "d"' filename
You can also search by regular expression or substring matching. Please post further questions as you learn.
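For instance, a sketch of a regular-expression match on the filename column, against the index.txt layout shown above (the /novels/ pattern is only an example):
awk -F '\t' '$2 ~ /novels/' index.txt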
Single quotes prevent variable substitution (and other forms of expansion) -- see the bash manual. So you have to pass the value by other means:
awk -F'\t' -v s="$search_file" '$2 == s' ./index.txt > final.txt
or use double quotes, which is more fragile and harder to read IMO:
awk -F'\t' "\$2 == \"$search_file\"" ./index.txt > final.txt
Also, don't mix single and double quotes. Pick one or the other, depending on the functionality you need.
You can do it using the awk command. For example, to print the first column of a file using awk:
awk -F ":" '{print $1}' /etc/passwd
(Note that the -F flag chooses a field separator for the columns; in your case you would want -F '\t', since the index is tab separated and its fields contain spaces.) Then you can use grep to filter in that column. Good luck!
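For instance, a sketch that prints just the filename column and then greps it (100novels is only an example pattern, and index.txt is the file from the question):
awk -F '\t' '{print $2}' index.txt | grep '100novels'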

Unix command to remove everything after first column

I have a text file in which I have something like this-
10.2.57.44 56538154 3028
120.149.20.197 28909678 3166
10.90.158.161 869126135 6025
In that text file, I have around 1,000,000 rows exactly like the above. I am working in a SunOS environment. I need a way to remove everything from the file, leaving only the IP address (the first column above is an IP address). So after running some Unix command, the file should look like this:
10.2.57.44
120.149.20.197
10.90.158.161
Can anyone please help me out with a Unix command that removes everything but the IP address (the first column) and saves it back to a file again?
So the output should be something like this, in some file:
10.2.57.44
120.149.20.197
10.90.158.161
If the delimiter is a space character, use
cut -d " " -f 1 filename
If the delimiter is a tab character, there is no need for the -d option, as tab is the default delimiter for the cut command:
cut -f 1 filename
-d
Delimiter; the character immediately following the -d option is the field delimiter.
-f
Specifies a field list, separated by the delimiter.
nawk '{print $1}' file > newFile && mv newFile file
OR
cut -f1 file > newFile && mv newFile file
As you're using SunOS, you'll want to get familiar with nawk (not awk, which is the old and cranky version of awk; nawk = new awk ;-).
In either case, you're printing the first field in the file to newFile.
(n)awk is a complete programming language designed for easy manipulation of text files. $1 means the first field on each line, $9 would mean the ninth field, etc., while $0 means the whole line. You can tell (n)awk what to separate the fields by: it might be a tab character, a '|' character, or multiple spaces. By default, all versions of awk use white space, i.e. multiple spaces or one tab, to delimit the columns/fields of each line in a file.
For a very good intro to awk, see Grymoire's Awk page
The && means: execute the next command only if the previous command finished without a problem. This way you don't accidentally erase your good data file because of some error.
IHTH
If you have vim, open the file with it. Then in command mode, run the substitution (with tab or space or whatever the delimiter is): %s:<delimiter>.*$::g. Now save the file with :wq.
Using sed, give a command like this: sed -e 's/<delimiter>.*$//' file.txt > new_file.txt
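If GNU sed happens to be available (an assumption; the stock SunOS sed does not support -i), the same edit can be done in place, shown here with a space as the delimiter:
sed -i 's/ .*$//' file.txt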
How about a perl script ;)
#!/usr/bin/perl -w
use strict;

my $file = shift;
die "Missing file or can't read it" unless $file and -r $file;

# Run $code over every line of $file, writing the result back in place
# (the same mechanism as perl -i).
sub edit_in_place
{
    my $file = shift;
    my $code = shift;
    {
        local @ARGV = ($file);  # let the <> operator read $file
        local $^I   = '';       # enable in-place editing, no backup
        while (<>) {
            &$code;
        }
    }
}

edit_in_place $file, sub {
    my @columns = split /\s+/;  # split the current line on whitespace
    print "$columns[0]\n";      # keep only the first column
};
This will edit the file in place since you say it is a large one. You can also create a backup by modifying local $^I = ''; to local $^I = '.bak';
Try this
awk '{$1=$1; print $1}' temp.txt
Output
10.2.57.44
120.149.20.197
10.90.158.161
awk '{ print $1 }' file_name.txt > tmp_file_name.txt
mv tmp_file_name.txt file_name.txt
'> tmp_file_name.txt' means redirecting STDOUT of awk '{ print $1 }' file_name.txt to a file named tmp_file_name.txt
FYI :
$1 means first column based on delimiter. The default delimiter is whitespace
$2 means second column based on delimiter. The default delimiter is whitespace
..
..
$NF means last column based on delimiter (NF is the number of fields). The default delimiter is whitespace
If you want to change the delimiter, use awk with -F, for example:
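For example, on a comma-delimited file (data.csv is only a placeholder name):
awk -F ',' '{print $1}' data.csv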
