I have a students.txt file with the fields RollNo, Name, IDU, and CGPA. If the roll number exists, I want to prompt the user for new IDU and CGPA values and update that record in students.txt.
I made the following script:
#!/bin/bash
display() {
    awk -F ":" -v roll="$1" '$1 == roll { name = $2; print name }
        END { if (name == "") print "not found" }' students.txt
}
echo "Enter the roll no."
read rno
if [ "$rno" -ge 1000 ] && [ "$rno" -le 9999 ]
then
    display "$rno"
    # Now I have a valid $rno and want to update that line
else
    echo "Enter a number between 1000 and 9999"
fi
Now I need help taking user input for the IDU and CGPA values and updating students.txt with those values for the record that was found.
In general, "-" is used for standard input with awk, e.g.
awk '{print($1)}' -
It's not clear to me exactly what you want here. Can't you use additional 'read' statements in the bash part of the script to input the other two values?
First, I grep for the roll number:
grep "^${rno}:" students.txt
If it is found, I then use awk to replace the record:
awk -F: -v OFS=':' -v rno="$rno" -v idu="$idu" -v cgpa="$cgpa" '$1 == rno { $3 = idu; $4 = cgpa } 1' students.txt > tmp.txt && mv tmp.txt students.txt
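Putting the pieces together, a minimal sketch of the whole flow (assuming the colon-separated layout RollNo:Name:IDU:CGPA used above):

#!/bin/bash
# Minimal sketch: look up the roll number, then rewrite that record's IDU and CGPA fields
read -p "Enter the roll no. " rno
if grep -q "^${rno}:" students.txt; then
    read -p "Enter new IDU: " idu
    read -p "Enter new CGPA: " cgpa
    awk -F: -v OFS=':' -v rno="$rno" -v idu="$idu" -v cgpa="$cgpa" \
        '$1 == rno { $3 = idu; $4 = cgpa } 1' students.txt > tmp.txt && mv tmp.txt students.txt
else
    echo "Roll no. $rno not found"
fi

Writing to tmp.txt and renaming avoids truncating students.txt while awk is still reading it.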
I am writing a function in a Bash shell script that should return lines from CSV files (with headers) that have more commas than the header line. This can happen because some values inside these files contain commas. For quality control, I must identify these lines so I can clean them up later. What I have currently:
#!/bin/bash
get_bad_lines () {
    local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $1 | wc -l)

    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exists
        if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
            echo "File: $1_0_${i}_0.csv not found!"
            continue
        fi

        # Search for error-lines inside the file and print them out
        echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
    done
}
get_bad_lines products
get_bad_lines users
The output of this program is currently all the comma counts with all of the line numbers in all the files, and I suspect this is because the input $1 (the folder name, i.e. products and users) conflicts with the reference to $1 in the call to awk (where I wish to grab the first column, the count of commas for that line in the current file in the loop).
Is this the issue? And if so, could it be solved by referencing the first column or the folder name with different variable names, instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I tried declaring a variable in the call to awk as well, with the same effect (as in the accepted answer to "Awk field variable clash with function argument"):
get_bad_lines () {
    local table_name=$1
    local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $table_name | wc -l)

    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exists
        if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
            echo "File: ${table_name}_0_${i}_0.csv not found!"
            continue
        fi

        # Search for error-lines inside the file and print them out
        echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
    done
}
You can use awk all the way to achieve that:
get_bad_lines () {
    find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
        awk '
            NR==1 { num_comma = gsub(/,/, "") }            # count commas in the header line
            /,/   { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++] = NR ":" $0 }
            END   {
                if (wrong > 0) {
                    print(FILENAME " has over " num_comma " commas in the following lines:")
                    for (i = 0; i < wrong; i++) { print(wrong_array[i]) }
                }
            }' "${my_file}"
    done
}
As for why your original awk command failed to print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk statement ('$1 > $correct_no_of_commas {print}'). There is therefore no substitution by the shell, and awk reads $correct_no_of_commas as-is: it looks up an awk variable named correct_no_of_commas, which is undefined in the awk script, so it is an empty string. awk then evaluates $1 > $"" as the matching condition, and since "" converts to 0, $"" is equivalent to $0, so awk compares the count in $1 with the full input line. That comparison is done as strings (the uniq -c line, with its padding, count, and line number, is not a plain number), and since every line starts with padding spaces, which sort before digits, the condition is effectively always true.
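One way to fix it is to pass the shell value to awk explicitly with -v. A minimal sketch against the pipeline above:

grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c |
    awk -v max="$correct_no_of_commas" '$1 > max { print }'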
You can identify all the bad lines with a single awk command (GNU awk, as it uses ENDFILE):
awk -F, 'FNR==1{print FILENAME; headerCount=NF} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number to be printed as well, use this:
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv
I am attempting to create a program in Unix that accesses a data file, adding, deleting, and searching for names and usernames within the file. With this if statement, I am attempting to let the user search the file by the first field.
All of the data in the file uses uppercase letters, so I first must convert any text the user inputs from lowercase to uppercase. For some reason, this code is not working for both converting to uppercase and searching and printing the data.
How can I fix it?
if [ "$choice" = "s" ] || [ "$choice" = "S" ]; then
tput cup 3 12
echo "Enter the first name of the user you would like to search for: "
tput cup 4 12; read search | tr '[a-z]' '[A-Z]'
echo "$search"
awk -F ":" '$1 == "$search" {print $3 " " $1 " " $2 }'
capstonedata.txt
fi
This: read search | tr '[a-z]' '[A-Z]' will not assign anything to the variable search.
It should be something like:
read input
search=$( echo "$input" | tr '[a-z]' '[A-Z]' )
and it is better to use parameter expansion for case modification:
read input
search=${input^^}
If you use Bash, you can declare a variable that is automatically converted to uppercase:
$ declare -u search
$ read search <<< 'lowercase'
$ echo "$search"
LOWERCASE
As for your code, read doesn't have any output, so piping to tr doesn't do anything, and you can't have a newline before the file name in the awk statement.
Edited version of your code, minus all the tput stuff:
# [[ ]] to enable pattern matching, no need to quote here
if [[ $choice = [Ss] ]]; then
    # Declare uppercase variable
    declare -u search
    # Read with prompt
    read -p "Enter the first name of the user you would like to search for: " search
    echo "$search"
    # Proper way of getting a variable into awk
    awk -F ":" -v s="$search" '$1 == s {print $3 " " $1 " " $2 }' capstonedata.txt
fi
Alternatively, if you want to use only POSIX shell constructs:
case $choice in
    [Ss] )
        printf 'Enter the first name of the user you would like to search for: '
        read input
        search=$(echo "$input" | tr '[:lower:]' '[:upper:]')
        awk -F ":" -v s="$search" '$1 == s {print $3 " " $1 " " $2 }' capstonedata.txt
        ;;
esac
Awk is not shell (google that). Just do:
if [ "$choice" = "s" ] || [ "$choice" = "S" ]; then
read search
echo "$search"
awk -F':' -v srch="$search" '$1 == toupper(srch) {print $3, $1, $2}' capstonedata.txt
fi
If I want to identify a pattern in Unix in one single directory, which Unix utility would be helpful (like awk)?
Input :
$ ls
a_20171007_001.txt
a_20171007_002.txt
b_20171007_001.txt
c_20180101_001.txt
Expected output:
a_20171007_002.txt
b_20171007_001.txt
The output should return the latest version of each file based on the filename, irrespective of file creation time.
The output shouldn't include future-dated files (e.g., with a current date of 20171008, 20180101 shouldn't appear in the output).
Any suggestions on how to achieve this easily in Unix (awk or sed)?
Thanks a lot for all your solutions. Unfortunately, though, they don't help if the file names don't follow that exact pattern.
E.g., input:
ab_bc_all_20171008_001.txt
bc_cd_ad_all_20171008_001.txt
ab_bc_all_20171008_002.txt
ad_dc_cd_ed_all_20180101_001.txt
ae_bc_zx_ed_ac_all_20170918_001.txt
Output:
bc_cd_ad_all_20171008_001.txt
ab_bc_all_20171008_002.txt
ae_bc_zx_ed_ac_all_20170918_001.txt
In the above case, the only consistent pattern is that the date field appears right after 'all'.
Can you please suggest something for this case as well?
Thanks in advance.
Something like this in Perl:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Time::Piece;

my $today = localtime->ymd("");
my %latest;
for my $file (glob '*.txt') {
    my ($id, $date, $num) = split /[_.]/, $file;
    $latest{$id}{$date} = $num
        if $date <= $today
        && (! exists $latest{$id}
            || ! exists $latest{$id}{$date}
            || $num > $latest{$id}{$date});
}
for my $id (keys %latest) {
    for my $date (keys %{ $latest{$id} }) {
        say "$id\_$date\_$latest{$id}{$date}.txt";
    }
}
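Given the sample listing (and 20171008 as the current date), this should print, in Perl's unordered hash order:

a_20171007_002.txt
b_20171007_001.txt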
A simple awk solution:
$ awk -F_ -v date="$(date +%Y%m%d)" '!($1 in file) && $2<=date {file[$1]=$0} ($1 in file) && $2<=date {if ($0>=file[$1]) file[$1]=$0} END{for (i in file) print file[i]}' f1
a_20171007_002.txt
b_20171007_001.txt
Explanation:
Store the current date in the date variable, in the format yyyymmdd.
While iterating through the records/filenames: if the date in the filename (i.e. $2) is less than or equal to the current date, and the prefix (e.g. a, b) doesn't yet exist in the array file, store the record there, e.g. file["a"]=a_20171007_001.txt. Otherwise it isn't stored; in this example c_20180101_001.txt is rejected straight away.
For subsequent records, if the prefix (i.e. $1) already exists in the array file (and the date again isn't in the future), check whether the whole record is greater than the stored record (lexicographically). If yes, overwrite the record in the file array.
Could you please try the following and let me know if it helps you?
ls -ltr *.txt | awk -v date="$(date +%Y)" -F"_" 'prev != $1 && val && date_val<=date{print val} {prev=$1; val=$0; date_val=substr($2,1,4)} END{if (date_val<=date) print val}'
Here is a more readable form of the solution as well:
ls -ltr *.txt | awk -v date="$(date +%Y)" -F"_" '
    prev != $1 && val && date_val <= date {
        print val
    }
    {
        prev = $1
        val = $0
        date_val = substr($2, 1, 4)
    }
    END {
        if (date_val <= date) {
            print val
        }
    }'
GNU Awk solution for static filename format <prefix>_<date>_<version>.txt:
Example ls -1 output (extended):
a_20171007_001.txt
a_20171007_002.txt
b_20171007_001.txt
c_20180101_001.txt
a_20171007_0010.txt
b_20171007_004.txt
ls -1 | awk -F'[_.]' '{ k = $1 "_" $2; if (a[k] < $3) a[k] = $3 }
    END {
        for (i in a) {
            split(substr(i, index(i, "_") + 1), b, "")
            ts = mktime(sprintf("%d %d %d 00 00 00", b[1] b[2] b[3] b[4], b[5] b[6], b[7] b[8]))
            if (systime() >= ts) print i "_" a[i] ".txt"
        }
    }'
The output:
b_20171007_004.txt
a_20171007_0010.txt
This one is OK in plain shell (dash):
d=$(date +%Y%m%d)
ls -1r *_*_*.txt | while IFS='_' read -r w x y
do
    [ "$x" -le "$d" ] && [ "$v" != "$w$x" ] && { echo "${w}_${x}_${y}"; v="$w$x"; }
done
The spec changed? Try this one:
d=$(date +%Y%m%d)
ls -1r *_*_*.txt | while read -r l
do
    b="${l%_*_*}"
    a="${l#$b*_}"
    c="${a%_*}"
    [ "$c" -le "$d" ] && [ "$v" != "$b$c" ] && { echo "$l"; v="$b$c"; }
done
$ ls -1r | awk -v today="$(date +%Y%m%d)" -F'_' '($2 <= today) && !seen[$1,$2]++'
b_20171007_001.txt
a_20171007_002.txt
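For the follow-up where the prefix contains a variable number of underscore-separated parts, a sketch that keys on everything before the date field (assuming the date is always the second-to-last _-separated component):

ls -1r *_*_*.txt | awk -v today="$(date +%Y%m%d)" '
    {
        n = split($0, a, "_")        # a[n] is "<version>.txt", a[n-1] is the date
        date = a[n-1]
        prefix = substr($0, 1, length($0) - length(a[n]) - length(a[n-1]) - 2)
    }
    date <= today && !seen[prefix]++ # ls -1r puts the newest name first per prefix
'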
I want to check a file to see whether all records in it are 0 and, if they are, move the file.
I have written the script below; it runs without errors, but it does not move the file. Can anyone suggest why?
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ result = "1" ]; then
    mv conc_upld_atp.11002.20141204151900.dat home/stephenb/scripttest
fi
In Bash, = compares strings; to compare integers you need -eq:
if [ "$result" -eq 1 ]; then
Note that it is preferred to write var=$(command). Also, your command cat file | awk '...' can be simplified to just awk '...' file. And depending on what exactly you want to do, awk can probably handle all of it.
For example, if you just want to check whether any of the 6th fields is not 0, use:
awk -F, '$6 != 0 {v=1} END {print v+0}' file
and then the rest of your code.
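For instance, a sketch wiring that check into the move (same file name as in the question):

file=conc_upld_atp.11002.20141204151900.dat
# prints 1 if any 6th field is non-zero, 0 otherwise
nonzero=$(awk -F, '$6 != 0 {v=1} END {print v+0}' "$file")
if [ "$nonzero" -eq 0 ]; then
    mv "$file" /home/stephenb/scripttest
fi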
However, you can do it in an extremely fast way by using what 999999999999999999999999999999 suggested in the comments:
awk -F, '$6!=0{exit 1}' file && mv file newfile
This loops through the file and exits with a non-zero status if any line contains a 6th field different from 0. If that never happens, awk's exit code is 0, so the && command is performed and, hence, mv file newfile happens. You can even keep track of the failing condition by saying:
awk -F, '$6!=0{exit 1}' file && mv file newfile || echo "bad data"
(Note that with && ... || ..., the echo would also run if the mv itself failed.)
You want to check whether all records are 0, but you specifically check result = "1".
I would recommend two things: use numerical comparison and compare against the correct value:
if (( result == 0 )); then
If you want to check whether all fields are 0, then try this script:
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ "$result" == "0" ]; then
    mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi
In the end I used the following:
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
resultcount=`echo "$result" | wc -l`
echo "$resultcount"
if [ "$resultcount" == "1" ]; then
    echo match
    mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi
I have two sample files as follows:
File1
item name="NY" block="A" idnum="12345"
item name="NJ" block="B" idnum="123456"
item name="CT" block="C" idnum="1234567"
File2
ID_B|ID_C|NY|4|8198|10|2374|127
ID_C|ID_D|NJ|4|8198|10|2374|127
ID_D|ID_E|CT|4|8198|10|2374|127
I would like to generate output by passing one or more IDs as arguments.
If I am looking for info for ID_B, the output should be:
ID_B|ID_C|NY|4|8198|10|2374|127 => "NY" block="A" idnum="12345"
If I am looking for two IDs, ID_C and ID_D, together, it should be:
ID_C|ID_D|NJ|4|8198|10|2374|127 => "NJ" block="B" idnum="123456"
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"
With bash, join, sort and awk.
script.sh:
#!/bin/bash
file1="File1"
file2="File2"
grep -f <(printf "^%s|\n" "$@") <(join -t '|' -1 1 -2 3 <(awk -F'"' '{OFS="|"; print $2,$4,$6}' "$file1" | sort -t '|' -k 1,1) <(sort -t '|' -k 3,3 "$file2") -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.3,1.2,1.3 | awk -F'|' '{ printf "%s|%s|%s|%s|%s|%s|%s|%s => \"%s\" block=\"%s\" idnum=\"%s\"\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11 }')
Example: script.sh ID_C ID_D
Output:
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"
ID_C|ID_D|NJ|4|8198|10|2374|127 => "NJ" block="B" idnum="123456"
Something like
echo "Please enter the key"
read key
grep "item name=\"$(grep "^${key}|" file2 | cut -d"|" -f3)\"" file1
Edit:
I did not reproduce the literal output; I just wanted to show that you could grep for the fields. A complete solution could look like:
#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 id [id2 ..]"
    exit 1
fi

for key in "$@"; do
    f2=$(grep "^${key}|" file2)
    f1=$(grep "item name=\"$(echo "${f2}" | cut -d"|" -f3)\"" file1)
    echo "${f2} => ${f1#item name=}"
done
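For instance, against the sample files (assuming they are saved as file1 and file2, and the script as script.sh), this should print:

$ sh script.sh ID_B
ID_B|ID_C|NY|4|8198|10|2374|127 => "NY" block="A" idnum="12345"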
awk, wrapped in a thin shell script
#!/bin/sh
awk -F '[|"]' -v OFS='"' -v keys="$*" '
    BEGIN {
        n = split(keys, a, " ")
        for (i = 1; i <= n; i++) key[a[i]] = 1
    }
    NR == FNR {
        if ($1 in key) line[$3] = $0
        next
    }
    $2 in line {
        $1 = line[$2] " => "
        print
    }
' file2 file1
Then
$ sh script.sh ID_B ID_D
ID_B|ID_C|NY|4|8198|10|2374|127 => "NY" block="A" idnum="12345"
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"
What you want is very similar to a JOIN in SQL.
Implementing this in procedural tools like bash, awk, or sed requires dictionaries (string-indexed string arrays). Awk and recent bash do support them (other answers here use awk's arrays), but even with them the result is much more complex than a one-line SQL query.
In your place I would use Perl or Python to solve this problem; in those languages the goal is simple enough that, even with the additional learning effort, it would be about as effective as a bash workaround.
But you asked for a bash/awk/sed solution, so this is what I can give you.
Don't use these tools alone; use them together. The simplest thing you can do from bash is to call the external command grep.
For example, here is an integrated awk/bash core which partially does your task (I haven't tested it):
awk 'BEGIN {FS="[_|]"} { print $2 " " $0}' file1|while read l
do
set $l
echo $2 $(grep 'block="'$1'"' file2)
done
But to understand this code, debug it, and extend it, you need solid experience with regexps, the exact details of escaping, and the fastest ways to integrate the two languages. In my opinion, if you are learning programming, that is a must-have before you start to learn Perl/Python, but in everyday work the higher-level solutions are better.
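In the same spirit, a cleaner grep-from-bash sketch that joins on the state name in field 3 of File2 (assuming the sample layouts above):

#!/bin/bash
# For every row of File2, look up the matching "item name=..." line in File1
while IFS='|' read -r id1 id2 state rest; do
    match=$(grep "name=\"$state\"" File1)
    echo "$id1|$id2|$state|$rest => ${match#item name=}"
done < File2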
You can also write a short bash script that will retrieve the information in a brute-force manner:
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && {
    printf "error: insufficient input. Usage: %s file1 file2 query\n" "${0##*/}"
    exit 1
}

file1="$1"
file2="$2"
query="$3"

## line from file1
f1=$(grep "$query" "$file1")
[ -z "$f1" ] && {
    printf "error: '%s' not found in '%s'.\n" "$query" "$file1"
    exit 1
}

## line from file2
f2=$(grep "$(awk -F '|' '{ print $3 }' <<<"$f1")" "$file2")
[ -z "$f2" ] && {
    printf "error: no related row found in '%s'.\n" "$file2"
    exit 1
}

printf "%s => %s\n" "$f1" "$f2"
Examples
$ bash joinfiles.sh join1.txt join2.txt ID_B
ID_B|ID_C|NY|4|8198|10|2374|127 => item name="NY" block="A" idnum="12345"
$ bash joinfiles.sh join1.txt join2.txt ID_E
ID_D|ID_E|CT|4|8198|10|2374|127 => item name="CT" block="C" idnum="1234567"
$ bash joinfiles.sh join1.txt join2.txt ID_F
error: 'ID_F' not found in 'join1.txt'
Processing Multiple Input Tags
With small adjustments, you can easily process as many tags as you like. Below, the code will match all of the tags provided as arguments 3 and beyond. If you want to provide more than a half-dozen tags at a time, you should adjust the script to read the tags from stdin instead of taking them as arguments. Also note, the code matches each tag against the first field of the first file. If you want to match tags in all fields of file one, then f1 will become an array and the remaining part of the code will operate on each element of f1. Let me know what you think. Here is the example for multiple input tags as arguments:
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && {
    printf "error: insufficient input. Usage: %s file1 file2 query\n" "${0##*/}"
    exit 1
}
[ -f "$1" -a -f "$2" ] || {
    printf "error: file not found. '%s' or '%s'\n" "$1" "$2"
    exit 1
}

file1="$1"
file2="$2"

for i in "${@:3}"; do
    query="$i"

    ## line from file1
    f1=$(grep "^$query" "$file1")
    [ -z "$f1" ] && {
        printf "error: '%s' not found in '%s'.\n" "$query" "$file1"
        exit 1
    }

    ## line from file2
    f2=$(grep "$(awk -F '|' '{ print $3 }' <<<"$f1")" "$file2")
    [ -z "$f2" ] && {
        printf "error: no related row found in '%s'.\n" "$file2"
        exit 1
    }

    printf "%s => %s\n" "$f1" "$f2"
done
Use/Output
$ bash joinfiles2.sh join1.txt join2.txt ID_B ID_D ID_F
ID_B|ID_C|NY|4|8198|10|2374|127 => item name="NY" block="A" idnum="12345"
ID_D|ID_E|CT|4|8198|10|2374|127 => item name="CT" block="C" idnum="1234567"
error: 'ID_F' not found in 'join1.txt'