How to read in variables from multi-line file? - linux

I am trying to read from a file that is multiple lines with multiple variables per line. There are four variables: item, price, max_qty, and min_qty. The file looks like this:
item price
min_qty max_qty
I have tried:
while IFS=' ' read -r first second third fourth; do
    echo "$first $second $third $fourth"
done < file
This does not work. I want the output to be:
item price min_qty max_qty
I have also thought about somehow replacing the new lines with spaces and then reading from that line. I don't want to actually change the file though.

read twice:
while read -r first second && read -r third fourth; do
    echo "$first $second $third $fourth"
done < file

If it is true that "the file looks like this" (two lines, two values per line, not a larger file with many pairs of lines like that), then you can do it with a single read by using the -d option to set the line delimiter to empty:
read -r -d '' first second third fourth <file
echo "$first $second $third $fourth"
The code prints 'item price min_qty max_qty' with the example file.
If you have errexit or the ERR trap set, the program will exit silently immediately after the read is executed, because read returns a nonzero status when it hits end-of-file before finding the delimiter. See Bash ignoring error for a particular command for ways of avoiding that.
The code will work with the default value of IFS (space+tab+newline). If IFS could be set to something different elsewhere in the program, set it explicitly with IFS=$' \t\n' read -r -d '' ....
The code will continue to work if you change the file format (all values on one line, one value per line, ...), assuming that it still contains only four values.
The code won't work if the file actually contains many line pairs. In that case the "read twice" solution by oguz ismail is good.
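For the record, the asker's idea of joining the lines without modifying the file also works, e.g. with tr inside a command substitution (a minimal sketch, assuming bash for the here-string):
read -r first second third fourth <<< "$(tr '\n' ' ' < file)"
echo "$first $second $third $fourth"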


IFS and command substitution

I am writing a shell script to read input csv files and run a java program accordingly.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read row
do
    $myScript
    IFS=$"|"
    for column in $row
    do
        $myScript
    done
done < $CSV_FILE
csv file:
a|b|c
Interestingly, $myScript outside the for loop works, but the $myScript inside the for loop fails with "/usr/bin/java -version: not found [No such file or directory]". I have come to know that it is because I am setting IFS. If I comment out the IFS assignment and change the csv file to
a b c
it works! I imagine the shell uses the default IFS to separate the command /usr/bin/java from its -version argument. Since I changed the IFS, it takes the entire string as a single command, or at least that is what I think is happening.
But this is my requirement: I have a csv file with a custom delimiter, and the command has arguments in it, separated by space. How can I do this correctly?
IFS indicates how to split the values of variables in unquoted substitutions. It applies to both $row and $myScript.
If you want to use IFS to do the splitting, which is convenient in plain sh, then you either need both expansions to work with the same IFS value, or you need to switch IFS back and forth. In this particular case, you can easily arrange to need the same value by defining myScript as myScript="/usr/bin/java|-version". Alternatively, you can change the value of IFS just in time. In both cases, note that an unquoted substitution doesn't just split the value using IFS; it also interprets each part as a wildcard pattern and replaces it by the list of matching file names if there are any. This means that if your CSV file contains a line like
foo|*|bar
then the row won't be foo, *, bar but foo, each file name in the current directory, bar. To process such data literally, you need to turn off wildcard expansion (globbing) with set -f. Also remember that read reads continuation lines when a line ends with a backslash, and strips leading and trailing IFS characters. Use IFS= read -r to turn off these two behaviors.
myScript="/usr/bin/java -version"
set -f
while IFS= read -r row
do
$myScript
IFS='|'
for column in $row
do
IFS=' '
$myScript
done
done
However there are better ways that avoid IFS-splitting altogether. Don't store a command in a space-separated string: it fails in complex cases, like commands that need an argument that contains a space. There are three robust ways to store a command:
Store the command in a function. This is the most natural approach. Running a command is code; you define code in a function. You can refer to the function's arguments collectively as "$@".
myScript () {
    /usr/bin/java -version "$@"
}
…
myScript extra_argument_1 extra_argument_2
Store an executable command name and its arguments in an array.
myScript=(/usr/bin/java -version)
…
"${myScript[#]}" extra_argument_1 extra_argument_2
Store a shell command, i.e. something that is meant to be parsed by the shell. To evaluate the shell code in a string, use eval. Be sure to quote the argument, like any other variable expansion, to avoid premature wildcard expansion. This approach is more complex since it requires careful quoting. It's only really useful when you have to store the command in a string, for example because it comes in as a parameter to your script. Note that you can't sensibly pass extra arguments this way.
myScript='/usr/bin/java -version'
…
eval "$myScript"
Also, since you're using ksh and not plain sh, you don't need to use IFS to split the input line. Use read -A instead to directly split into an array.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript=(/usr/bin/java -version)
while IFS='|' read -r -A columns
do
    "${myScript[@]}"
    for column in "${columns[@]}"
    do
        "${myScript[@]}"
    done
done <"$CSV_FILE"
The simplest solution is to avoid changing IFS and do the splitting with read -d <delimiter>, like this:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read -A -d '|' columns
do
    $myScript
    for column in "${columns[@]}"
    do
        echo next is "$column"
        $myScript
    done
done < $CSV_FILE
IFS tells the shell which characters separate "words", that is, the different components of a command. So when you remove the space character from IFS, the expansion of $myScript is treated as the single word "/usr/bin/java -version" rather than the two words "/usr/bin/java" and "-version".
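A quick way to see the effect (a sketch, separate from the script above):
myScript="/usr/bin/java -version"
IFS='|'
$myScript   # one word: the shell looks for a command literally named "/usr/bin/java -version"
IFS=' '
$myScript   # two words: /usr/bin/java is run with the -version argument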
The IFS assignment should be placed after the while, as a prefix to the read command:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while IFS="|" read row
do
$myScript
for column in $row
do
$myScript
done
done < $CSV_FILE

linux script to find specific words in file names

I need help writing a script to do the following stated below in part a.
The following code will output all of the words found in $filename, each word on a separate line.
for word in `cat $filename`
do
    echo $word
done
a. Write a new script which receives two parameters. The first is a file’s name ($1 instead of $filename) and the second is a word you want to search for ($2). Inside the for loop, instead of echo $word, use an if statement to compare $2 to $word. If they are equal, add one to a variable called COUNT. Before the for loop, initialize COUNT to 0 and after the for loop, output a message that tells the user how many times $2 appeared in $1. That is, output $COUNT, $2 and $1 in an echo statement but make sure you have some literal words in here so that the output actually makes sense to the user. HINTS: to compare two strings, use the notation [ $string1 == $string2 ]. To add one to a variable, use the notation X=$((X+1)). If every instruction is on a separate line, you do not need any semicolons. Test your script on /etc/fstab with the word defaults (7 occurrences should be found)
This is what I got so far, but it does not work right. It says it finds 0 occurrences of the word "defaults" in /etc/fstab. I am sure my code is wrong but can't figure out the problem. Help is appreciated.
count=0
echo "what word do you want to search for?: "
read two
for word in "cat $1"
do
    if [ "$two" == "$word" ]; then
        count=$((count+1))
    fi
done
echo $two appeared $count times in $1
You need to use command substitution; as written, you were looping over the single string cat first_parameter.
for word in $(cat "$1")
A better way to do this is with grep, paraphrasing How do I count the number of occurrences of a word in a text file with the command line?:
grep -o "\<$two\>" "$1" | wc -l

If condition to check if two strings stored in a variable occur one after other in a file

$cat list
Hi
welcome
one
two
good evening
Value1="two"
value2="evening"
For the above file and values, the output should be echo "values are present one after the other line".
I need to know the if condition to check whether both variable values occur one after the other in a file.
If both variable values occur one line after the other in a file, then echo some statement.
for example:
$cat list
Hi
two
one
three
good evening
In the above file, the two values are not present on consecutive lines, so the output should be echo "values are not present one after the other line".
With awk you could write something like this (passing the shell variables in with -v and matching them as substrings of consecutive lines):
awk -v v1="$Value1" -v v2="$value2" 'index($0,v1){l=NR} l && NR==l+1 && index($0,v2){print "ok"}' list
#!/bin/bash
Value1="two"
value2="evening"
while read -r line; do
    if [[ "$line" == *"$Value1"* ]]; then
        read -r anotherline
        if [[ "$anotherline" == *"$value2"* ]]; then
            echo "values are present one after the other line"
        fi
    fi
done < list
If you want an exact string match, remove the * wildcards; a plain single-bracket test ([ "$line" = "$Value1" ]) works for exact comparison too.
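If you also need the "values are not present" message from the question, one way is to remember the previous line and set a flag (a minimal sketch, assuming bash):
#!/bin/bash
Value1="two"
value2="evening"
found=0
prev=""
while IFS= read -r line; do
    # flag success when the previous line held Value1 and this one holds value2
    if [[ "$prev" == *"$Value1"* && "$line" == *"$value2"* ]]; then
        found=1
    fi
    prev=$line
done < list
if [ "$found" -eq 1 ]; then
    echo "values are present one after the other line"
else
    echo "values are not present one after the other line"
fi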

Bash: Read in file, edit line, output to new file

I am new to linux and new to scripting. I am working in a linux environment using bash. I need to do the following things:
1. read a txt file line by line
2. delete the first line
3. remove the middle part of each line after the first
4. copy the changes to a new txt file
Each line after the first has three sections, the first always ends in .pdf and the third always begins with R0 but the middle section has no consistency.
Example of 2 lines in the file:
R01234567_High Transcript_01234567.pdf High School Transcript R01234567
R01891023_Application_01891023127.pdf Application R01891023
Here is what I have so far. I'm just reading the file, printing it to screen and copying it to another file.
#!/bin/bash
cd /usr/local/bin;
#echo "list of files:";
#ls;
for index in *.txt;
do
    echo "file: ${index}";
    echo "reading..."
    exec<${index}
    value=0
    while read line
    do
        #value='expr ${value} +1';
        echo ${line};
    done
    echo "read done for ${index}";
    cp ${index} /usr/local/bin/test2;
    echo "file ${index} moved to test2";
done
So my question is, how can I delete the middle bit of each line, after .pdf but before the R0...?
Using sed:
sed 's/^\(.*\.pdf\).*\(R0.*\)$/\1 \2/g' file.txt
This will remove everything between .pdf and R0 and replace it with a single space.
Result for your example:
R01234567_High Transcript_01234567.pdf R01234567
R01891023_Application_01891023127.pdf R01891023
The Hard, Unreliable Way
It's a bit verbose, and much less terse and efficient than what would make sense if we knew that the fields were separated by tab literals, but the following loop does this processing in pure native bash with no external tools:
shopt -s extglob
while IFS= read -r line; do
    [[ $line = *".pdf"*R0* ]] || continue # ignore lines that don't fit our format
    filename=${line%%.pdf*}.pdf
    id=R0${line##*R0}
    printf '%s\t%s\n' "$filename" "$id"
done <file.txt
${line%%.pdf*} expands to everything before the first .pdf in the line; ${line%%.pdf*}.pdf then appends .pdf to that content.
Similarly, ${line##*R0} expands to everything after the last R0; R0${line##*R0} thus expands to the final field starting with R0 (presuming that R0 occurs only once in that field).
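For example, applying those two expansions to the first sample line:
line="R01234567_High Transcript_01234567.pdf High School Transcript R01234567"
echo "${line%%.pdf*}.pdf"   # -> R01234567_High Transcript_01234567.pdf
echo "R0${line##*R0}"       # -> R01234567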
The Easy Way (Using Tab Delimiters)
If cat -t file (on MacOS) or cat -A file (on Linux) shows ^I sequences between the fields (but not within the fields), use the following instead:
while IFS=$'\t' read -r filename title id; do
    printf '%s\t%s\n' "$filename" "$id"
done <file.txt
This reads the three tab separated fields into variables named filename, title and id, and emits the filename and id fields.
Updated answer assuming tab delim
Since there is a tab delimiter, this is a cinch for awk. Borrowing from my originally deleted answer and @geek1011's deleted answer:
awk -F"\t" '{print $1, $NF}' infile.txt
Here awk splits each record in your file by tab, then prints the first field $1 and the last field $NF where NF is the built in awk variable for the record's Number of Fields; by prepending a dollar sign, it says "The value of the last field in the record".
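Assuming the two sample lines above really are tab-separated, this prints:
$ awk -F"\t" '{print $1, $NF}' infile.txt
R01234567_High Transcript_01234567.pdf R01234567
R01891023_Application_01891023127.pdf R01891023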
Original answer assuming space delimiter
Leaving this here in case someone has space delimited nonsense like I originally assumed.
You can use awk instead of using bash to read through the file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i; print firstRec,$i,$NF}' yourfile.txt
awk reads files line by line and processes each record it comes across. Fields are delimited automatically by white space. The first field is $1, the second is $2 and so on. awk has built in variables; here we use NF which is the Number of Fields contained in the record, and NR which is the record number currently being processed.
This script does the following:
If the record number is greater than 1 (not the header) then
Loop through each field (separated by white space here) until we find a field that has "pdf" in it ($i!~/pdf/). Store everything we find up until that field in a variable called firstRec, separated by spaces (firstRec=firstRec" "$i), resetting firstRec at the start of each record.
print out the firstRec, then print out whatever field we stopped iterating on (the one that contains "pdf") which is $i, and finally print out the last field in the record, which is $NF (print firstRec,$i,$NF)
You can direct this to another file:
awk 'NR>1{firstRec=""; for(i=1; $i!~/pdf/; ++i) firstRec=firstRec" "$i; print firstRec,$i,$NF}' yourfile.txt > outfile.txt
sed may be a cleaner way to go here since, if the pdf file name contains runs of multiple spaces, awk's default whitespace field splitting will collapse them.
You can use sed on each line like this:
line="R01234567_High Transcript_01234567.pdf High School Transcript R01234567"
echo "$line" | sed 's/\.pdf.*R0/\.pdf R0/'
# output
R01234567_High Transcript_01234567.pdf R01234567
This replaces everything between .pdf and R0 with a single space.
It doesn't deal with some edge cases, but it's simple and clear.

Linux command to grab lines similar between files

I have one file that has one word per line.
I have a second file that has many words per line.
I would like to go through each line in the first file and, for every line in the second file that contains it, copy that line from the second file into a new third file.
Is there a way to do this simply with a Linux command?
Edit: Thanks for the input. But, I should specify better:
The first file is just a list of numbers (one number per line).
463463
43454
33634
The second file is very messy, and I am only looking for that number string to appear anywhere in a line (not necessarily as an individual word). So, for instance,
ewjleji jejeti ciwlt 463463.52%
would return a hit. I think what was suggested to me does not work in this case (please forgive my having to edit for not being detailed enough).
If n is the number of lines in your first file and m is the number of lines in your second file, then you can solve this problem in O(nm) time in the following way:
while read -r word; do
    grep "$word" secondfile >>thirdfile
done <firstfile
If you need to solve it more efficiently than that, I don't think there are any built-in utilities for that, however.
As for your edit, this method does work the way you describe.
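If the word list is long, a single grep call avoids the shell loop entirely: -f reads the patterns from a file, and -F treats them as fixed strings rather than regular expressions (safe here, since the patterns are plain numbers):
grep -F -f firstfile secondfile > thirdfile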
Here is a short script that will do it. It takes 3 command line arguments: 1- file with 1 word per line, 2- file with many lines you want to match for each word in file1, and 3- your output file:
#!/bin/bash
## test input and show usage on error
test -n "$1" && test -n "$2" && test -n "$3" || {
printf "Error: insufficient input, usage: %s file1 file2 file3\n" "${0//*\//}"
exit 1
}
while read line || test -n "$line" ; do
grep "$line" "$2" 1>>"$3" 2>/dev/null
done <"$1"
example:
$ cat words.txt
me
you
them
$ cat lines.txt
This line is for me
another line for me
maybe another for me
one for you
another for you
some for them
another for them
here is one that doesn't match any
$ bash ../lines.sh words.txt lines.txt outfile.txt
$ cat outfile.txt
This line is for me
another line for me
maybe another for me
some for them
one for you
another for you
some for them
another for them
(Yes, I know that me also matches some in the example file, but that's not really the point.)
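If such substring hits are unwanted, grep's -w option restricts matches to whole words; in the script above, the grep line would become:
grep -w "$line" "$2" 1>>"$3" 2>/dev/null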
