unix bash string search without using awk or sed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I can't use awk, sed, or any other string-processing utilities.
Any help with this problem is appreciated. I have a text file with the following format:
BobJanitor20000
TedBuilder30000
NedFighter25000
KitTeacher40000
Assume that the name is always 3 characters long, the profession 7 characters, and the salary 5 characters. Edit: the variables come from user input, not from parameters.
I ask the user for a name, and then for whether to display the occupation or the salary.
If the user enters "Ted" and chooses salary, the output should be
Ted 30000
The program must also handle partial name matches: "ed" and salary should output
Ted 30000
Ned 25000
I know cut and grep can get me the relevant lines, but how do I create the output I want?
cut -c1-3 textFile | grep "$user_input"
gets me the lines I want to use, but how do I isolate the name and profession columns, and then the name and salary columns?

You must split the input lines into fields first. Something like grep "$user_input" textFile will fail when the input matches part of the job.
For this reason a simple approach with grep alone will fail, although it gets you part of the way:
With grep, the -o option shows only the matching part. Combine this with a dot for a single character, and ^ for the beginning of a line or $ for the end of the line.
# Show salary
# echo "TedBuilder30000" | grep "Ted" | grep -o ".....$"
# Show job
# echo "TedBuilder30000" | grep "Ted" | grep -o "............$" | grep -o "^......."
This will become messy when you want to show the matched name (Ted/Ned) as well.
So how do we split everything up?
I already stored the user input for the name and the display choice in variables. The display value is converted to lowercase automatically by typeset -l (bash 4 or later).
userinput=ed
typeset -l display
display=Salary
while IFS= read -r line; do
    # offset 0, length 3
    name=${line:0:3}
    job=${line:3:7}
    sal=${line:10:5}
    # echo "Debug: Name=$name Job=$job Sal=$sal"
    # double brackets so that the * wildcards on both sides allow partial matches
    if [[ "$name" = *"${userinput}"* ]]; then
        # Use case for a switch between different possible values.
        # The alternatives must stay unquoted: "occupation|job" would only match that literal string.
        case "${display}" in
            occupation|job) echo "${name} ${job}";;
            salary) echo "${name} ${sal}";;
            *) echo "Unsupported display ${display}";;
        esac
    fi
done < textFile # while reads from textFile, I avoid using cat (Google for UUOC)

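You may want:
paste -d ' ' <(grep "$user_input" textFile | cut -c 1-3) <(grep "$user_input" textFile | cut -c 11-15)
The <( ... ) construct is called process substitution. Piping one grep through tee >(cut -c 11-15) and a second cut does not work here, because both outputs end up in the same pipe.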
See https://unix.stackexchange.com/questions/28503/how-can-i-send-stdout-to-multiple-commands

You can use cut with character ranges, for example
echo NedFighter25000 | cut -c 1-3
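Building on the character ranges, here is a minimal sketch that pairs the name column with either the job or the salary column and only matches the pattern against the name. The function name lookup and its argument order are my own choices, not something from the question, and the pattern is treated as a grep regular expression.

```bash
# lookup PATTERN FIELD FILE (hypothetical helper)
# FIELD is "job"/"occupation" or "salary"; columns are 1-3, 4-10 and 11-15.
lookup() {
    local pattern=$1 field=$2 file=$3 range
    case "$field" in
        occupation|job) range=4-10 ;;
        salary)         range=11-15 ;;
        *) echo "Unsupported display $field" >&2; return 2 ;;
    esac
    # Build "Name value" rows, then keep the rows whose first column
    # (everything before the first space) contains the pattern.
    paste -d ' ' <(cut -c 1-3 "$file") <(cut -c "$range" "$file") |
        grep -- "^[^ ]*${pattern}[^ ]* "
}
```

With the sample file from the question, lookup ed salary textFile should print the Ted and Ned rows, and a pattern that only occurs in the job column (such as Builder) matches nothing.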

Related

How can i search for an hexadecimal content in a file in a linux/unix/bash script? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a hexadecimal string s and a file f. I need to find the first occurrence of that string in the file and save it in a variable together with its offset. I thought the right way to do that would be to convert the file to hex and search it with grep. The main problem is that I have seen a lot of commands (hexdump, xxd, etc.) for the conversion, but none of them actually worked for me. Any suggestions?
My attempt was like this:
xxd -plain $f > $f
grep "$s" .
output should be like:
> offset:filename

A first approach without any error handling could look like
#!/bin/bash
BINFILE=$1
SEARCHSTRING=$2
HEXSTRING=$(xxd -p "${BINFILE}" | tr -d "\n")
echo "${HEXSTRING}"
echo "Searching ${SEARCHSTRING}"
OFFSET=$(grep -aob "${SEARCHSTRING}" <<< "${HEXSTRING}" | cut -d ":" -f 1)
echo "${OFFSET}:${BINFILE}"
I've used xxd here because of Does hexdump respect the endianness of its system?. Please also note that, according to How to find a position of a character using grep?, grep will return every match, not only the first one; pipe the result through head -n 1 to keep just the first. The offset that grep -b reports is 0-based and is counted in hex digits rather than bytes, so the byte offset in the file is half of it: $((OFFSET / 2)). A match at an odd offset straddles a byte boundary and is not a real match.
For example, searching for the "string" ELF (hex 454c46) in a system binary looks like
./searchHEX.sh /bin/yes 454c46
7f454c460201010000000000000000000...01000000000000000000000000000000
Searching 454c46
2:/bin/yes
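To get a byte offset directly and to rule out matches that straddle a byte boundary, one option is to print every byte as " xx" so that a match can only begin on a byte. The following is a minimal sketch, not the method used above: it swaps xxd for od (assuming od -An -v -tx1 prints space-separated lowercase hex bytes, as GNU od does), and find_hex is a name I made up.

```bash
# find_hex FILE HEXSTRING (hypothetical helper)
# Prints "byte_offset:FILE" for the first byte-aligned match, 0-based.
find_hex() {
    local file=$1 needle haystack pos
    # od prints lowercase digits, so normalise the search string.
    needle=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Reject anything that is not an even number of hex digits.
    printf '%s\n' "$needle" | grep -Eq '^([0-9a-f]{2})+$' || return 2
    # Render both sides as " xx xx xx": each byte takes 3 characters.
    haystack=$(od -An -v -tx1 "$file" | tr -d ' \n' | sed 's/../ &/g')
    needle=$(printf '%s\n' "$needle" | sed 's/../ &/g')
    # grep -b gives a 0-based character offset; keep only the first match.
    pos=$(printf '%s\n' "$haystack" | grep -obF -- "$needle" | head -n 1 | cut -d: -f1)
    [ -n "$pos" ] || return 1
    printf '%d:%s\n' "$((pos / 3))" "$file"
}
```

Because the whole dump is held in a shell variable, this is only meant for reasonably small files.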

I would use regex for this as well:
The text file:
$ cat tst.txt
1234567890x1fgg0x1cfffrr
A script you can easily change/extend yourself.
#! /bin/bash
part="$(perl -0pe 's/^((?:(?!0(x|X)[0-9a-fA-F]+).)*)(0(x|X)[0-9a-fA-F]+)(.|\n)*/\1:\3\n/g;' tst.txt)"
tmp=${part/:0x*/}
tmp=${#tmp}
echo "${part/*:0x/$tmp:0x}" # Echoes 9:0x1f
Regex:
^((?:(?!0x[0-9a-fA-F]+).)*) = Match everything before the first hexadecimal number and capture it as a group (\1).
(0x[0-9a-fA-F]+) = Make a group of the hexadecimal number (\3).
(.|\n)* = Whatever follows.
Please note that tmp=${part/:0x*/} could cause problems if the text before the captured hexadecimal number contains something like :0x.
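If perl is not available, the same "offset:number" result can probably be had with bash's own regex matching. This is a sketch under the assumption that the text fits in a variable; first_hex_literal is a hypothetical name, and the offset is the number of characters before the match.

```bash
# first_hex_literal TEXT (hypothetical helper)
# Prints "offset:0x..." for the first 0x-prefixed hex number in TEXT.
first_hex_literal() {
    local text=$1 re='0[xX][0-9a-fA-F]+' match prefix
    [[ $text =~ $re ]] || return 1
    match=${BASH_REMATCH[0]}
    # Everything before the first occurrence of the matched text.
    prefix=${text%%"$match"*}
    printf '%d:%s\n' "${#prefix}" "$match"
}
```

For the file above it would be called as first_hex_literal "$(< tst.txt)".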

How to search in files and output discoveries only if they match both files [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to search for a string (which I don't know unless I look inside the files) on Linux command-line.
Example:
A file 1 with text inside
A file 2 with text inside
The word "Apple" exists in both files.
I want to echo this word (which exists in both files) into a file or store it in a variable.
How is this possible?

You can get a list of all the unique words in a file using:
grep -o -E '\w+' filename | sort -u
where -E '\w+' matches words (\w is a GNU grep extension) and -o prints only the matching parts, one per line. We can then use the join command, which identifies lines common to two sorted inputs, along with process substitution to pass in the results of our word finder:
join <(grep -o -E '\w+' filename1 | sort -u) <(grep -o -E '\w+' filename2 | sort -u)
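Since the question also asks about storing the result, here is a small sketch that wraps the join pipeline in a function so its output can be captured. The name common_words is my own, and it relies on GNU grep for \w.

```bash
# common_words FILE1 FILE2 (hypothetical helper)
# Prints every word that occurs in both files, one per line.
common_words() {
    join <(grep -o -E '\w+' "$1" | sort -u) \
         <(grep -o -E '\w+' "$2" | sort -u)
}
```

The words can then be kept in an array with mapfile -t shared < <(common_words filename1 filename2), or written to a file with a plain redirection.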

If there are no duplicates within a single file, you could use cat file1 file2 | sort | uniq -d
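When a file may repeat a line, uniq -d would report that line even if the other file never contains it. A possible way around this, sketched here with a made-up function name, is to de-duplicate each file separately and let comm print the lines found in both.

```bash
# common_lines FILE1 FILE2 (hypothetical helper)
# Prints the lines present in both files, ignoring repeats within a file.
common_lines() {
    # comm -12 suppresses the "only in file 1" and "only in file 2" columns.
    comm -12 <(sort -u "$1") <(sort -u "$2")
}
```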

$ cat input_one.txt
FIREFOX a
FIREFOX b
Firefox a
firefox b
$ cat input_two.txt
CHROME a
FIREFOX a
EXPLORER a
$ while read line; do grep "$line" input_two.txt ; done < input_one.txt
FIREFOX a
Explanation:
The while loop reads input_one.txt line by line and stores each line in the line variable.
For every line, grep searches input_two.txt for it and prints the lines of input_two.txt that contain it.
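The loop starts one grep per line and treats each line as a regular expression. If whole-line, literal matches are what you are after, a single grep can take all its patterns from the first file. This is a sketch with a hypothetical function name, not a drop-in replacement for the loop, since it no longer matches substrings.

```bash
# matching_lines PATTERN_FILE TARGET_FILE (hypothetical helper)
# Prints the lines of TARGET_FILE that are identical to some line of PATTERN_FILE.
matching_lines() {
    # -F: fixed strings, -x: match whole lines, -f: read patterns from a file
    grep -F -x -f "$1" -- "$2"
}
```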

You can write a script to handle this.
Loop over the words in file 1 and, inside the loop, use grep (for example grep -qw -e "$word" file2) to look for each word in file 2.
If it matches, echo the word.

SED Command Replacement [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Suppose I have a file with warnings. Each warning is on its own line and has an id consisting of 3 capital letters followed by 3 digits. Each line should be replaced by its id.
Example:
SIM_WARNING[ANA397]: Node q<159> for vector output signal does not exist
The output should be ANA397, with the rest of the line deleted.
How can I do this using sed?

I don't think you need sed for that. A simple grep with --only-matching could do, as in:
grep -E --only-matching '\[[A-Z]{3}[0-9]{3}\]' file | tr -d '[]'
should work for you.
Where:
-E enables extended regular expressions, which we need for the {3} repetition counts
then follows the pattern, which matches an id of 3 capital letters and 3 digits inside square brackets
--only-matching makes grep print only the matching part of each line instead of the whole line
tr -d '[]' strips the surrounding brackets
In other words: grep cannot print just a capture group, so the pattern matches the bracketed id and tr removes the brackets afterwards.

for id in $(grep -o "^SIM_WARNING\[[A-Z][A-Z][A-Z][0-9][0-9][0-9]\]" test1.bla | grep -o "[A-Z][A-Z][A-Z][0-9][0-9][0-9]"); do echo "$id"; done
This finds ANA397 in the line below.
SIM_WARNING[ANA397]: Node q<159> for vector output signal does not exist
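If you would rather not chain two greps, bash can pull the id out of a capture group by itself. This is a sketch only; warning_ids is a name I invented, and it assumes the warnings always start the line with SIM_WARNING[.

```bash
# warning_ids FILE (hypothetical helper)
# Prints the 3-letter/3-digit id of every SIM_WARNING line in FILE.
warning_ids() {
    local line re='^SIM_WARNING\[([[:upper:]]{3}[[:digit:]]{3})\]'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[1] holds the text matched by the first ( ) group.
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done < "$1"
}
```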

First of all, you have to choose how to use the IDs, for example if you need to cycle the file first or the IDs later...
E.G. (Cycle file first)
exec 3<file
while read -r line <&3; do
id="$(printf "%s" "${line}" | sed -e "s/.*\[\([[:alnum:]]\+\)\].*/\1/")"
### Do something with id
done
exec 3>&-
Otherwise you can decide to cycle the output of sed...
E.G.
for id in $(sed -e "s/.*\[\([[:alnum:]]\+\)\].*/\1/" file); do
### Do something with id
done
Both examples should work with a POSIX shell (if I am not missing something), although \+ in the sed expression is a GNU extension (use \{1,\} for strict POSIX). A shell like posh may not support classes such as [[:alnum:]]; you can substitute the equivalent [a-zA-Z0-9].
Note that the check is not for 3 letters and 3 digits, but for any letters and digits between brackets ([ and ]).
EDIT:
If your lines start with SIM_WARNING, you can drop all other lines with -e "/^SIM_WARNING/! d"
For a strict check on 3 letters and 3 digits you can use -e "s/.*\[\([a-zA-Z][a-zA-Z][a-zA-Z][0-9][0-9][0-9]\)\].*/\1/"
So, taking the example above, you can do something like:
for id in $(sed -e "/^SIM_WARNING/! d" -e "s/.*\[\([a-zA-Z][a-zA-Z][a-zA-Z][0-9][0-9][0-9]\)\].*/\1/" file); do
### Do something with id
done
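The two sed expressions can also be folded into one with -n and the p flag, which prints only the lines where the substitution succeeded. A minimal sketch, assuming capital letters only as in the question; extract_ids is a hypothetical wrapper name.

```bash
# extract_ids [FILE...] (hypothetical helper; reads stdin when no file is given)
# Prints only the id of each line that starts with SIM_WARNING[XXX999].
extract_ids() {
    # -n suppresses normal output; the trailing p prints substituted lines only.
    sed -n 's/^SIM_WARNING\[\([A-Z]\{3\}[0-9]\{3\}\)].*/\1/p' "$@"
}
```

Lines without a well-formed id are skipped instead of being passed through unchanged.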

How to avoid magic-numbers in shell?

I always end up writing magic numbers in my interactive shells and shell scripts.
For instance, if I want to list my users' names and shells, I'll write
cut --delimiter=: --fields=1,7 /etc/passwd
There are two magic numbers here, 1 and 7, and there are more and more magic numbers in other circumstances.
Question
How to avoid magic-numbers in interactive shells and shell scripts?
Supplementary background
Our teacher told us to use cut -d: -f1,7 /etc/passwd. But new Linux users don't know what d, f, 1 and 7 mean. (And not just new users: the whole system has so many configuration files that it is not easy for anyone to remember every magic number.)
So, in interactive shells, we can use --delimiter and --fields, and bash (or zsh, fish) has good tab completion for them.
What about the 1 and the 7? In shell scripts, a good approach is to declare constants such as LoginField=1 and ShellField=7 after reading man 5 passwd. But when someone is working in an interactive shell, it's not a good idea to open a new window, look up the constants LoginField=1 and ShellField=7, and define them. How can something like tab completion be used to simplify this?

Use variables:
LoginField=1 ShellField=7
cut --delimiter=: --fields="$LoginField,$ShellField" /etc/passwd

Just like in other languages - by using variables. Example:
$ username_column=1
$ shell_column=7
$ cut --delimiter=: --fields="$username_column","$shell_column" /etc/passwd
The variables may be defined at the top of the script so that they can be
easily modified, or they can be set in an external config-like file
shared by multiple scripts.
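Such a shared file could look roughly like this. The file name passwd_fields.sh and the passwd_ prefix are assumptions of mine; the field order follows man 5 passwd. A common prefix has the side effect that, in an interactive bash, typing $passwd_ and pressing Tab should list the available names.

```bash
# passwd_fields.sh (hypothetical file name)
# Source it from ~/.bashrc or from a script:  . ./passwd_fields.sh
passwd_login=1
passwd_password=2
passwd_uid=3
passwd_gid=4
passwd_gecos=5
passwd_home=6
passwd_shell=7

# Example use once the file has been sourced:
#   cut --delimiter=: --fields="$passwd_login,$passwd_shell" /etc/passwd
```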

The classic way to parse /etc/passwd is to read each column into an appropriately named variable:
while IFS=: read -r name passwd uid gid gecos home shell _; do
...
done < /etc/passwd
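Filled in for the name-and-shell case from the question, the loop might look like the sketch below. The function name list_shells and the optional file argument are my additions, there so the loop can be tried on a copy of the file.

```bash
# list_shells [FILE] (hypothetical helper; FILE defaults to /etc/passwd)
# Prints "login:shell" for every entry, with no field numbers in sight.
list_shells() {
    local name passwd uid gid gecos home shell
    # The final _ soaks up any extra fields so they do not end up in $shell.
    while IFS=: read -r name passwd uid gid gecos home shell _; do
        printf '%s:%s\n' "$name" "$shell"
    done < "${1:-/etc/passwd}"
}
```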

Use export:
export field_param="1,7"
(You can put it in your .bashrc file so that it is set each time a shell session starts.) The export can also be part of a .sh script; it's good practice to put such definitions at the top of the file.
Then:
cut --delimiter=: --fields="$field_param" /etc/passwd
This way you only need to edit the magic numbers in one place.

Continuing from my comment, it's hard to tell exactly what you are asking. If you just want to give meaningful variable names, then do as shown in the other answers.
If however you want to be able to specify which fields are passed to cut from the command line, then you can use the positional parameters $1 and $2 to pass those values into your script.
You need to validate that two inputs are given and that both are integers. You can do that with a few simple tests, e.g.
#!/bin/bash
[ -n "$1" ] && [ -n "$2" ] || { ## validate 2 parameters given
printf "error: insufficient input\nusage: %s field1 field2\n" "${0##*/}"
exit 1
}
## validate both inputs are integer values
[ "$1" -eq "$1" >/dev/null 2>&1 ] || {
printf "error: field1 not integer value '%s'.\n" "$1"
exit 1
}
[ "$2" -eq "$2" >/dev/null 2>&1 ] || {
printf "error: field2 not integer value '%s'.\n" "$2"
exit 1
}
cut --delimiter=: --fields=$1,$2 /etc/passwd
Example Use/Output
$ bash fields.sh
error: insufficient input
usage: fields.sh field1 field2
$ bash fields.sh 1 d
error: field2 not integer value 'd'.
$ bash fields.sh 1 7
root:/bin/bash
bin:/usr/bin/nologin
daemon:/usr/bin/nologin
mail:/usr/bin/nologin
ftp:/usr/bin/nologin
http:/usr/bin/nologin
uuidd:/usr/bin/nologin
dbus:/usr/bin/nologin
nobody:/usr/bin/nologin
systemd-journal-gateway:/usr/bin/nologin
systemd-timesync:/usr/bin/nologin
systemd-network:/usr/bin/nologin
systemd-bus-proxy:/usr/bin/nologin
<snip>
Or if you choose to look at fields 1 and 3, then all you need do is pass those as the parameters, e.g.
$ bash fields.sh 1 3
root:0
bin:1
daemon:2
mail:8
ftp:14
http:33
uuidd:68
dbus:81
nobody:99
systemd-journal-gateway:191
systemd-timesync:192
systemd-network:193
systemd-bus-proxy:194
<snip>
Look things over and let me know if you have further questions.

Scraping the output of man 5 passwd for human-readable header names:
declare $(man 5 passwd |
sed -n '/^\s*·\s*/{s/^\s*·\s*//;y/ /_/;p}' |
sed -n 'p;=' | paste -d= - - )
See "how it works" below for what that does, then run:
cut --delimiter=: \
--fields=${login_name},${optional_user_command_interpreter} /etc/passwd
Which outputs the specified /etc/passwd fields.
How it works.
The man page describing /etc/passwd contains a bullet list of header names. Use GNU sed to find the bullets (·) and leading whitespace, remove the bullets and whitespace, and replace the remaining spaces with underscores. A second instance of sed adds line numbers, and paste then joins each header name to its line number with a = in between:
man 5 passwd |
sed -n '/^\s*·\s*/{s/^\s*·\s*//;y/ /_/;p}' |
sed -n 'p;=' | paste -d= - -
Outputs:
login_name=1
optional_encrypted_password=2
numerical_user_ID=3
numerical_group_ID=4
user_name_or_comment_field=5
user_home_directory=6
optional_user_command_interpreter=7
And declare makes those active in the current shell.

What is cat for and what is it doing here? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have this script I'm studying and I would like to know what is cat doing in this section.
if cat downloaded.txt | grep "$count" >/dev/null
then
echo "File already downloaded!"
else
echo $count >> downloaded.txt
cat $count | egrep -o "http://server.*(png|jpg|gif)" | nice -n -20 wget --no-dns-cache -4 --tries=2 --keep-session-cookies --load-cookies=cookies.txt --referer=http://server.com/wallpaper/$number -i -
rm $count
fi

Like most cats, this is a useless cat.
Instead of:
if cat downloaded.txt | grep "$count" >/dev/null
It could have been written:
if grep "$count" downloaded.txt > /dev/null
In fact, because you've eliminated the pipe, you've eliminated issues with which exit value the if statement is dealing with.
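Going one step further, grep -q makes the redirect to /dev/null unnecessary as well. The sketch below wraps the test in a function; the name already_downloaded is made up, and the -F -x flags are my own tightening (literal, whole-line match), which the original script does not do.

```bash
# already_downloaded ID LISTFILE (hypothetical helper)
# Succeeds when LISTFILE contains a line that is exactly ID.
already_downloaded() {
    # -q: no output, just an exit status; -F: literal text; -x: whole line
    grep -q -F -x -- "$1" "$2"
}
```

It would be used as if already_downloaded "$count" downloaded.txt; then ... fi. With the plain substring match, an id of 34 would also be found in a line containing 345.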
Most Unix cats you'll see are of the useless variety. However, people like cats almost as much as they like using a grep/awk pipe, or using multiple grep or sed commands instead of combining everything into a single command.
The name cat is short for concatenate: its purpose is to join files together. A classic use is together with the split command, which splits a file into multiple parts. This was useful if you had a really big file, but had to put it on floppy drives that couldn't hold the entire file:
split -b140K -a4 my_really_big_file.txt my_smaller_files.txt.
Now, I'll have my_smaller_files.txt.aaaa and my_smaller_files.txt.aaab and so forth. I can put them on the floppies, and then on the other computer. (Heck, I might go all high tech and use UUCP on you!).
Once I get my files on the other computer, I can do this:
cat my_smaller_files.txt.* > my_really_big_file.txt
And, that's one cat that isn't useless.
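The round trip is easy to try out on a throwaway file. This is a small sketch with invented file names in a temporary directory; it assumes seq is available to generate the sample data.

```bash
workdir=$(mktemp -d)
# Build a sample "big" file of 1000 numbered lines.
seq 1 1000 > "$workdir/big.txt"
# Split it into 1024-byte pieces named part.aaaa, part.aaab, ...
split -b 1024 -a 4 "$workdir/big.txt" "$workdir/part."
# The shell expands part.* in sorted order, so cat joins the pieces in sequence.
cat "$workdir"/part.* > "$workdir/rebuilt.txt"
# cmp is silent and succeeds only when both files are byte-for-byte equal.
cmp "$workdir/big.txt" "$workdir/rebuilt.txt" && echo "files are identical"
```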

cat prints out the contents of the file with the given name (to the standard output or to wherever it's redirected). The result can be piped to some other command (in this case, (e)grep to find something in the file contents). Concretely, here the script adds the name of the file to downloaded.txt in order to not process it again (this is what the check in the if was about), then tries to download the images referenced in that file.
http://www.linfo.org/cat.html

"cat" is a Unix command that reads the contents of one or more files sequentially and by default prints them to the user's console ("stdout" or standard output).
In this case cat is being used to read the contents of the file "downloaded.txt". The pipe "|" feeds its output to the grep program, which searches it for whatever is in the variable "$count".
