Grep: Copy a link with specific text - linux

I have a text file with many links that are not on separate lines.
I want to save, probably in another file, all the links that contain a specific word.
How can I do this with grep?
EDIT:
To be more specific, I have a messy txt file with many links. I want to copy into another file all links that start with https://, end with .jpg, and contain the string "10x10" anywhere, for example.

You can get all the lines containing a specific word from the file like this:
LINKS=$(grep MYWORD myfile.txt)
Then you can split LINKS on a delimiter into an array of words and print them to another file, one per line.
# Using a space as the delimiter
while IFS=' ' read -ra ind_links
do
printf '%s\n' "${ind_links[@]}" >> mynewfile.txt
done <<< "$LINKS"
Note that this writes every space-separated word of each matching line, so you may want to filter mynewfile.txt for MYWORD once more.
Something along those lines is what you are looking for, I think?
Also, if you need to refine your search, grep options such as -w (match whole words only) can make it more specific.
Hope it helps.

Could you give us the specific word and an example of the input file?
You could try egrep and/or sed like this, for example:
egrep -o "href=\".*\.html\"" file|sed "s/\"\([^\"]*\)/\1/g"
Another example, for any kind of http/https resource link (without spaces in the URL):
$ echo "<a href=http://titi/toto.jpg >"|egrep -o "https?:\/\/[^\ ]*"
http://titi/toto.jpg
$ echo "<a href=https://titi/toto.htm >"|egrep -o "https?:\/\/[^\ ]*"
https://titi/toto.htm
You have to customize the regexp according to your needs.
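For the case described in the question's edit (links starting with https://, ending in .jpg, and containing 10x10), a minimal sketch could look like this. The file names and sample URLs are made up for illustration, and it assumes URLs contain no whitespace, quotes, or angle brackets.

```shell
# Hypothetical sample input: several links per line, as in the question.
workdir=$(mktemp -d)
cat > "$workdir/messy.txt" <<'EOF'
foo https://a.example/img_10x10_1.jpg bar https://a.example/img_20x20.jpg
<a href="https://b.example/10x10/pic.jpg">x</a> http://c.example/10x10.jpg https://d.example/10x10.png
EOF

# -o prints only the matched text, one match per line; -E enables extended regex.
# [^[:space:]"<>]* keeps a match from running past the end of one URL.
grep -oE 'https://[^[:space:]"<>]*10x10[^[:space:]"<>]*\.jpg' "$workdir/messy.txt" > "$workdir/links.txt"

cat "$workdir/links.txt"
```

Only the two https links that contain 10x10 and end in .jpg end up in links.txt; the character class would need adjusting if the links are delimited differently in your file.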

Related

How to use --file=script-file option to input a file to search and replace in sed command

I am doing a Jenkins migration using jenkins-cli, where in one step I use a sed command to replace values manually, like below:
sed 's/mukesh/architect/g' target_file
But I would like to put all the possible values in an input file with two columns, using = as the delimiter, and apply them to the target file.
The input file looks like this, for example:
mukesh=architect
abdul=manager
Now I want to use this file as input to the sed command for search and replace in my target file. Instead of writing s///g manually, I want to use the option below, which I found in the man page:
-f script-file, --file=script-file
But I am not sure how to use this input file to automatically search and replace the patterns in the target file. I would be grateful for any samples or examples.
You can use the code below to read the input file, parse it, and update the target file.
It reads each line of the input file, splits it on the delimiter "=", and then updates outfile (the target file).
# x gets the text before the first "=", y gets the rest of the line
while IFS='=' read -r x y
do
sed -i "s/$x/$y/g" outfile
done < inputfile
This should solve your problem. Let me know if you are looking for something else. Cheers :)
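Since the question asks about -f specifically, one possible approach is to convert the name=value file into a sed script first and then pass that script with -f. The sketch below uses made-up sample data and a hypothetical script name (replace.sed), and it assumes the names and values contain no "/" or regex metacharacters.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'mukesh=architect' 'abdul=manager' > "$workdir/inputfile"
printf '%s\n' 'mukesh reports to abdul' 'abdul approves' > "$workdir/target_file"

# Turn each name=value line into a sed command of the form s/name/value/g.
# "|" is used as the outer delimiter so the generated "/" needs no escaping.
sed 's|^\([^=]*\)=\(.*\)$|s/\1/\2/g|' "$workdir/inputfile" > "$workdir/replace.sed"

# Apply every command in the generated script to the target file.
result=$(sed -f "$workdir/replace.sed" "$workdir/target_file")
printf '%s\n' "$result"
```

The commands run in file order on every line, so a replacement value that happens to contain a later name would be replaced again.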

How to pull down a list of domains using wget and scan them using grep

I have a list of domain names contained within a file named "domains.txt", formatted like this:
www.google.com
www.stackoverflow.com
www.apple.com
etc...
I want to perform a wget command to pull down a copy of each domain listed inside "domains.txt" and save it as a .html page.
I can do this individually using wget www.google.com but I'm wondering, instead of doing each one separately, can I iterate through the list and save each domain name as a separate .html file?
The second action I want to perform is a scan of these pulled down .html files for keywords, which I have contained in a text file named "keywords.txt". They're formatted like this:
first_keyword
second_keyword
third_keyword
etc...
Ideally, I'd like to have an output that prints the domain name to a text file, with a "yes" beside it if it has been found to contain any of the keywords contained in "keywords.txt". If it's possible to print what keywords were found beside each domain that would be brilliant, but a simple "yes" would be great too. I'm brand new to Linux and scripting, so any help would be greatly appreciated!
I assume the files don't contain the quotes. Otherwise I would need more code to remove the quotes.
domains.txt
www.google.com
www.stackoverflow.com
www.apple.com
keywords.txt
first_keyword
second_keyword
third_keyword
You can try something like this
outfile=tmp.html
while IFS= read -r domain
do
wget -O "$outfile" "$domain"
if grep -qFf keywords.txt "$outfile"
then
echo "$domain" yes
else
echo "$domain" no
fi
rm "$outfile"
done < domains.txt
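To also print which keywords were found, as the question asks, grep -o can list the matches themselves. The sketch below skips the download and uses a made-up page.html as a stand-in for a file saved by wget; www.example.com is a placeholder domain.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'first_keyword' 'second_keyword' 'third_keyword' > "$workdir/keywords.txt"
# Stand-in for a page that wget would have saved.
printf '%s\n' '<p>second_keyword and first_keyword</p>' '<p>first_keyword again</p>' > "$workdir/page.html"

# -o prints each matched keyword on its own line; -F treats keywords as fixed strings.
# sort -u removes duplicates; paste joins the remaining lines with commas.
found=$(grep -oFf "$workdir/keywords.txt" "$workdir/page.html" | sort -u | paste -sd, -)

if [ -n "$found" ]; then
    report="www.example.com yes ($found)"
else
    report="www.example.com no"
fi
echo "$report"
```

Inside the loop above, the same grep pipeline could run on "$outfile" and the report line could be appended to a results file.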

grep from a input file, multiple lines while the input file has ^name

I would really appreciate some help with this:
I have a huge file, I will give you an example of how it is formatted:
name:lastname:email
I have a input file with lots of names set out like this example:
edward
michael
jenny
I want to match the name column of the huge file against the names in the input file, and only when it is an exact match (case insensitive).
Once it finds a match, I want it to write all of the matches to a .txt file.
I think I can give it a pattern something like ^Michael:.
Can anyone help me with this grep problem?
Sorry if I am not too clear; it's very late and I have been on this problem for ages.
On CentOS 5, this command: grep -i -E -f file.txt /root/dir2search >out.txt
with file.txt containing
^michael:
^bobert:
^billy:
Doesn't find anything.
grep -i -E -f inputfile namesfile > outputfile will do what you want, if your input file consists of one input name per line, in the pattern you already suggested:
^Michael:
^Jane:
^Tom:
-i: case-insensitive matching
-E: extended regular expression syntax (grep defaults to basic regular expressions; the ^name: patterns here work in either)
-f: read patterns from a file, one pattern per line
>: redirect the output to a file
To get the existing input file you described (space-separated names) into the new format, you could use:
sed -r 's/([^ ]+)[ $]?/^\1:\n/g;s/\n$//g' inputfile > newinputfile
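If the input file already has one name per line, as in the question's example, a simpler conversion may be enough. This is a sketch with made-up sample data and hypothetical file names, assuming the names contain no regex metacharacters and the file has no blank lines.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'edward' 'michael' 'jenny' > "$workdir/names.txt"
printf '%s\n' 'Michael:Smith:m@example.com' 'michaela:Jones:mj@example.com' \
    'bob:michael:b@example.com' 'JENNY:Doe:j@example.com' > "$workdir/huge.txt"

# Wrap every name as ^name: so it can only match the whole first column.
sed 's/.*/^&:/' "$workdir/names.txt" > "$workdir/patterns.txt"

# -i ignores case, -f reads one pattern per line from the file.
grep -i -f "$workdir/patterns.txt" "$workdir/huge.txt" > "$workdir/out.txt"
cat "$workdir/out.txt"
```

In this sample, "michaela" and the row where michael appears in the second column are not matched, because the pattern is anchored at the start of the line and ends with the colon.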

How to use grep in a shell script to find a word inside of a file

How can I use grep to find an exact word inside of a file entered by the user as string?
For example I need to select the word I want to find and the file I want to find it in. I've been told I am really close but something is not working as it should be. I'm using bash shell under Linux.
Here's what I've done so far:
#!/bin/bash
echo "Find the file you want to search the word in?"
read filename
echo "Enter the word you want to find."
read word1
grep $word1 $filename
How can I use grep to find an exact word inside of a file entered by
the user as string.
Try using -F option. Type man grep on shell for more details.
grep -F "$word1" "$filename"
It's recommended to enclose search string and file name with quotes to avoid unexpected output because of white spaces.
Not sure why you have fi on the last line. It's not needed.
Try:
grep -R WORD ./ to search the entire current directory, or grep WORD ./path/to/file.ext to search inside a specific file.
#!/bin/bash
echo "Find the file you want to search the word in?"
read -r filename
echo "Enter the word you want to find."
read -r word
grep "$word" "$filename"
This finds every line of the file that contains the word, including as part of a longer word.
If you want an exact whole-word match within lines, use
grep "\b$word\b" "$filename"
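The -F and whole-word suggestions in the answers above can be combined. The following sketch uses a made-up sample file; the variable names follow the question's script.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'the cat sat' 'concatenate' 'a.c file' 'abc file' > "$workdir/sample.txt"

filename="$workdir/sample.txt"
word1='cat'

# -w: match whole words only; -F: treat the word as a fixed string, not a regex;
# --: end of options, so a word that starts with "-" is not taken as an option.
matches=$(grep -wF -- "$word1" "$filename")
printf '%s\n' "$matches"

# With -F, "a.c" matches only the literal text, not "abc".
literal=$(grep -wF -- 'a.c' "$filename")
printf '%s\n' "$literal"
```

Here "concatenate" is skipped because "cat" is not a separate word in it, and the "." in "a.c" is not treated as a wildcard.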

Extract Directory from Log File with sed

I'm trying to parse through an application.log that has many lines that follow the same syntax below.
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
I need to use some type of command to pull out what is listed between c:\websites\ and the next \
e.g. in this case it would be pj7fe4
I thought that the following command would work..
bin/sed -n '/c:\\websites\\/,/\\/p' upload/test.log
Unfortunately from reading further I now understand that this will return the entire line containing c:\websites through the \ and I need to know the in between, not the whole line.
To be more difficult I need to match all of the directory sub paths, not just one particular line as this is for multiple sites.
You're using range patterns incorrectly. A range pattern can't limit a command (print, in this case) to part of a line, only to a range of lines. Also, the backslashes need to be escaped.
Try this: sed 's/.*c:\\websites\\\([0-9a-zA-Z]*\)\\.*/\1/'
There's a good sed tutorial here: Sed - An Introduction and Tutorial by Bruce Barnett
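To cover the multi-site case, the substitution above can be combined with -n and the p flag so that only matching lines are printed, then deduplicated. This is a sketch: the first log line is from the question, the other three are invented for illustration, and the character class is loosened to "anything but a backslash".

```shell
workdir=$(mktemp -d)
cat > "$workdir/test.log" <<'EOF'
"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "
"Error","jrpp-238","10/13/11","02:56:10",,"File not found: /a.cfm The specific sequence of files included or processed is: c:\websites\ab12cd\a.cfm '' "
"Information","jrpp-1","10/13/11","02:57:00",,"Application started"
"Error","jrpp-239","10/13/11","02:58:10",,"File not found: /b.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\b.cfm '' "
EOF

# -n suppresses the default output; the p flag prints only lines where
# the substitution succeeded, so lines without c:\websites\ are dropped.
dirs=$(sed -n 's/.*c:\\websites\\\([^\\]*\)\\.*/\1/p' "$workdir/test.log" | sort -u)
printf '%s\n' "$dirs"
```

Dropping the sort -u keeps one output line per log entry instead of one per site.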
grep way:
grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)" yourFile
test:
kent$ echo '"Error","jrpp-237","10/13/11","02:55:04",,"File not found: /indexUsa~.cfm The specific sequence of files included or processed is: c:\websites\pj7fe4\indexUsa~.cfm '' "'|grep -Po "(?<=c:\\\websites\\\)[^\\\]+(?=\\\)"
pj7fe4

Resources