How to extract a pattern from a file and append to another on linux [duplicate] - linux

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 6 months ago.
I have a text file that contains the source code of a web page. I want to extract every link that contains "https://ANYTHING.amazonaws.com" into a new file.
The new file will contain:
https://test-ok.amazonaws.com
https://hhhhh.hhhh.amazonaws.com
https://anything.dd.dd.amazonaws.com
The links don't have to be in a specific tag; they can appear anywhere, in any tag.
Thanks!

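You can use grep with the -o flag to print only the matching fragments, then redirect the output to a new file.
In your case, something like this should work:
grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' sourcecode.html > newfile
The character class limits each match to a single hostname; a greedy .* would instead run from the first https:// on a line to the last .amazonaws.com on that line. Use >> instead of > to append to an existing file.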
Here you can find regular expression syntax cheatsheet.
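As a minimal sketch of the idea, the snippet below runs grep -o against a small made-up page and appends the matches to a file. The sample HTML, the temporary directory and the file names are assumptions for illustration only; the pattern uses a hostname character class so that two links on the same line are reported separately.

```shell
# Hypothetical sample input: two matching links on one line, one on another.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sourcecode.html" <<'EOF'
<a href="https://test-ok.amazonaws.com/a.png">x</a> <img src="https://hhhhh.hhhh.amazonaws.com/b.png">
<script src="https://example.com/app.js"></script>
<link href="https://anything.dd.dd.amazonaws.com/style.css">
EOF

# -o prints each match on its own line; -E enables the + quantifier.
# [A-Za-z0-9.-]+ matches hostname characters only, so a match cannot
# span from one link into the next.
# >> appends to newfile instead of overwriting it.
grep -oE 'https://[A-Za-z0-9.-]+\.amazonaws\.com' "$tmpdir/sourcecode.html" >> "$tmpdir/newfile"

cat "$tmpdir/newfile"
```

Adding sort -u before the redirect would drop repeated links, if duplicates are not wanted.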


Bash shell script postprocessing results of ls [duplicate]

This question already has answers here:
What is the meaning of the ${0##...} syntax with variable, braces and hash character in bash?
(4 answers)
What does "##" in a shell script mean? [duplicate]
(1 answer)
Closed 7 months ago.
I came across a shell script like the following:
for FILE_PATH in `ls some/directory`
do
export FILE=${FILE_PATH##*/}
done
What exactly is the "##*/" doing? When I echo ${FILE} and ${FILE_PATH}, I don't see any difference. Is this to handle unusually named files?
More generally, how would I go about figuring out this type of question for myself in the future? Google was completely useless.
It's removing everything up to and including the last / in the value of $FILE_PATH. From the Bash Manual:
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘#’ case) or the longest matching pattern (the ‘##’ case) deleted.
You're not seeing any difference in this case because ls on a directory prints only the filenames, without the directory portion, so there's nothing to remove. You would see the difference if you did:
for FILE_PATH in some/directory/*
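To see the operators side by side, here is a small sketch using a made-up path (the value of FILE_PATH is an assumption for illustration). It also shows % and %%, the suffix-removing counterparts described in the same section of the Bash manual.

```shell
FILE_PATH=some/directory/archive.tar.gz   # hypothetical path

# '#' removes the shortest match of the pattern from the front.
echo "${FILE_PATH#*/}"     # directory/archive.tar.gz

# '##' removes the longest match from the front (everything up to the last /).
echo "${FILE_PATH##*/}"    # archive.tar.gz

# '%' removes the shortest match from the end.
echo "${FILE_PATH%/*}"     # some/directory

# '%%' removes the longest match from the end.
FILE=${FILE_PATH##*/}
echo "${FILE%%.*}"         # archive
```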

Changing contents of a specific parameter in a file through shell script [duplicate]

This question already has answers here:
sed + replace only if match the first word in line
(3 answers)
Closed 1 year ago.
I have a file called local.conf:
db.default.driver="com.mysql.jdbc.Driver"
db.default.url="jdbc:mysql://localhost:3306/xyz31"
db.default.user="ron"
db.default.password=""
Here xyz31 is the value of the variable ${DB_NAME}. I need to update only that database name (xyz31 in this instance, but it varies with the current value of ${DB_NAME}) to whatever value the user has entered, for example abc22:
db.default.driver="com.mysql.jdbc.Driver"
db.default.url="jdbc:mysql://localhost:3306/abc22"
db.default.user="ron"
db.default.password=""
Is there a way to achieve this?
Use a stream editor such as sed(1):
sed -i -e 's;xyz31;abc22;' /your/funny/file/here
You may need to anchor the pattern with word-boundary markers to avoid false positive matches (check your sed manual for what is available). You can also write the result to another file and check with diff(1) that the change is right.
sed should be enough:
i=abc22 # or read it from the user with: read -r i
sed "/^db\.default\.url/s/$DB_NAME/$i/" local.conf
Add the -i option if you want to change the file in place.
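Putting the two suggestions together, a cautious version might look like the sketch below. It works on a temporary copy of local.conf, writes the result to a second file, and shows the diff before replacing the original. NEW_NAME is a hypothetical variable standing in for the user's input, and the sketch assumes neither name contains characters that are special to sed (such as |, & or backslashes).

```shell
tmpdir=$(mktemp -d)
conf="$tmpdir/local.conf"
cat > "$conf" <<'EOF'
db.default.driver="com.mysql.jdbc.Driver"
db.default.url="jdbc:mysql://localhost:3306/xyz31"
db.default.user="ron"
db.default.password=""
EOF

DB_NAME=xyz31     # current database name, as in the question
NEW_NAME=abc22    # hypothetical: the value entered by the user

# Only the db.default.url line is touched, and the name is replaced only
# when it is the last path component before the closing quote.
sed "/^db\.default\.url=/s|/$DB_NAME\"\$|/$NEW_NAME\"|" "$conf" > "$conf.new"

# Review the change; diff exits with status 1 when the files differ.
diff "$conf" "$conf.new" || true

mv "$conf.new" "$conf"
cat "$conf"
```

Using | as the s command delimiter avoids having to escape the / that precedes the database name.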

Matching a pattern in cat/zcat? [duplicate]

This question already has answers here:
bash wildcard n digits
(4 answers)
Closed 2 years ago.
How can I match a more complex pattern when using cat or zcat? For example, I know I can do this:
zcat /var/log/nginx/access.log.*.gz
Which will output all the gzipped access logs to stdout.
What if I want a more complex pattern? Say, all the log files that are between 1-15, e.g. something like this pattern:
zcat /var/log/nginx/access.log.([1-9]|1[0-5]).gz
This results in an unexpected token which is obvious, but I'm not sure how I'd escape the regex in this situation? Maybe I need to pipe ls output to zcat instead?
It depends of course on specifically what pattern you want to match. For the example you have given of log files 1-15, you could use something like
zcat /var/log/nginx/access.log.{1..15}.gz
which will complain to stderr if any of those numbers don't exist, but it will still concatenate the rest to stdout.
This technique is a "sequence expression" if you want to look it up - it's a part of brace expansion.
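Because brace expansion is purely textual, it produces all fifteen names whether or not the files exist. The bash sketch below first prints the generated names, then filters out the missing ones before calling zcat, which avoids the error messages. The temporary directory and the sample log files are created only for this demonstration.

```shell
# Brace expansion happens before the command runs and ignores the filesystem.
echo access.log.{1..15}.gz

# Create a few sample logs; 16 is outside the wanted range on purpose.
tmpdir=$(mktemp -d)
for n in 1 2 14 16; do
    echo "line from log $n" | gzip > "$tmpdir/access.log.$n.gz"
done

# Keep only the names that actually exist.
files=()
for f in "$tmpdir"/access.log.{1..15}.gz; do
    if [ -e "$f" ]; then
        files+=("$f")
    fi
done

zcat "${files[@]}"
```

Note that both brace expansion and arrays are bash features, so this will not work in a plain POSIX sh.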

How to rename multiple files in a directory leaving the extension in Linux [duplicate]

This question already has answers here:
How to rename files without changing extension in Linux 102221.pdf to 102221_name.pdf
(3 answers)
Rename multiple files based on pattern in Unix
(24 answers)
Closed 4 years ago.
I need to rename files in a directory by removing a string of characters that is different for each file but always starts the same way. I know how to strip characters from the filename, but how do I preserve the extension? I know it's a variation of a common question, but I can't find an answer that fits my exact need.
Redshirts_ep6_dSBHpCsvQ3BfQ7-NNIjXYO4pnHpNMvu7bfvURLF3BSzB_3YOOrBBoNnICTR-hg.mp3
-> Redshirts.mp3
PathsNotTaken_ep6_XWixFER4PJyeozVfcxT96UajpnVI7cRMRhAU4Aj9-rpeacnBleuGY9zCPDe0aQ.mp3
-> PathsNotTaken.mp3
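The Perl-based rename command (the rename packaged by Debian and Ubuntu) is very helpful here, because it renames files using a regular expression. The util-linux rename found on some other distributions takes different arguments and will not accept this expression.
This can probably be rewritten a bit, but it appears to do the job here:
rename -n 's/(^[^_]*)_.*/$1.mp3/' *.mp3
Remove the -n flag to run it for real. With -n, rename only prints what it would do.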
This regex says:
Starting at the beginning of the name (^), any run of non-underscore characters ([^_]*) is captured into a group, (^[^_]*), provided it is followed by an underscore and any number of other characters (_.*). The whole name is then replaced with that first capture group, $1, followed by .mp3.
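Where the Perl rename is not available, roughly the same result can be sketched with a plain shell loop and parameter expansion instead of a regex. This is an alternative technique, not what rename does internally. The file names below are shortened, made-up versions of the ones in the question.

```shell
tmpdir=$(mktemp -d)
# Hypothetical files in the same shape as the question's examples.
touch "$tmpdir/Redshirts_ep6_dSBHpCsv.mp3" "$tmpdir/PathsNotTaken_ep6_XWixFER4.mp3"

for f in "$tmpdir"/*_*.mp3; do
    name=${f##*/}            # strip the directory part
    new="${name%%_*}.mp3"    # keep everything before the first underscore
    echo "$name -> $new"     # show the mapping before renaming
    mv -n -- "$f" "$tmpdir/$new"   # -n: do not overwrite an existing file
done

ls "$tmpdir"
```

Commenting out the mv line turns the loop into a dry run, much like rename -n.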

How to Format grep Output When Saving to a Variable [duplicate]

This question already has answers here:
How to preserve line breaks when storing command output to a variable?
(2 answers)
Closed 7 years ago.
If I grep our syslogs for a specific term, I get a nice output of those logs matching my term and each entry on a separate line.
If I save that to a variable so I can use it in a script as such:
results=$( grep "term" logs )
echo $results
then all the logs run together and are not human readable.
How can I make it look cleaner so when I do echo $results, I can actually read the output?
Thanks,
Quote it:
echo "$results"
This preserves all the whitespace, instead of using it for word splitting.
In general, you should almost always quote variables, unless you have a specific reason not to.
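The difference is easy to reproduce with a short sketch. Here printf stands in for the grep command from the question, and the two log lines are made up for illustration.

```shell
# Command substitution keeps the embedded newline and spaces in the variable.
results=$(printf 'Jan 1 host app: term one\nJan 2 host app:   term two\n')

# Unquoted: the shell splits the value into words, so echo receives them as
# separate arguments and joins them with single spaces on one line.
echo $results

# Quoted: the value is passed as one argument and printed as captured.
echo "$results"

# printf '%s\n' is a common alternative that does not depend on how echo
# treats options or backslashes.
printf '%s\n' "$results"
```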
