wget extract text and save to file - linux

So I have several JSON files I need to go through and extract email addresses from. They are formatted like this, with %40 being the URL-encoded @ symbol:
"email":"google%40gmail.com"
I am using wget to grab all of my files, but they contain much more content than I need right now. What would be the best way to modify the script below so that it grabs just the email, as above?
for i in $(seq 0 1000); do wget "http://example.com/users.php?info=user/user.json&user_id=${i}" --output-document="${i}.txt"; done
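One way to do this (an untested sketch, assuming each response is small and the email field appears exactly as shown) is to pull the address straight out of the download with grep and sed; jq would be more robust if it is available:

for i in $(seq 0 1000); do
  # -qO- writes the page to stdout instead of a file
  wget -qO- "http://example.com/users.php?info=user/user.json&user_id=${i}" \
    | grep -o '"email":"[^"]*"' \
    | sed -e 's/.*":"//' -e 's/"$//' -e 's/%40/@/g' > "${i}.txt"
done

Drop the > "${i}.txt" redirection and append to a single file (>> emails.txt) if you would rather collect every address in one place.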

Related

Multiple condition in Bash for loop to get unique output

Hope everyone is fine. I am badly stuck, so I need your help.
I am using a for loop to collect all folder names
for template in /root/tool/nuclei-templates/*/
do
echo "$template"
done
Output
/root/tool/nuclei-templates/brute-force/
/root/tool/nuclei-templates/cves/
/root/tool/nuclei-templates/dns/
/root/tool/nuclei-templates/files/
/root/tool/nuclei-templates/generic-detections/
/root/tool/nuclei-templates/panels/
/root/tool/nuclei-templates/payloads/
/root/tool/nuclei-templates/security-misconfiguration/
/root/tool/nuclei-templates/subdomain-takeover/
/root/tool/nuclei-templates/technologies/
/root/tool/nuclei-templates/tokens/
/root/tool/nuclei-templates/vulnerabilities/
/root/tool/nuclei-templates/workflows/
I am using this output with a tool that needs those folder paths. That tool also gives output like this:
nuclei -l url.txt -t /root/tool/nuclei-templates/brute-force/ -o result.brute-force
Output
result.brute-force
But as I am using a for loop to automate this scanning part, I also need to generate a unique output file for each result.
I am expecting output like this:
result.brute-force
result.cves
result.dns
result.files
Generally, for each template loaded by the for loop, it should generate an output file named after that specific template folder.
If everything works well, this should give me 13 unique results following the pattern I mentioned.
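A minimal sketch of how this could be wired together (untested; it assumes the result file should simply be named after the last path component of each template folder, and uses the nuclei options exactly as shown above):

for template in /root/tool/nuclei-templates/*/; do
  name=$(basename "$template")   # e.g. brute-force, cves, dns, ...
  nuclei -l url.txt -t "$template" -o "result.${name}"
done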

How do I modify/subset a wget script to specify a date range to only download certain years into different scripts?

I am trying to download a lot of data for some research from the CMIP6 website (https://esgf-node.llnl.gov/search/cmip6/) that provides wget scripts for each model.
The scripts are for every 6 hours or month from 1850 to 2014. The date format looks like this (1st script): 185001010600-185101010000 or (for 2nd script) 195001010600-195002010000, 195002010600-195003010000
My goal is to turn one giant script into several smaller ones, each covering five years, for 1980 to 2015.
As an example, I would want to subset the main script into different scripts with 5-year intervals ("19800101-19841231", then "19850101-19901231", etc.), each named wget-1980_1985.sh, wget-1985_1990.sh, and so on.
For an example date range for the 2nd script, I would need:
197912010600 through 198601010000, then every 5 years after that
I'm a beginner so please help if you can!
Part of the wget script format for each file looks like this (it won't let me copy and paste the whole thing since there are too many links [see below to find the file yourself]):
1.) #These are the embedded files to be downloaded
download_files="$(cat <<EOF
'hus_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185001010600-185101010000.nc' 'http://esgf-data2.diasjp.net/thredds/fileServer/esg_dataroot/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/6hrPlevPt/hus/gn/v20191204/hus_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185001010600-185101010000.nc' 'SHA256'
'fa9ac4149cc700876cb10c4e681173bcc0040ea03b9a439d1c66ef47b0253c5a'
'hus_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185101010600-185201010000.nc' 'http://esgf-data2.diasjp.net/thredds/fileServer/esg_dataroot/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/6hrPlevPt/hus/gn/v20191204/hus_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185101010600-185201010000.nc' 'SHA256'
'4ef4f99aa34aae6dfdafaa4aab206344125abe7808df675d688890825db53047'
2.) For the second script, the dates look like this: 'ps_6hrLev_MIROC6_historical_r1i1p1f1_gn_195001010600-195002010000.nc'
To run it, you just download the script from the website (see below)
or downloading from one of these links should work:
1.) https://esgf-node.llnl.gov/esg-search/wget/?distrib=false&dataset_id=CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.6hrPlevPt.hus.gn.v20191204|esgf-data2.diasjp.net
2.) A similar script can be seen here (the dates are different but I need this one too):
https://esgf-node.llnl.gov/esg-search/wget/?distrib=false&dataset_id=CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.6hrLev.ps.gn.v20191114|esgf-data2.diasjp.net
To run the script in the terminal, this is the command I use:
bash wget* -H
and it will download each file.
I can vi the script and delete the line for each file I don't need (using "dd"), but this would be extremely time consuming.
To find this data and get the wget script from the website, go to: https://esgf-node.llnl.gov/search/cmip6/
and select the variables on the left side of the page as follows:
Source ID: MIROC6,
Experiment ID: Historical,
Variant Label: r1i1p1f1,
Table ID: 6hrPlevPt,
and Variable: hus
*If these files are too big, you can also select Frequency:monthly instead for a much smaller file. I just want you to see the date format since monthly is just the month and year
Then hit search and it will give you one model to download. At the bottom, with the links, it will say "wget script." Click that and it will download.
You can
vi wget*
to view and/or edit it or
bash wget* -H
to run/download each file.
It might ask you to log in but I've found typing in nonsense to the username and password still starts the download.
Please help! This will be the next 6 months of my life and I really don't want to "dd" every file I don't need for all of these!
A bash for loop can generate the relevant date ranges and output filenames.
A simple sed script can delete relevant lines if they appear in order.
For example:
#!/bin/bash
in=esgf_script
for y in $(seq 1979 5 2014); do
  out="wget_${y}-$((y+4)).sh"
  sed '/_gn_/{ # if some kind of url:
    /_gn_'$((y+5))'/,$ d; # delete if year >= y+5
    /_gn_2015/,$ d; # delete if year >= 2015
    /_gn_'$y'/,$ !d; # delete if year < y
  }' <"$in" >"$out"
done
The seq command generates every fifth year starting from 1979 up to 2014.
The sed script:
looks for lines containing urls: /_gn_/
deletes from the first URL whose year is too big (y+5, or 2015) through the end
otherwise, deletes everything before the first URL whose year is big enough (y)
This code assumes that:
no lines except urls contain the first regex (/_gn_/)
the urls appear in ascending year order (e.g. urls containing 1994 cannot appear before ones containing 1993)
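Once the scripts are generated, something like this (a hypothetical usage sketch; the -H flag is just carried over from the command quoted in the question) would run them one after another:

for f in wget_*.sh; do
  bash "$f" -H
done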

What command to search for ID in .bz2 file?

I am new to Linux and I'm trying to look for an ID number within a .bz2 file. It seems like a fairly straightforward requirement, but I cannot find the correct command anywhere online. I believe I need to use bzgrep.
I want to look for '123456' in the file Bulk9876.bz2
How would I construct this command?
You probably just need to tell grep that it's okay to parse that data as text:
bzgrep -a 123456 Bulk9876.bz2
If you're trying to view the compressed data (rather than decompressing it and searching the decompressed data), just use grep -a ….
Otherwise, it might make sense to verify that the desired string is even present in the file; bunzip2 it and grep -a the decompressed file. If that works, the problem is in your bzgrep instance (which is odd because it should be using the same decompression library as bunzip2).
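For example, a rough sketch of that verification step (assuming bzcat is available alongside bunzip2 and that the archive really does decompress to text):

# search the decompressed stream directly, without writing a temporary file
bzcat Bulk9876.bz2 | grep -a 123456

# or keep the original .bz2 (-k), decompress to Bulk9876, and search that
bunzip2 -k Bulk9876.bz2
grep -a 123456 Bulk9876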

Searching in multiple files using findstr, only proceeding with the resulting files? (cmd)

I'm currently working on a project where I search hundreds of files using findstr in the command line. If I find the string which I searched for, I want to proceed with this exact file (and the other ones that include my string).
So in my case:
I searched for the string WRI2016 by using:
H:\KOBINI>findstr "WRI2016" *.ini > %temp%\xx.txt && %temp%\xx.txt
To see what the PC does, I save it in a .txt file as you can see.
So if a file includes WRI2016, I want to extract some facts out of that file. In my case these are NR, Kunde, WebHDAktiv, DigIDAktiv.
But I just can't find a proper way to link both of these functions.
At first I simply printed all of the parameters:
H:\KOBINI>findstr "\<NR Kunde WRI2016 WebHDAktiv DigIDAktiv" *.ini > %temp%\xx.csv && %temp%\xx.csv
I also played around using the if command but that didn't really work out. I'm pretty new to this stuff as you'll see in my following tries to solve this problem:
H:\KOBINI>findstr "\<NR DigIDAktiv WebHDAktiv" set a =*.ini findstr "WRI2016" set b =*.ini if a EQU b > %temp%\xx.txt && %temp%\xx.txt
So all I wanted to achieve with that weird code was: if there is a WRI2016 in the file, give me the remaining parameters. But that didn't work out at all.
I also tried using new lines for every command, which didn't change a thing.
As I want this to be a .csv in the end I want to add a semicolon between my parameters, any chance how I could do that? I've seen versions using -s";" which didn't do anything for me.
Sorry, I'm quite new and thought I'd give it a shot.
An example of my .ini files looks like this:
> Kunde=Markt
> Nr=101381
> [...]
> DigIDAktiv=Ja
> WebHDAktiv=Nein
> Version=WRI2016_U2_P1
Some files have a different Version, though.
So I only want to know "NR, DigIDAktiv ..." if it's the 2016 Version.
As a result it should be sorted in a CSV, in different columns.
So I search these files in order to find Version 2016 and then try to extract my information and put it into a .csv.

How to download a batch of data with the Linux command line?

For example I want to download data from:
http://nimbus.cos.uidaho.edu/DATA/OBS/
with the link:
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_1979.nc
to
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_2015.nc
How can I write a script to download all of them with wget? And how do I loop over the links from 1979 to 2015?
wget can take a file as input which contains one URL per line.
wget -ci url_file
-i : read URLs from the given input file
-c : resume (continue) partially downloaded files
So all you need to do is put the URLs in a file and use that file with wget.
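For instance (an untested sketch; url_file is just an arbitrary name for the list):

# build the URL list for 1979-2015, then let wget work through it with resume support
for i in $(seq 1979 2015); do
  echo "http://nimbus.cos.uidaho.edu/DATA/OBS/pr_${i}.nc"
done > url_file
wget -ci url_file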
A simple loop like the one in Jeff Puckett II's answer will be sufficient for your particular case, but if you happen to deal with more complex situations (arbitrary URLs), this method may come in handy.
Probably something like a for loop iterating over a predefined series.
Untested code:
for i in {1979..2015}; do
  wget "http://nimbus.cos.uidaho.edu/DATA/OBS/pr_${i}.nc"
done
