For example, I want to download data from:
http://nimbus.cos.uidaho.edu/DATA/OBS/
with the link:
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_1979.nc
to
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_2015.nc
How can I write a script to download all of them with wget? And how do I loop the links from 1979 to 2015?
wget can take a file as input, with one URL per line.
wget -ci url_file
-i : input file
-c : resume functionality
So all you need to do is put the URLs in a file and use that file with wget.
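For the question's URL range, that file can be generated with a quick loop before calling wget (a sketch, assuming a bash shell for the brace expansion):

for i in {1979..2015}; do
  echo "http://nimbus.cos.uidaho.edu/DATA/OBS/pr_${i}.nc"
done > url_file
wget -ci url_file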
A simple loop like Jeff Puckett II's answer will be sufficient for your particular case, but if you happen to deal with more complex situations (random URLs), this method may come in handy.
Probably something like a for loop iterating over a predefined series.
Untested code:
for i in {1979..2015}; do
  wget http://nimbus.cos.uidaho.edu/DATA/OBS/pr_$i.nc
done
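If your shell doesn't support brace ranges, seq does the same job, and adding -c lets you resume interrupted downloads (equally untested against this server):

for i in $(seq 1979 2015); do
  wget -c "http://nimbus.cos.uidaho.edu/DATA/OBS/pr_${i}.nc"
done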
So I have several JSON files I need to go through and extract email addresses from. They are formatted like this, %40 being the URL-encoded @ symbol.
"email":"google%40gmail.com"
I am using wget to grab all my files, but they have much more content inside that I do not need right now. What would be the best way to modify this script below to just grab the email like above?
for i in $(seq 0 1000); do wget "http://example.com/users.php?info=user/user.json&user_id=${i}" --output-document="${i}.txt"; done
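One hedged way to do that, assuming each file really contains a single "email" field like the sample: skip the per-user output files and pipe each download through grep and sed (the exact pattern and the %40 decoding are assumptions about your data):

for i in $(seq 0 1000); do
  wget -qO- "http://example.com/users.php?info=user/user.json&user_id=${i}" \
    | grep -o '"email":"[^"]*"' \
    | sed -e 's/.*"email":"//' -e 's/"$//' -e 's/%40/@/g' >> emails.txt
done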
I am new to Linux and I'm trying to look for an ID number within a .bz2 file. Seems like a fairly straightforward requirement, however I cannot find the correct command anywhere online. I believe I need to use bzgrep.
I want to look for '123456' in the file Bulk9876.bz2
How would I construct this command?
You probably just need to tell grep that it's okay to parse that data as text:
bzgrep -a 123456 Bulk9876.bz2
If you're trying to view the compressed data (rather than decompressing it and searching the decompressed data), just use grep -a ….
Otherwise, it might make sense to verify that the desired string is even present in the file; bunzip2 it and grep -a the decompressed file. If that works, the problem is in your bzgrep instance (which is odd because it should be using the same decompression library as bunzip2).
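A minimal version of that check, using the file names from the question (-k keeps the original archive):

bunzip2 -k Bulk9876.bz2
grep -a 123456 Bulk9876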
I'm trying to create a Chef recipe to append multiple lines (20-30) to a specific config file.
I'm aware the recommended pattern is to change entire config files rather than just appending to a file, but I dislike this approach for multiple reasons.
So far the only solution I found was to use a cookbook_file and then use a bash resource to do:
cat lines_to_append >> /path/configfile
Obviously this wouldn't work properly, as it'd append to the file over and over, each time you run chef-client. I'd have to create a small bash script to check for a specific string first and, if not found, append to the file.
But this seems to defeat the purpose of using Chef. There must be a better way.
One promising solution was the line cookbook from the OpsCode Community. It aimed to solve this exact problem. Unfortunately, the functionality is incomplete and buggy, and the code is just a quick hack, far from being a solid solution.
Another option I evaluated was Augeas. It seems pretty powerful, but it'd add yet another layer of abstraction to the system. Overkill, in my case.
Given that this is one of the most obvious tasks for any sysadmin, is there any easy and beautiful solution with Chef that I'm not seeing?
EDIT: here's how I'm solving it so far:
cookbook_file "/tmp/parms_to_append.conf" do
  source "parms_to_append.conf"
end

bash "append_to_config" do
  user "root"
  code <<-EOF
    cat /tmp/parms_to_append.conf >> /etc/config
    rm /tmp/parms_to_append.conf
  EOF
  not_if "grep -q MY_IDENTIFIER /etc/config"
end
It works, but not sure this is the recommended Chef pattern.
As you said yourself, the recommended Chef pattern is to manage the whole file.
If you're using Chef 11 you could probably make use of partials for what you're trying to achieve.
There's more info here and on this example cookbook.
As long as you have access to the original config template, just append <%= render "original_config.erb" %> to the top of your parms_to_append.conf template.
As said before, using templates and partials is a common way of doing this, but Chef also allows appending to files and even changing (editing) file lines. Appending is performed using the following functions:
insert_line_after_match(regex, newline)
insert_line_if_no_match(regex, newline)
You may find an example here on Stack Overflow, and the full documentation on rubydoc.info.
Please use it with caution, and only when partials and templates are not appropriate.
I did something like this:
monit_overwrites/templates/default/monitrc.erb:
#---FLOWDOCK-START
set mail-format { from: monit@ourservice.com }
#---FLOWDOCK-END
In my recipe I did this:
monit_overwrites/recipes/default.rb:
execute "Clean up monitrc from earlier runs" do
user "root"
command "sed '/#---FLOWDOCK-START/,/#---FLOWDOCK-END/d' > /etc/monitrc"
end
template "/tmp/monitrc_append.conf" do
source "monitrc_append.erb"
end
execute "Setup monit to push notifications into flowdock" do
user "root"
command "cat /tmp/monitrc_append.conf >> /etc/monitrc"
end
execute "Remove monitrc_append" do
command "rm /tmp/monitrc_append.conf"
end
The easiest way to tackle this would be to create a string and pass it to content. Of course bash blocks work... but I think file resources are elegant.
lines = ""
File.open('input file') do |f|
f.lines.each do |line|
lines = lines + line + "\n"
end
end
file "file path" do
content line
end
Here is an example ruby_block for inserting two new lines after a match:
ruby_block "insert_lines" do
  block do
    file = Chef::Util::FileEdit.new("/etc/nginx/nginx.conf")
    file.insert_line_after_match("worker_rlimit_nofile", "load_module 1")
    file.insert_line_after_match("pid", "load_module 2")
    file.write_file
  end
end
insert_line_after_match searches for the regex/string and inserts the given value after the matching line.
I have a Perl script that creates a report based on an XML definition. Currently these definitions all exist as .xml files.
So I have the script run-report.pl, which can take a path to a definition file and create the report.
Now I want to create run-reports-from-db.pl, which will generate the report definition based on some database entries. I don't want to create temp files to pass to run-report.pl; I would just like to pass in the definition somehow.
So instead of saying:
run-report.pl -def=./path/to/def.xml
I want to be able to say:
run-report.pl --stream
And have the report definition available in <STDIN>
I am sure there is a pretty trivial way to do this?
If I understand your question correctly, all you need is one | (pipe).
./generate-xml-from-db.pl | ./run-report.pl --stream
Anything the first process in the pipeline prints to stdout will appear in the second process's stdin.
As long as you read from STDIN, you have it available. Notice what happens when you take the code below, name it something like echo.pl, run it at the command line, and paste reams of text.
#!/usr/bin/perl -w
use 5.010;
use strict;
use warnings;
while ( <> ) {
    say;
}
<> is the Perl shorthand for reading from the files named on the command line, or from STDIN when none are given.
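For example (def.xml is a hypothetical file name):

./generate-xml-from-db.pl | perl echo.pl   # definition arrives on STDIN
perl echo.pl def.xml                       # <> reads the named file instead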
As long as the method you're using to launch the process has a way to get hold of its standard input and output, you can just write to that handle. You have to use whatever means are available to you. In Java, for example, you'd have to get the input stream of the process; in a batch command you have to pipe it. At a GUI terminal you can cut and paste.
We are migrating web servers, and it would be nice to have an automated way to check some of the basic site structure to see if the rendered pages are the same on the new server as the old server. I was just wondering if anyone knew of anything to assist in this task?
Get the formatted output of both sites (here we use w3m, but lynx can also work):
w3m -dump http://google.com 2>/dev/null > /tmp/1.html
w3m -dump http://google.de 2>/dev/null > /tmp/2.html
Then use wdiff; it can give you a percentage of how similar the two texts are.
wdiff -nis /tmp/1.html /tmp/2.html
It can also be easier to see the differences using colordiff.
wdiff -nis /tmp/1.html /tmp/2.html | colordiff
Excerpt of output:
Web Images Vidéos Maps [-Actualités-] Livres {+Traduction+} Gmail plus »
[-iGoogle |-]
Paramètres | Connexion
Google [hp1] [hp2]
[hp3] [-Français-] {+Deutschland+}
[ ] Recherche
avancéeOutils
[Recherche Google][J'ai de la chance] linguistiques
/tmp/1.html: 43 words 39 90% common 3 6% deleted 1 2% changed
/tmp/2.html: 49 words 39 79% common 9 18% inserted 1 2% changed
(it actually put google.com into French... funny)
The common % values show how similar the two texts are. Plus you can easily see the differences by word (instead of by line, which can be cluttered).
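To turn this into a migration check, a rough wrapper could loop over a list of paths and compare each one on the two hosts (old.example.com, new.example.com and paths.txt are placeholders):

#!/bin/bash
# Compare the rendered text of each path on the old and new servers.
while read -r path; do
  w3m -dump "http://old.example.com${path}" 2>/dev/null > /tmp/old.txt
  w3m -dump "http://new.example.com${path}" 2>/dev/null > /tmp/new.txt
  echo "== ${path} =="
  wdiff -nis /tmp/old.txt /tmp/new.txt | tail -n 2   # keep just the statistics lines
done < paths.txt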
The catch is how to check the 'rendered' pages. If the pages don't have any dynamic content, the easiest way is to generate hashes for the files using the md5sum or sha1sum commands and check them against the new server.
If the pages have dynamic content, you will have to download the site using a tool like wget:
wget --mirror http://thewebsite/thepages
and then use diff as suggested by Warner, or do the hash thing again. I think diff may be the best way to go, since even a change of one character will mess up the hash.
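For the static case, a hedged sketch of the hash comparison (run from the site's document root on each server, then diff the two lists):

# on the old server, from the document root
find . -type f -exec md5sum {} + | sort -k 2 > /tmp/old.md5
# on the new server, likewise
find . -type f -exec md5sum {} + | sort -k 2 > /tmp/new.md5
# copy both lists to one machine and compare
diff /tmp/old.md5 /tmp/new.md5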
I've created the following PHP code that does what Weboide suggests here. Thanks Weboide!
The paste is here:
http://pastebin.com/0V7sVNEq
Using the open source tool recheck-web (https://github.com/retest/recheck-web), there are two possibilities:
Create a Selenium test that checks all of your URLs on the old server, creating Golden Masters. Then run that test on the new server and see how the pages differ.
Use the free and open-source Chrome extension (https://github.com/retest/recheck-web-chrome-extension), which internally uses recheck-web to do the same: https://chrome.google.com/webstore/detail/recheck-web-demo/ifbcdobnjihilgldbjeomakdaejhplii
For both solutions, you currently need to manually list all relevant URLs. In most situations, this shouldn't be a big problem. recheck-web will compare the rendered websites and show you exactly where they differ (e.g. different font, different meta tags, even different link URLs). And it gives you powerful filters to let you focus on what is relevant to you.
Disclaimer: I have helped create recheck-web.
Copy the files to the same server in /tmp/directory1 and /tmp/directory2 and run the following command:
diff -r /tmp/directory1 /tmp/directory2
For all intents and purposes, you can put them in your preferred location with your preferred naming convention.
Edit 1
You could potentially use lynx -dump or wget and run a diff on the results.
Short of rendering each page, taking screen captures, and comparing those screenshots, I don't think it's possible to compare the rendered pages.
However, it is certainly possible to compare the downloaded website after downloading recursively with wget.
wget [option]... [URL]...

-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
The next step would then be to do the recursive diff that Warner recommended.
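Putting the two steps together might look like this (host names are placeholders; -nH drops the host-named subdirectory so the two trees line up, and -P keeps the mirrors in separate directories):

wget --mirror -nH -P /tmp/old http://old.example.com/
wget --mirror -nH -P /tmp/new http://new.example.com/
diff -r /tmp/old /tmp/new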