How to copy web page content to a file via the Linux terminal?

I want to copy some text from a web page to a file in Linux. I know "wget" can be used to download files, but the data I need is not stored in downloadable files, so I currently have to copy and paste it manually, which is impractical for thousands of web pages.
For example, I need to have the data in the link below:
http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2017&MONTH=09&FROM=0112&TO=0112&STNM=72672
and similar links with varying YEAR, MONTH, FROM, TO, and STNM values.
Is there any command or script to do this automatically?

First, make a file with all of the year, month, from, to, and stnm values, one line per request:
inputFile.txt:
2017,09,0112,0112,72672
2017,08,0112,0112,72672
In a shell script, loop through that file line by line and execute wget, replacing the hardcoded values with variables filled from each line:
#!/bin/bash
# Read the comma-separated fields from each line of inputFile.txt.
while IFS=, read -r year month from to stnm; do
  # The URL is quoted so the shell does not treat & as a background operator.
  wget "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=$year&MONTH=$month&FROM=$from&TO=$to&STNM=$stnm"
done < inputFile.txt
That's the bare-bones version; it may need some tweaks to get up and running, but it should be close.
Execute the shell script:
bash whateveryounamedthisscript.sh
In this example two new files will be generated, one for September and another for August.
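By default wget derives the output file names from the long query string, which is awkward to work with. As a sketch (the naming scheme here is an assumption, not part of the question), you could name each file explicitly with -O:

#!/bin/bash
# Variant that names each downloaded page explicitly (assumed naming scheme).
while IFS=, read -r year month from to stnm; do
  wget -O "sounding_${stnm}_${year}${month}_${from}-${to}.html" \
    "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=$year&MONTH=$month&FROM=$from&TO=$to&STNM=$stnm"
done < inputFile.txt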

Related

vim ~/.ssh/config command isn't showing any results

I'm using Windows 10. I want to open the SSH config file, but whenever I type "vim ~/.ssh/config" into my Git Bash, the result I get is this
but the result I'm trying to get is supposed to look like what is shown in this second picture:
Can someone tell me how to get those results from my command?
If the second picture shows a file you already have on your disk, you need to copy it to your %USERPROFILE%\.ssh folder (since %USERPROFILE% is what Git Bash uses by default as $HOME or ~).
If the second picture shows a file you want to have, then you need to create it.
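A minimal sketch of creating that file from Git Bash (the host, user, and key path below are placeholders, not values from the question):

# Create ~/.ssh/config with one sample entry (hypothetical host/user/key).
mkdir -p ~/.ssh
cat > ~/.ssh/config <<'EOF'
Host myserver
    HostName example.com
    User myuser
    IdentityFile ~/.ssh/id_rsa
EOF
chmod 600 ~/.ssh/config   # restrictive permissions keep ssh from complaining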

How to stream the content of log files whose file names constantly change, in Perl?

I have a series of applications on Linux systems whose logs I need to constantly 'stream' out, or even just 'tail' out, but the challenge is that the file names are constantly rolling and changing.
They are all date-encoded (with the dates in different formats), and each then has a different increment scheme.
Most of them start at one and increase, but one has no counter suffix on its first file and only adds one from the second file onward, and another increments a number but, once it hits 99, increments a letter, resets the number to 01, and carries on, since it rolls so quickly.
I only have OS-level shell scripting, OS command-line utilities, and Perl available to handle this situation, so that another application can pick up and read these logs.
A new file is only created at the moment the application starts writing to it, and groups of different logs (some I am reading, some I am not) are written to the same directory, so I cannot just pick up anything that appears in the directory.
If I simply pipe them through 'tail -n 1000000 -f |' today, this works fine for the reader application I am using until the file changes. I cannot set up file lists or ranges within the reader application, but I can pre-process the logs so they appear as a continuous stream to the reader, rather than the reader directly invoking commands to read them. A simple Perl log reader like this also works fine for a static file name, but not for dynamic ones. It is critical that I don't re-process any log lines and only capture new lines being written to the logs.
I admit I am not any kind of Perl guru, and the best clue I've been able to find so far is the use of Perl's glob function, but the examples I've found basically reprocess all of the files on each run and then seem to stop.
Example file names I am dealing with across the multiple apps I am trying to handle:
appA_YYMMDD.log
appA_YYMMDD_0001.log
appA_YYMMDD_0002.log
WS01APPB_YYMMDD.log
WS02APPB_YYMMDD.log
WS03AppB_YYMMDD.log
APPCMMDD_A01.log
APPCMMDD_B01.log
YYYYMMDD_001_APPD.log
As shown in the 'ls -i' output below, the files do not have the same inode, and simply monitoring the directory for any change is not possible, since a lot of other things are written there. On the dev system more than 50 logs are being written to the directory, containing thousands of files, and I am only trying to retrieve 5. I am checking whether multitail can be made available to try that suggestion, but it is not currently installed, and installing any additional RPMs in this environment is generally a multi-month battle.
ls -i
24792 APPA_180901.log
24805 APPA__180902.log
17011 APPA__180903.log
17072 APPA__180904.log
24644 APPA__180905.log
17081 APPA__180906.log
17115 APPA__180907.log
So, at the root of it, what I am trying to get is simply a continuous stream regardless of whether the file name changes, without having to run the extract command repeatedly and without big breaks in the data feed while some script figures out that the file being logged to has changed. I don't need to parse the contents (my other app does that). Is there an easy way of handling this changing file name?
How about monitoring the log directory for changes with Linux inotify, e.g. via the Perl module Linux::Inotify2? Then you could detect when new log files are created, stop reading from the old log file, and start reading from the new one.
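A shell-level sketch of the same idea, using inotifywait from inotify-tools instead of the Perl module (this assumes inotify-tools is available, which may be a hurdle in the locked-down environment described above; the directory and pattern are placeholders):

#!/bin/bash
# Watch the log directory for newly created files and switch the tail
# to any new file whose name matches the pattern we care about.
LOGDIR=/path/to/logs        # assumed location
PATTERN='appA_*.log'        # one of the name patterns from the question

inotifywait -m -e create --format '%f' "$LOGDIR" | while read -r name; do
    case "$name" in
        $PATTERN)
            # Stop following the previous file (if any) and follow the new one.
            [ -n "$TAIL_PID" ] && kill "$TAIL_PID" 2>/dev/null
            tail -n 0 -F "$LOGDIR/$name" &
            TAIL_PID=$!
            ;;
    esac
done

The same create-event logic can be written with Linux::Inotify2 if staying inside a single Perl process is preferable.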
Try tailswitch. I created this script to tail log files that are rotated daily and have YYYY-MM-DD in their names. To use this script, you just say:
% tailswitch '*.log'
The quoting prevents the shell from expanding the glob pattern. The script re-evaluates the glob pattern from time to time and switches to a newer file based on its name.

Grab 2 numbers from a file name, then insert them into a command

I'm a bit new to programming in general, and I'm not sure how to go about accomplishing this task in my bash script.
A quick background: when importing my music library (formerly organized by iTunes) into Banshee, all of the files were duplicated to fit Banshee's numbering style (e.g. "02." instead of "02"). On top of that, iTunes apparently did not save the ID3 tags to the files, so many of them are blank. So now I've got a few thousand tags to fix and duplicate files to get rid of.
To automate the process, I started learning to write bash scripts. I came up with a script (which you can see here) that does four things: removes unnecessary iTunes files; takes input from the user about ID3 tag information and stores it in variables; clears any existing tag info from all files; and writes new tags with the info taken from the user, using a program called eyeD3.
Now, here's where I run into my problem. This script is basically blindly writing info to all mp3 files in the directory. This is fine for tags that all the files have in common, like artist, album, total tracks, year, etc. But I can't tag each individual track number with this method. So I'm still editing the track-number tags one at a time, manually. And that's something I really don't want to do 2,000+ times.
The files names all look like this:
01. song1.mp3
02. song2.mp3
03. song3.mp3
The command to write a track number to a tag looks like this:
$ eyeD3 -n 1 "01. song1.mp3"
So... I'm not sure how to go about automating this. I need to grab the first two digits of each file name, store them somewhere, then recall each one into a separate eyeD3 command.
You can loop over the files using globbing, and use substring expansion to capture the first two characters of the filename:
# Loop over every MP3 in the current directory.
for f in *.mp3; do
  # ${f:0:2} is the first two characters of the file name, i.e. the track number.
  eyeD3 -n "${f:0:2}" "$f"
done
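Since many of the other tags are blank too, a related sketch (assuming the file names all follow the "NN. title.mp3" pattern shown above) that also fills in the title tag from the file name:

for f in *.mp3; do
    title="${f#*. }"        # strip the leading "NN. " prefix
    title="${title%.mp3}"   # strip the .mp3 extension
    eyeD3 -n "${f:0:2}" -t "$title" "$f"
done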

Download batch files from a website using Linux

I want to download some files (roughly 1000-2000 zip files) from a website.
I could sit around and add each file one after another, but please give me a program, script, or whatever method lets me automate the download.
The website I am talking about has download links like
sitename.com/sometetx/date/12345/folder/12345_zip.zip
The date can be taken care of. The main concern is the number 12345 that appears both before and after the folder; the two change simultaneously, e.g.
sitename.com/sometetx/date/23456/folder/23456_zip.zip
sitename.com/sometetx/date/54321/folder/54321_zip.zip
I tried using curl:
sitename.com/sometetx/date/[12345-54321]/folder/[12345-54321]_zip.zip
but it generates far too many combinations of downloads, i.e. it keeps the left 12345 as it is and scans through 12345 to 54321, then increments the left value by 1 and repeats the scan over [12345-54321].
I also tried bash with wget;
here I have one variable in two places, and when using a loop, the right 12345 followed by "_" is ignored by the program.
Please help me; I don't know much about Linux or programming. Thanks.
In order for your loop variable next to _ not to be swallowed by the shell (which would otherwise read $i_zip as a single variable name), put it in quotes, like this:
for ((i=10000; i < 99999; i++)); do
    wget sitename.com/sometetx/date/$i/folder/"$i"_zip.zip
done
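If you have (or can scrape) the list of valid IDs, a sketch that reads them from a file avoids requesting the tens of thousands of IDs that don't exist (ids.txt is an assumed file with one numeric ID per line):

#!/bin/bash
# Download only the IDs listed in ids.txt (assumed: one ID per line).
while read -r id; do
    wget "sitename.com/sometetx/date/$id/folder/${id}_zip.zip"
done < ids.txt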

splitting text files column-wise

So I have an invoice that I need to make a report out of. It is, on average, about 250 pages long. I'm trying to create a script that would extract specific values from the invoice and make a report. Here are my problems:
The invoice is in PDF format, with each page spanning two columns. I want to use the 'pdftotext' Linux command to convert it into multiple text files (with each txt file representing one PDF page). How do I do that?
I recognize that the 'pdftotext' command separates the left part of the page from the right part of the page with 21 spaces in between. How do I move the right side of the data (identified after reading at least 21 spaces in a row) to the end of the file?
Since the file is large and I only need the last few pages, how do I delete all those text files in a script (not manually) until I read a keyword (let's just say the keyword is "Start Invoice")?
I know this is a lot of questions, but I'm confused about what Linux commands can do. Can you guys point me in the right direction? Thanks.
PS: I'm using CentOS 5.2
What about:
pdftotext YOUR.pdf - | sed 's/^\([^ ]\+\) \{21\}.*/\1/' > OUTPUT
pdftotext YOUR.pdf - | sed 's/.* \{21\}\(.*\)/\1/' >> OUTPUT
(The trailing - makes pdftotext write to standard output so it can be piped into sed.)
But you should check out pdftotext's -raw and -layout options too. And there are more ways to do it...
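For the first question (one text file per PDF page), a sketch using pdftotext's -f/-l page-range options together with pdfinfo (both ship with the same xpdf/poppler tool set; their availability on CentOS 5.2 is an assumption):

#!/bin/bash
# Split YOUR.pdf into one text file per page: page_1.txt, page_2.txt, ...
pages=$(pdfinfo YOUR.pdf | awk '/^Pages:/ {print $2}')
for ((p = 1; p <= pages; p++)); do
    pdftotext -layout -f "$p" -l "$p" YOUR.pdf "page_$p.txt"
done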
