Is there a Linux command line utility for getting random data to work with from the web? - linux

I am a Linux newbie and I often find myself needing some arbitrary sample data to work with.
For example, I might want a sample text file to try out some regular expressions on, or a CSV file of sample data to read into gnuplot.
I normally do this by copying and pasting passages from the internet, but I was wondering whether some combination of commands would let me do this without leaving the terminal. I was thinking about using something like the curl command, but I don't exactly know how it works...
To my knowledge there are websites that host content. I would simply like to access that content and store it on my computer.
In conclusion, and as a concrete example: how would I copy a random passage from a website and store it in a file on my system using only the command line? Maybe you can point me in the right direction. Thanks.

You could redirect the output of a curl command into a file, e.g.
curl https://run.mocky.io/v3/5f03b1ef-783f-439d-b8c5-bc5ad906cb14 > data-output
Note that I've mocked the data in Mocky, which is a nice website for quickly mocking an API.

I normally use "Project Gutenberg", which has 60,000+ books freely downloadable online.
So, if I want the full text of "Peter Pan and Wendy" by J.M. Barrie, I'd do:
curl "http://www.gutenberg.org/files/16/16-0.txt" > PeterPan.txt
If you look at the page for that book, you can see how to get it as HTML, plain text, ePUB or UTF-8.

Related

Creating custom documents (PDFs) on the fly on a website

I want to create custom documents on my website, which runs on a Linux-based server. The website has user login capability for access to specific details.
What I want to do is:
1) Use a default .tex file in which the contents of the main document are stored. This would be available on the server (on the admin side).
2) Get a few user-specific inputs (such as login name and the day and date when the request was made) and their custom inputs, such as which specific details they want (this makes it possible to include or exclude certain chapters and sections from the document).
3) Using the inputs received in point 2, customize the document on the fly on the website by running the LaTeX compiler, and share the output of the compilation with the user.
My questions are:
1) Has someone tried this before? Any suggestions or alternatives you can point to? If there is a better solution than LaTeX, I am open to hearing about it as well.
2) Are there any specific settings needed, either on the server or in the LaTeX installation, to enable this?
3) Are any additional packages or programs required to get this working?
Any help and insights would be appreciated.
You can generate the PDF using an appropriate library for the programming language you use for your back end. This is definitely safer than injecting user input into a TeX file, and would probably be faster too.
PHP: Best way to create a PDF with PHP
Ruby: https://github.com/prawnpdf/prawn
Anything else: search for "$LANGUAGE generate pdf".
The first and second points can be done in any programming language you choose: read the .tex template, add or omit the data, then save the result to a temporary .tex file. After compilation you can remove this file. If you are working with a Linux server, you can use a scheduled job (cron, a systemd timer) to automate the cleanup of files.
To compile and get the file, use the pdflatex command-line program, which is what LaTeX editors typically run behind the scenes. I compile my LaTeX documents this way on Linux. I think this approach is quite fast, unless the document includes images or TikZ pictures.
I know I am suggesting the old way of doing the work, but it is usually the best way.
And finally, I think PHD Comics uses something like this for the emergency button (at the bottom right), except that on that site the PDF is already generated for the specific comic: http://www.phdcomics.com/
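As a rough illustration of the template-then-compile approach, here is a minimal Python sketch. The @@NAME@@ placeholder syntax, the function names and the file name document.tex are all assumptions made for this example, not part of any LaTeX tooling. User input is escaped before it goes into the template, which addresses the injection concern raised in the other answer, and pdflatex is run with shell escape disabled.

```python
import re
import subprocess
from pathlib import Path

# Characters with special meaning in LaTeX and a safe replacement for each.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape user-supplied text so it is typeset literally."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def render(template: str, values: dict[str, str]) -> str:
    """Replace each @@KEY@@ placeholder (hypothetical syntax) with its escaped value."""
    return re.sub(
        r"@@(\w+)@@",
        lambda match: escape_latex(values[match.group(1)]),
        template,
    )


def compile_pdf(tex_source: str, workdir: Path) -> Path:
    """Write the source to workdir and run pdflatex on it; return the PDF path."""
    (workdir / "document.tex").write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
         "-no-shell-escape", "document.tex"],
        cwd=workdir, check=True, capture_output=True, timeout=60,
    )
    return workdir / "document.pdf"
```

In a web back end you would call compile_pdf inside a per-request temporary directory (for example one created with tempfile.TemporaryDirectory), send the PDF to the user, and let the directory be deleted afterwards.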

Search through scripts of (multiple) cimplicity screens

We are using Cimplicity to operate some installations at our plant. The frontend consists of a lot of .cim files, which are the screens presented to the operator. These files are built with 'cimedit', which is basically a graphical click and drag program with which you can assemble the screens. Each object you drag onto the screen has the option to run a script, which brings me to my problem.
Because each screen contains a lot of small scripts and functions it is hard to keep track of what does what. For example I'm trying to figure out where a certain table from my database is being accessed or updated. Since the files all seem to be compressed (or so) I can't use a regular 'search the contents of this file' search.
So far I've tried Windows search with the content option enabled, and also with the option to search compressed files. Neither worked. That makes sense because, as I said, the files seem to be compressed, so the actual script is not stored in plain text.
So, my question in short:
How do I search all the scripts of (preferably multiple) Cimplicity screens?
Any tips on how to search compressed files are also very much appreciated.
While searching for a better Windows search tool, I stumbled upon this Super User post: https://superuser.com/questions/26593/best-way-to-confidently-search-files-and-contents-in-windows-without-using-an
It recommends Agent Ransack, and it is indeed possible to search through the .cim files with this tool.
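If you want to script the search instead, the sketch below scans the raw bytes of each file for a string. It rests on an assumption that I have not verified against the .cim format: that the script text is stored uncompressed, either as 8-bit text or as UTF-16LE (common on Windows). That would explain why one search tool finds it and another does not. The function name is made up for illustration.

```python
from pathlib import Path


def find_in_binary_files(root: str, needle: str, pattern: str = "*.cim") -> list[Path]:
    """Return files under root whose raw bytes contain needle (ASCII case-insensitive)."""
    # Try the needle in two encodings: 8-bit (ASCII/UTF-8) and UTF-16LE.
    targets = [needle.lower().encode("utf-8"), needle.lower().encode("utf-16-le")]
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        # bytes.lower() only lowercases ASCII letters, which is enough here.
        data = path.read_bytes().lower()
        if any(target in data for target in targets):
            hits.append(path)
    return hits
```

If the files really are compressed, this will find nothing and a format-aware tool is needed.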

Using different program office extension

I have a program that can access a database with a whole bunch of articles.
Due to copyright, I can't access the database straight from my program, but I have a different program that can access it, and it's legitimate to copy small bits from the articles.
Because my friends and I quote a lot from these articles, I thought it would be useful if we could find an add-in for Word that will copy the requested part from an article.
Is there any add-in for Word that would let me use the program that I mentioned above so that I can access the database from within Word?
I would like to program this add-in myself, if possible.
Without further information about which operating system, and version of Word you are using, I can offer only a general outline.
1) It seems to me that you want to make a Word macro using Word Basic, or Visual Basic.
2) When you want to call your program which is external to Word, you need to use the shell command as outlined here from Microsoft's webpage.
I hope that helps you get started writing your macro!
CHEERS
Well, it's a workaround, but you can use a GUI automation tool such as WinRunner or TestQuest, which can run a sequence of actions on a given GUI, to simulate use of the program. I assume these tools can take input from an XML or text file and log output to a text file.
Once you have the output in a text file, you can parse it using any programming language, extract the information you need, and write it to Word or whatever format you like using OLE objects.

Getting data from a browser by screen-scraping

I have gone through several relevant-looking questions, but they did not contain the answer I am looking for. So, here is my question:
I have several web applications at my workplace, which are written using different frameworks, and the authors are long gone, so I cannot ask for feature updates. Hence I have to go through the same grueling sequence of actions every day to get the data, which amounts to a file of a few kilobytes.
I tried parsing the page source, but the authors' programming techniques were all over the place. Some even intentionally obscured the code so that the data does not show as text, and there is no reason for this, as the code they wrote is a company asset. Long story short, I realized that if I can copy and paste the textual content of these pages, I can process that data much more easily than by parsing the page source to get the text (which is sometimes totally impossible).
So, I am now looking for a browser plug-in (in Windows or Linux environments) or equivalent text-based tools on Windows or Linux, which will load these pages and save the text on the screen to file(s) when invoked.
Despite how hard I tried, I am coming up empty-handed.
I do not want to use the services of a third-party screen-scraping web site, as the data is company confidential and not accessible by outside parties. Everything has to happen on the client end, as I do not have access to the servers these apps are running on (mostly IIS on Windows at the front end and an Oracle DB at the back end). The middle tier, as I have explained before, is anyone's wild guess, ranging from native Oracle apps to WebLogic to Tomcat to some in-house developed Java/JavaScript stuff.
Thanks for all the help in advance
After searching for an answer for well over a year, I came to realize that, as long as I use Windows (a modern version of it, that is), AutoHotkey is my savior.
I open the web page, maximize it, place my cursor (mousemove, x, y), left click (mouseclick, L), then send Ctrl-A followed by Ctrl-C.
Voila! Everything is in the clipboard. Then I activate my Unix session (winactivate PuTTY), send the appropriate key presses to launch the editor of my choice (which is vi), and finally send Shift-Insert to paste the clipboard into my document. Then save and exit, of course.
As an added bonus, right after my document is saved, I can invoke the script of my choice to parse this file and give me back the portion(s) I am interested in.
I know it is not bulletproof, but for my purposes it helps to a great extent. As a matter of fact, I can do whatever I want with this method.
What about something like this: http://www.nirsoft.net/utils/htmlastext.html
Freeware that converts an HTML page to text
Any of links, lynx or w3m will do what you want. They are text browsers, and you can dump the text of a web page with, for example:
w3m -dump http://www.google.com > g.txt
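If installing a text browser is not an option, a similar text dump can be approximated with Python's standard library. This is only a sketch: it handles static HTML (text that JavaScript generates in the browser will not appear, just as with w3m -dump), it makes no attempt to reproduce w3m's layout, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text content of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The HTML itself can come from a file saved by the browser, which keeps everything on the client side as the question requires.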

how can I extract text contents from GUI apps in linux?

I want to extract text contents from GUI apps. Here are two examples:
example 1:
Suppose I open Firefox and enter the URL www.google.com.
How can I extract the string "www.google.com" from Firefox using my own app?
example 2:
Open the calculator (gcalctool), then enter 1+1.
How can I extract the string "1+1" from the calculator using my own program?
In brief, I want to find out whether there is a way to extract the text contents from any widget of a GUI application.
Thanks
I don't think there's a generic way to do this, at least not a very elegant one.
Some inelegant ideas:
You might be able to modify the X window system or even some toolkit framework to extract what is being displayed in specific window elements as text.
You could take a screenshot and use an OCR library to convert the pixels back into text for the interesting areas.
You could recompile the apps of interest to add some kind of mechanism for asking them questions.
You could use something like xtest to inject events highlighting the region of interest and copying it to the clipboard.
I believe Firefox and gcalctool are only examples, and you want to know in general how to pass the output of one application to another.
There are many ways to do that on Linux, like:
piping
application1 | application2
By the way, here is the Firefox command line manual if you want to start Firefox on Ubuntu with a URL, e.g.:
firefox "$url"
where $url is a variable whose value can be www.mozilla.org
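If you want to set up such a pipe from your own program rather than from the shell, here is a minimal Python sketch of application1 | application2. Note that this only connects the standard output of one program to the standard input of another; it does not read text out of GUI widgets. The function name pipe is made up for illustration.

```python
import subprocess
import sys


def pipe(first_cmd: list[str], second_cmd: list[str]) -> str:
    """Run first_cmd | second_cmd and return the second command's stdout as text."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        second_cmd, stdin=first.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy so the first process sees a broken pipe if the second exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


# Example use, equivalent to `ls | sort -r` in the shell:
#   print(pipe(["ls"], ["sort", "-r"]))
```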
That sounds difficult. Supposing you're running X11, you can very easily grab a picture of a window (see "man xwd"); however, there is no easy way to get at the text unless it's selected and therefore copied to the clipboard.
Alternatively, if you only want to capture user input, this is quite easy to do, too, by activating the X11 record extension. Put this in your /etc/X11/xorg.conf:
Section "Module"
    Load "record"
    # Load other modules you need ...
EndSection
though it may prove difficult to use too; see "example code for Xorg/X11 record extension fails".

Resources