"tag cloud" generators?

"tag cloud" generators? - text

I would like to add a "tag cloud" to a project I'm working on. I see tons of them via google, but they seem to mostly be "enter an url" type.
Here's an example of what I mean:
I'm looking for one which either has either
a nice web-accessible api
a standalone local executable (linux preferred)
a linkable library (c,python preferred)
of course, other options and suggestions appreciated!
update: it seems what I am looking for is commonly called a tag cloud and not a text cloud, even though I am interested in using it to view blocks of text.
update 2: the Most Excellent Jonathan Feinberg and IBM have release Wordle... hooray!!!
http://www.wordle.net

This question is old and already answered, but I would like to say that Wordcram seems to be very nice. And it's open source.

I'm not sure if you are referring to a simple (ala Flickr) tag cloud, or something a little more complicated like Wordle.
Anyway, if you are looking for a simple tag cloud, it wouldn't be too difficult to implement it yourself (as long as you already have the ability to render HTML) as it is just changing the size and/or colour of each item based on its frequency (or some other measure).
If you want to use an existing library you could look at one of the opensource php versions, like Tag Cloud, put just run them locally on your machine using php rather than through a web server. Just install php and run php filename.php similar to how you would execute a python script.
Looking at the Wordle service, there appears to be no way to automatically create one, as they use a java applet to generate the graphics, which cannot easily be scripted using curl. They do have a question in their FAQ about an API however:
Could you expose Wordle as a web
service that generates images?
A scalable web service should take no
more than a few tens of milliseconds
to do its work. To create a Wordle
requires multiple seconds in a Java
runtime. (That pretty animation is not
for show; it's really laying things
out during the animation). Therefore,
Wordle will always apportion the
CPU-intensive stuff to you, the user,
and your CPU.
As of this writing, Wordle is
sustaining 10 hits per second. There's
no way on Earth to render Wordles at
that speed. Well there is a way, but
it involves way more money than I've
got.
Also, this previous question may help.

Here are two Python-Versions of a tag cloud:
https://github.com/atizo/PyTagCloud
http://peekaboo-vision.blogspot.de/2012/11/a-wordcloud-in-python.html
I search a lot these days and it seems that those two are some of the few "stand-alone" tag cloud generators, which run in Linux (in particular those run in python) on the command line.

Related

Is it possible to move a simple .exe based calculator tool to cloud for multi-users?

A few years back, we had a programmer create a type of calculator for use in the audio industry. It was created/compiled in C - and is an .exe we offer for free download. once a user downloads it, they can run it locally.
Our goal is to move that application/calculator to the "cloud" where any user can hypothetically visit a URL, and the calculator is there, running, and ready for user interaction by multiple users.
Is what I describe possible?
Will I need Azure?
What exactly in Azure will I need (i.e. which products/resources)?
Do I need to use the compiled/decompiled version?
Will we need to change any code in the .exe to make this work?
What do I not know that I should know?
I sincerely appreciate any and all input from those that are probably reading this and fully understanding the simplicity of it while I struggle to wrap my head around it.
Thank you!
#Alex, Thank you so much for the response. I apologize I was late getting back to you - have 4 kiddos home - I am actually not sure how to answer that fully. Here is a link to the freely downloadable file we offer to anyone and everyone - it does require an email, but that is just for notifications when the version is updated. https://caf.prosoundtraining.com/verify/ and here is the home page of the site: https://caf.prosoundtraining.com - It works by the user downloading the program (which is what we would like to move to the cloud), and then inputing various values for determining what watt amplifier to use based on speaker selection, length of cables, distance of audience from speakers, etc. basically several calculation tools like that, and then the main program that allows users to visualize amplifier performance via running signal tests through the amplifier. Once a user downloads the program, the calculators (which are separate in the decompiled version I have), they can choose to just use the calculators, or the complete program.

Is what I describe possible?
Well, If you are talking about a pure command-line application: yes.
But, If you are talking about a GUI application: If your application not fully usable from the command line, Go back to the source code and strip down all windows, buttons, UI stuff, etc.. AND make it usable from the command line.
Will I need Azure?
Any cloud provider will do just fine. Depending on the traffic, an old laptop with an internet connection might be enough for your case.
What exactly in Azure will I need (i.e. which products/resources)?
Azure virtual machine with a publicly accessible IP.
Do I need to use the compiled/decompiled version?
Will we need to change any code in the .exe to make this work?
What do I not know that I should know?
Using decompiled version (Source code):
WebAssembly
WebAssembly is a new type of code that can be run in modern web browsers — it is a low-level assembly-like language with a compact binary format that runs with near-native performance and provides languages such as C/C++, C# and Rust with a compilation target so that they can run on the web. It is also designed to run alongside JavaScript, allowing both to work together.
Emscripten
Compile C and C++ code, or any other language that uses LLVM, into WebAssembly, and run it on the Web, Node.js, or other wasm runtimes.
Using compiled verison (executable):
Common Gateway Interface-ish approach
Common Gateway Interface (CGI) is an interface specification for web servers to execute programs like console applications (also called command-line interface programs) running on a server that generates web pages dynamically. Such programs are known as CGI scripts or simply as CGIs. The specifics of how the script is executed by the server are determined by the server. In the common case, a CGI script executes at the time a request is made and generates HTML.
My suggestion would be to use python, PHP or any scripting language you are familiar with to spin up a webserver and execute command based on incoming requests.
Python example:
import subprocess
from flask import Flask
app = Flask(__name__)
#app.route('/calculator') # Accessible from http://[ipaddress]:[port]/calculator
def hello_RedPanda():
command = "..." # as if you are running your program from cmd.
result = subprocess.check_output(command) # execute given command
return result # to your web browser
Once you are done with all the back-end pluming, you can add your buttons and tabs back by rebuilding your UI on the client-side using HTML, CSS and Javascript.
Building the UI in client-side is actually the easy part. The really tricky part in your case is (as I mentioned above) making your application usable from the command line.
See more:
WebAssembly
Emscripten
CGI
A case similar to yours

HITs created with create_hit with externalQuestion using boto3, not visible at requester's account

I'm a novice mturk user. I created HITs for crowdsourcing using external question hosted on a server. I wanted to know if there is a web interface where I can see progress of my HITs. I tried looking at the https://requester.mturk.com/manage and https://requestersandbox.mturk.com/manage. But I cannot see the HITs programatically created using boto3. Should I look somewhere else? If not what's the way to get this information?

I share your pain right now. As of June 2020, this situation hasn't changed. HITs that are NOT created through the MTurk web interface STILL do not display on the web interface. It's terrible. We have 3 options for seeing and managing the HITs:
Use scripting and boto3. <-- Best option for now.
Use the AWS CLI.
Use the AWS shell (aws-shell).
I think the best option is to make scripts that do exactly what you need. Chances are you'll need to do things more efficiently than you could using the AWS CLI only. aws-shell isn't easy enough to use, and it also looks unsupported for over a year at this point (judging by their official github issue tracker).
For what you're asking specifically you'll need to use the method list_hits() and possibly list_assignments_for_hit(). See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/mturk.html
Also I'm very new to this, so if it sounds like a barely or only sorta know what I'm talking about, that's correct. But I also wished there had been a straightforward answer to this question a couple weeks ago when I was sitting here dumbfounded.

Kiosk program (web browser), deployment struggles

Okay, here's a complicated one I've been breaking my head over all week.
I'm creating a self service system, which allows people to identify themselves by barcode or by smartcard, and then perform an arbitrary action. I run a Tomcat application container locally on each machine to serve up the pages and connect to external resources that are required. It also allows me to serve webpages which I then can use to display content on the screen.
I chose HTML as a display technology because it gives a lot of freedom as to how things could look. The program also involves a lot of Javascript to interact with the customer and hardware (through a RESTful API). I picked Javascript because it's a natural complement to HTML and is supported by all modern browsers.
Currently this system is being tested at a number of sites, and everything seems to work okay. I'm running it in Chrome's kiosk mode. Which serves me well, but there are a number of downsides. Here is where the problems start. ;-)
First of all I am petrified that Chrome's auto-update will eventually break my Javascript code. Secondly, I run a small Chrome plugin to read smartcard numbers, and every time the workstation is shutdown incorrectly Chrome's user profile becomes corrupted and the extension needs to be set up again. I could easily fix the first issue by turning off auto-update but it complicates my installation procedure.
Actually, having to install any browser complicates my installation procedure.
I did consider using internet explorer because it's basically everywhere, but with the three dominant versions out there I'm not sure if it's a good approach. My Javascript is quite complex and making it work on older versions will be a pain. Not even mentioning having to write an ActiveX component for my smartcards.
This is why I set out to make a small browser wrapper that runs in full screen, and can read smartcard numbers. This also has downsides. I use Qt: Qt's QtWebkit weighs a hefty 10MB, and it adds another number of dependencies to my application.
It really feels like I have to pick from three options that all have downsides. It really is something I should have investigated before I wrote the entire program. I guess it is a lesson learnt well.
On to the questions:
Is there a pain free way out of this situation? (probably not)
Is there a browser I can depend on without adding tens of megabytes to my project?
Is there another alternative you could suggest?
If you do not see another way out, which option would you pick?

Are there any building blocks for a search engine that will scrape other sites?

I want build a search service for one particular thing. The data is freely available out there, via free classified services, and a host of other sites.
Are there any building blocks, e.g. open-source crawlers that I would customize - rather than build from scratch, that I can use?
Any advice on building such a product? Not just technical, but any privacy/legal things that I might need to take into consideration.
E.g. do I need to 'give credit' where the results are from and put a link to the original - if I get them from many places?
Edit: By the way, I am using GWT with JS for the front-end, haven't decided on the language for the back-end. Either PHP or Python. Thoughts?

There are few blocks in python you can use.
beautifulsoup [http://www.crummy.com/software/BeautifulSoup/] for parsing HTML. It can handle bad code too, and its API is veeery easy... way better than any DOM-like tool for me. My friend used it to scrape his old phpbb forum with success. It has pretty good docs.
mechanize [http://wwwsearch.sourceforge.net/mechanize/] is a webbrowser-simulating http client library. It handles cookies, filling forms and so on. Also easy to use, but it helps if you understand how does http work.
http://dev.scrapy.org/ -- this is a relatively new thing: a whole scraping framework based on twisted. I haven't played with it much.
I use first two for my needs; f.e. it needs 20 lines of code to get an automatic testing tool for a 3-stage poll, with simulation of waiting for user entering data and so on.

I made a screen-scraper in Ruby that took like five minutes. Apparently this dude has it down to 60 seconds! I'm not sure if Ruby is as scalable or fast as what you're looking for, but I've never seen a faster route to a proof-of-concept or a prototype.
The secret is a library called "hpricot", which was built for exactly this purpose.
I don't know anything about PHP or Python or what's available for those development systems/languages.
Good luck!

Common Lisp: What's the best way to use libraries in a shared hosting environment?

I was thinking about this the other day and wanted to see what the SO community had to say about the subject.
As it stands right now Common Lisp is getting some attention as a web development platform, and with good reason (of which I'm sure you are already convinced).
I was wondering how one would go about using a library in a shared environment in a similar fashion to PHP.
If I set up something like SBCL as an interperter to interpret FASL files like Python or PHP, what would be the best way to use libraries (like clsql for instance).
Most come as asdf installable libraries, but it would be a stupid amount of overhead to require and install the library each and every time a request is made.
Keeping in mind this is for shared hosting; would it be best to ..
1) Install system wide copies of the libraries for use in applications; reduces space, but there may be problems with using the correct version of the library.
2) Allow users (through a control panel) to install local copies for themselves; more space, no version problems.
3) Tell them to wrap it into a module and load it on demand like Python does (I'm not sure if/how this can be done with Lisp). Just being able to load a library for use would be the best option, but I don't think a lot of them are designed to be used this way.
Anyways, looking to hear your opinions, thanks.

There are two ways I would look at it:
start a Lisp for each request
This way it would be much better that the Lisp is a saved image with all necessary libraries and data loaded. But that approach does not look very promising to me.
run a Lisp and let a frontend (web browser, another web server, ...) connect to it
This way you can either start a saved image or a Lisp that loads a bunch of stuff once and serves the requests.
I like to use saved images/applications in a deployment scenario. They can be quickly started, contain all the necessary software and are independent of library changes.
So it might be useful to provide pre-configured Lisp images that contain the necessary software or let the user configure and save an image.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string