How does X11 clipboard handle multiple data formats? - text

It has probably happened to you as well: sometimes when you copy text from a web page into a rich-text e-mail draft in your favorite webmail client, the pasted piece arrives with a different font, size or weight. It somehow remembers the style (and often images, when they are selected). How is it then that if you paste the same thing into your favorite text editor, like Vim, there's no HTML, just the plain text?
It seems that the clipboard maintains the selected data in various formats. How can one access data in any one of those formats (programmatically or with some utility)? How does the X11 clipboard work?

The app you copy from advertises formats (mostly identified by MIME types) it can provide. The app you paste into has to pick its preferred format and request that one from the source app.
The reason you may not see all style info transferred is that the apps don't both support a common format that includes the style info.
You can also see issues because an app may for example try to paste HTML, but not really be able to handle all HTML. Or the apps may be buggy, or may not agree on what a particular MIME type really means.
Almost all apps can both copy and paste plain text, of course, but beyond that it's touch and go. If you don't get what seems to make sense, you could file a bug against one of the apps.
You may notice that if you exit the app you're copying from, you can no longer paste. (Unless you're running a "clipboard manager" or something.) This is because no data actually leaves the source app until the destination app asks for a format to paste.
There are "clipboard managers" that ask for data immediately anytime you copy and store that data, so you can paste after the source app exits, but they have downsides (what if the data is huge, or is offered in 10 formats, etc.).
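The negotiation described above can be modelled in a few lines of plain Python. This is only a toy sketch, not real X11 code: Source, choose_target and the preference lists are hypothetical names made up for illustration. It tries to show the two points made above: the destination picks from what the source advertises, and no data is produced until a format is actually requested.

```python
from typing import Callable, Dict, List, Optional


class Source:
    """Toy stand-in for the app you copy from (hypothetical, not an X11 API)."""

    def __init__(self, providers: Dict[str, Callable[[], bytes]]):
        # Each format maps to a function that renders the data on demand.
        self._providers = providers
        self.conversions = 0  # counts how often data was actually produced

    def targets(self) -> List[str]:
        """Advertise the formats this source can provide."""
        return list(self._providers)

    def convert(self, target: str) -> bytes:
        """Produce the data in one format, only when asked."""
        self.conversions += 1
        return self._providers[target]()


def choose_target(offered: List[str], preferred: List[str]) -> Optional[str]:
    """Return the destination's most preferred format that the source offers."""
    for target in preferred:
        if target in offered:
            return target
    return None


source = Source({
    "text/html": lambda: b"<b>Hello</b>",
    "UTF8_STRING": lambda: b"Hello",
})

# A rich-text editor prefers HTML; a plain-text editor never asks for it.
rich = choose_target(source.targets(), ["text/html", "UTF8_STRING"])
plain = choose_target(source.targets(), ["UTF8_STRING", "STRING"])
print(rich, plain)
print(source.conversions)     # nothing has been transferred so far
print(source.convert(plain))  # data is rendered only now
```

If the two preference lists share no entry with the offered targets, choose_target returns None, which corresponds to the case where a paste simply does nothing.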
The following Python code will show the available formats for the currently copied data, if you have PyGTK installed. It shows the Ctrl+C copied data, not the middle-click easter egg (the PRIMARY selection). (See http://freedesktop.org/wiki/Specifications/ClipboardsWiki)
#!/usr/bin/python
import gtk
# With no arguments, clipboard_get() returns the CLIPBOARD selection.
clipboard = gtk.clipboard_get()
# wait_for_targets() blocks until the owner replies with the formats it offers.
print("Current clipboard offers formats: " + str(clipboard.wait_for_targets()))

The code in Havoc P's answer to show the formats of the current clipboard sadly no longer works on current systems, because PyGTK has been superseded by PyGObject, which has a different API. Here's an updated version as a one-liner:
python -c 'import gi; gi.require_version("Gtk", "3.0"); from gi.repository import Gtk, Gdk; print(*Gtk.Clipboard.get(Gdk.atom_intern("CLIPBOARD", True)).wait_for_targets()[1], sep = "\n")'
The one-liner needs PyGObject and GTK 3 rather than the old PyGTK. On Arch Linux, these should be available with sudo pacman -S python-gobject gtk3.
Below are some examples.
Text from Chrome:
TIMESTAMP
TARGETS
SAVE_TARGETS
MULTIPLE
STRING
UTF8_STRING
TEXT
text/html
text/plain
Text from Gnome Terminal:
TIMESTAMP
TARGETS
MULTIPLE
SAVE_TARGETS
UTF8_STRING
COMPOUND_TEXT
TEXT
STRING
text/plain;charset=utf-8
text/plain
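To fetch the data in one particular format from the lists above, a command-line tool such as xclip can request a named target. The sketch below wraps it from Python. It assumes xclip is installed and an X session is running; the helper names xclip_command, read_target and list_targets are made up here.

```python
import subprocess
from typing import List


def xclip_command(target: str, selection: str = "clipboard") -> List[str]:
    """Build the xclip invocation that prints one target of a selection."""
    return ["xclip", "-selection", selection, "-t", target, "-o"]


def read_target(target: str, selection: str = "clipboard") -> bytes:
    """Ask the current selection owner for `target` and return the raw bytes."""
    result = subprocess.run(xclip_command(target, selection),
                            check=True, capture_output=True)
    return result.stdout


def list_targets(selection: str = "clipboard") -> List[str]:
    """TARGETS is itself a target; requesting it returns the advertised formats."""
    return read_target("TARGETS", selection).decode().split()
```

For example, after copying from a browser, read_target("text/html") should return the markup, while read_target("UTF8_STRING") should return only the plain text. Passing selection="primary" reads the middle-click selection instead.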

Related

How to override copy-paste functionality in Windows/Linux?

I want to override the copy-paste functionality in either Windows/Linux. How can I do that?
Most applications support simple copy-paste functionality, where the text copied is the one that is currently selected. I want to modify this functionality. Instead of just copying the text, I want to also copy to clipboard: the name of the process/application from where the text was copied, datetime of the copy-paste and user of the system.
For example, when a user selects a text (say "Hello World") from a browser and pastes it into say notepad, the pasted text should be something like "Hello World" (author: , source: chrome.exe, datetime: ....)
How can I do this (either in Windows or Linux)?
It's been a while since I looked at the clipboard, but basically you just listen for clipboard change events, which the OS will send to all interested applications.
When the clipboard changes, look at the current content. When it's text (it can also be HTML or application specific binary data), then get the current content, append the information you want and set the new clipboard content.
If you use Java, you can use Toolkit.getDefaultToolkit().getSystemClipboard(); to get access to the clipboard and then add listeners for the various events.
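The same idea can be sketched in Python without tying it to any particular clipboard library. Everything here is an assumption for illustration: annotate and ClipboardAnnotator are hypothetical names, the get_text and set_text callables stand in for whatever clipboard API you use, and finding out which application the text came from is platform specific, so the source name is simply passed in.

```python
from datetime import datetime
from typing import Callable, Optional


def annotate(text: str, source: str, user: str, when: datetime) -> str:
    """Append provenance information to copied text."""
    stamp = when.isoformat(timespec="seconds")
    return f'"{text}" (author: {user}, source: {source}, datetime: {stamp})'


class ClipboardAnnotator:
    """Rewrites new clipboard text once; get/set are supplied by the caller."""

    def __init__(self, get_text: Callable[[], Optional[str]],
                 set_text: Callable[[str], None], user: str):
        self._get, self._set, self._user = get_text, set_text, user
        self._last_written: Optional[str] = None

    def on_change(self, source: str) -> None:
        """Call this from whatever change notification the platform offers."""
        text = self._get()
        # Skip non-text content and our own write, which would otherwise loop.
        if text is None or text == self._last_written:
            return
        self._last_written = annotate(text, source, self._user, datetime.now())
        self._set(self._last_written)
```

The check against the last written value matters because setting the clipboard usually triggers another change event, and without it the text would be annotated again and again.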
Related:
Copying to Clipboard in Java

Getting data from a browser by screen-scraping

I have gone thru several relevant looking questions but they did not contain the answer I am looking for. So, here is my question:
I have several web applications at my workplace, which are written using different frameworks, and the authors are long gone, so I cannot ask for feature updates. Hence I have to go through the same grueling sequence of actions every day to get the data, which amounts to a file of a few kilobytes.
I tried parsing the page source, but the programming techniques of the authors were all over the place. Some even intentionally obscured the code so that the data does not show as text, and there is no reason for this, as the code they wrote is a company asset. Long story short, I realized that if I can copy and paste the textual content of these pages, I can process that data much more easily than by parsing the page source to get the text (which is sometimes totally impossible).
So, I am now looking for a browser plug-in (in windows or linux environments) or equivalent text based tools on windows or linux, which will load these pages and save the text on the screen to file(s) when invoked.
Despite how hard I tried, I am coming up empty handed.
I do not want to utilize the services of a third-party screen-scraping web site, as the data is company confidential and not accessible by outside parties. Everything has to happen on the client end, as I do not have access to the servers these apps are running on (mostly IIS on a Windows front end and an Oracle DB at the back end). The middle tier, as I have explained before, is anyone's wild guess, ranging from native Oracle apps to WebLogic to Tomcat and to some in-house developed Java/JavaScript stuff.
Thanks for all the help in advance
After searching for an answer for well over a year, I came to realize that as long as I use Windows (a modern version of it, that is), AutoHotkey is my savior.
I open the web page, maximize it, place my cursor (mousemove, x, y), then left click (mouseclick, L), then send Ctrl-A followed by Ctrl-C.
Voilà! Everything is in the clipboard. Then I activate my Unix session (winactivate PuTTY) and send the appropriate key presses to launch the editor of my choice (which is vi), and finally send a Shift-Insert to paste the clipboard into my document. Then save and exit, of course.
As an added bonus, right after my document is saved, I can invoke the script of my choice to parse this file and give me back the portion(s) I am interested in.
I know it is not bullet proof, but for my purpose, it helps to a great extent. As a matter of fact, I can do whatever I want with this method.
What about something like this: http://www.nirsoft.net/utils/htmlastext.html
Freeware that converts an HTML page to text
Any of links, lynx or w3m will do what you want, they are text browsers and you can dump text from a webpage with, for example:
w3m -dump http://www.google.com > g.txt
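If none of those text browsers is available, a very rough approximation can be written with only the Python standard library. This is a swapped-in technique, not what w3m does: it does not run JavaScript or lay out tables, it just collects the text nodes of already saved HTML. TextExtractor and html_to_text are names invented for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<h1>Report</h1><script>x=1</script><p>Total: 42</p>"))
```

The sample call prints the heading and the paragraph text on separate lines and drops the script body.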

how can I extract text contents from GUI apps in linux?

I want to extract text contents from GUI apps. Here are 2 examples:
Example 1:
Suppose I opened Firefox and entered the URL www.google.com.
How can I extract the string "www.google.com" from Firefox using my own app?
Example 2:
Open the calculator (using gcalctool), then input 1+1.
How can I extract the string "1+1" from the calculator in my own program?
In brief, what I want is to find out whether there is a way to extract the text contents from any widget of a GUI application.
Thanks
I don't think there's a generic way to do this, at least not a very elegant one.
Some inelegant ideas:
You might be able to modify the X window system or even some toolkit framework to extract what is being displayed in specific window elements as text.
You could take a screenshot and use an OCR library to convert the pixels back into text for the interesting areas.
You could recompile the apps of interest to add some kind of mechanism for asking them questions.
You could use something like xtest to inject events highlighting the region of interest and copying it to the clipboard.
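The last idea in the list can be sketched with the xdotool and xclip command-line tools, which is one possible way to inject key events and read the clipboard. This is an assumption-laden sketch: both tools must be installed, the target application must react to Ctrl+A and Ctrl+C the way you expect, and copy_all_commands and grab_window_text are hypothetical helper names.

```python
import subprocess
from typing import List


def copy_all_commands(window_name: str) -> List[List[str]]:
    """Build the commands: focus the window, select all, copy, read clipboard."""
    return [
        ["xdotool", "search", "--name", window_name, "windowactivate", "--sync"],
        ["xdotool", "key", "ctrl+a"],
        ["xdotool", "key", "ctrl+c"],
        ["xclip", "-selection", "clipboard", "-o"],
    ]


def grab_window_text(window_name: str) -> str:
    """Run the commands in order and return what the last one prints."""
    output = b""
    for command in copy_all_commands(window_name):
        output = subprocess.run(command, check=True, capture_output=True).stdout
    return output.decode(errors="replace")
```

In practice a short delay between the copy keystroke and the xclip call may be needed, since the application has to claim the clipboard first. Note also that this grabs whatever the focused widget selects, which is not necessarily the widget you are interested in.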
I believe firefox and gcalctool are for examples only and you just want to know in general how to pass output of one application to other application.
There are many ways to do that on Linux, like:
piping
application1 | application2
btw here is the Firefox command line manual if you want to start firefox on Ubuntu with a URL. eg:
firefox "$url"
where $url is a variable whose value can be www.mozilla.org
That sounds difficult. Supposing you're running X11, you can very easily grab a picture of a window (see "man xwd"); however, there is no easy way to get to the text unless it's selected and therefore copied to the clipboard.
Alternatively, if you only want to capture user input, this is quite easy to do, too, by activating the X11 record extension: put this in your /etc/X11/xorg.conf:
Section "Module"
    Load "record"
    # Load other modules you need ...
EndSection
though it may prove difficult to use too, see example code for Xorg/X11 record extension fails

Does the GNOME clipboard have a MIME-type associated with the data?

I know there are several types of selection in Linux: primary, secondary, and clipboard. I treat the first two as short-term clipboards, and clipboard as the long-term clipboard. Am I right?
Now, since the primary/secondary selections are text-only, I want to copy an image to the long-term clipboard, but I don't know if there is a MIME type associated with it. Since the Screen Capture tool is able to copy a screenshot to the clipboard, I guess there is some metadata describing that it is in an image format. But commands like xsel don't give any option for operating on image clipboard data: neither copying an image file to the clipboard nor dumping an image from the clipboard to a file is supported.
After searching Google, I found there's some support in Python/GTK:
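import pygtk
pygtk.require('2.0')
import gtk
import os

def copy_image(f):
    # Put the image stored in file f on the CLIPBOARD selection.
    assert os.path.exists(f), "file does not exist"
    clipboard = gtk.clipboard_get()
    img = gtk.Image()
    img.set_from_file(f)
    clipboard.set_image(img.get_pixbuf())
    # Ask a clipboard manager, if one is running, to keep the data after exit.
    clipboard.store()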
I haven't tried it myself, because I'm not familiar with Python, but it looks like at least some programs support an image clipboard.
Here is my question: since I guess the clipboard in GNOME may not have a MIME type, do most GNOME applications share the same convention on image format?
And what documents should I refer to if I want to program with the image clipboard to share images between different applications, for example when one wants an 8-bit indexed bitmap and another wants a 24-bit RGB bitmap?
Copy and paste in GNOME performs a negotiated data transfer: one side provides a list of the formats available for the data, and the other chooses its preferred type. APIs such as GTK+ simplify this by only providing an interface for a GdkPixbuf and managing the format transfer themselves.
Usually you do not want a raw bitmap transfer, as it is going to be rather slow for large images. PNG compression is good, but pasting into OO.o, say, only supports JPEG compression, which you usually do not want for non-photo objects. This leads to slow pasting of images from, say, The GIMP to Writer.
http://library.gnome.org/devel/gtk/stable/gtk-Clipboards.html
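Outside of GTK, the same negotiation can be driven from a script with xclip, which accepts a target name for both reading and writing. The sketch below is an illustration under assumptions: xclip must be installed, the preference order in PREFERRED_IMAGE_TARGETS is arbitrary, and all function names are made up here.

```python
import subprocess
from typing import List, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PREFERRED_IMAGE_TARGETS = ["image/png", "image/jpeg", "image/bmp"]


def pick_image_target(offered: List[str]) -> Optional[str]:
    """Choose the first preferred image format that the owner advertises."""
    for target in PREFERRED_IMAGE_TARGETS:
        if target in offered:
            return target
    return None


def is_png(data: bytes) -> bool:
    """Check the eight-byte signature that every PNG file starts with."""
    return data.startswith(PNG_SIGNATURE)


def _xclip_out(target: str) -> bytes:
    command = ["xclip", "-selection", "clipboard", "-t", target, "-o"]
    return subprocess.run(command, check=True, capture_output=True).stdout


def save_clipboard_image(path: str) -> str:
    """Dump the clipboard image to `path`; returns the target that was used."""
    target = pick_image_target(_xclip_out("TARGETS").decode().split())
    if target is None:
        raise RuntimeError("clipboard owner offers no known image format")
    with open(path, "wb") as handle:
        handle.write(_xclip_out(target))
    return target


def copy_png_to_clipboard(path: str) -> None:
    """Offer a PNG file on the clipboard under the image/png target."""
    subprocess.run(["xclip", "-selection", "clipboard",
                    "-t", "image/png", "-i", path], check=True)
```

Whether the receiving application accepts image/png, or insists on another format such as image/bmp, still depends on that application, as the answer above explains.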

Clipboard viewer for programming purposes

I need a clipboard viewer in order to understand the type and contents of the data I'm receiving. Is there any such program available, (for Windows) that lets you explore any type of data currently in the clipboard?
ClipSpy: Unfortunately the only workable multi-format viewer, ClipSpy, shows me the string data wrapped every 10 characters, and expands the hex and binary views which I'm not concerned about.
Start -> Run -> clipbrd
I would use the command-line clipboard tool to send the clipboard contents to a file. Then you view/parse it using any old tool.
I use Ditto, which uses an SQLite database. I am sure you could figure out a way to manipulate the stored data for syntax highlighting, or modify the program so that when editing clips it would open with syntax highlighting or in an editor that has it.
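If none of the existing viewers fits, the formats currently on the Windows clipboard can also be listed from a short script. The following is a sketch using ctypes and the Win32 functions OpenClipboard, EnumClipboardFormats, GetClipboardFormatNameW and CloseClipboard. It only runs on Windows, the table of standard formats is deliberately incomplete, and format_label and list_clipboard_formats are names chosen for this example.

```python
import ctypes
from typing import Dict, List, Tuple

# A few of the predefined format IDs; registered formats get names at runtime.
STANDARD_FORMATS: Dict[int, str] = {
    1: "CF_TEXT", 2: "CF_BITMAP", 7: "CF_OEMTEXT", 8: "CF_DIB",
    13: "CF_UNICODETEXT", 14: "CF_ENHMETAFILE", 15: "CF_HDROP",
    16: "CF_LOCALE", 17: "CF_DIBV5",
}


def format_label(fmt: int, registered_name: str = "") -> str:
    """Prefer the standard name, then the registered name, then the raw ID."""
    return STANDARD_FORMATS.get(fmt) or registered_name or f"format {fmt}"


def list_clipboard_formats() -> List[Tuple[int, str]]:
    """Enumerate (id, label) pairs for the current clipboard. Windows only."""
    user32 = ctypes.windll.user32
    if not user32.OpenClipboard(None):
        raise OSError("could not open the clipboard")
    try:
        formats = []
        fmt = user32.EnumClipboardFormats(0)
        while fmt:
            name = ctypes.create_unicode_buffer(256)
            # Returns 0 and leaves the buffer empty for predefined formats.
            user32.GetClipboardFormatNameW(fmt, name, len(name))
            formats.append((fmt, format_label(fmt, name.value)))
            fmt = user32.EnumClipboardFormats(fmt)
        return formats
    finally:
        user32.CloseClipboard()
```

Calling list_clipboard_formats() after copying from a browser would typically show CF_UNICODETEXT alongside registered formats such as "HTML Format", though the exact list depends on the source application.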

Resources