VB/VBA: Fetch HTML string from clipboard (copied via web browser) - excel

It seems that when you copy something from a web browser to the clipboard, at least 2 things are stored:
Plain text
HTML source code
It is then up to the software you are pasting into to determine which one it wants.
When pasting into MS Excel 2003, you have a paste special option to paste HTML, which will paste the formatted HTML (as it is displayed by the browser).
What I want to do is paste the actual source code as plain text. Can this be fetched from the clipboard in VBA?
Edit: I'm trying to access all of the source code of the copied HTML, including the tags.

This time I've read the question properly and realised coonj wants to get the HTML from the clipboard, including tags.
I believe this is reasonably difficult. You need to read the clipboard using Windows API calls, and then parse the resulting CF_HTML string, which has some wacky headers added on top of the HTML.
There is a Microsoft Knowledge Base article with Windows API code to read the CF_HTML from the clipboard (function GetHTMLClipboard).
You will then probably want to ignore the wacky headers. Microsoft documents the CF_HTML format. An example CF_HTML fragment is shown below. You could probably come up with some guesswork method of skipping the first few lines.
Version:0.9
StartHTML:71
EndHTML:170
StartFragment:140
EndFragment:160
StartSelection:140
EndSelection:160
<!DOCTYPE>
<HTML>
<HEAD>
<TITLE>The HTML Clipboard</TITLE>
<BASE HREF="http://sample/specs">
</HEAD>
<BODY>
<!--StartFragment --> <P>The Fragment</P>
<!--EndFragment -->
</BODY>
</HTML>
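Rather than guessing how many header lines to skip, the offsets in the header can be used directly. Below is a minimal sketch in JavaScript (not VBA, and not the Knowledge Base code); the function name parseCfHtml is made up for illustration. It assumes the clipboard text has already been read into a string, and that the offsets are byte counts into the UTF-8 data, as the CF_HTML format describes.

```javascript
// Sketch: extract the full HTML and the copied fragment from a CF_HTML string.
// parseCfHtml is a hypothetical helper name, not part of any library.
function parseCfHtml(cfHtml) {
  // CF_HTML offsets count UTF-8 bytes, so slice bytes, not string indices.
  const bytes = new TextEncoder().encode(cfHtml);
  const decoder = new TextDecoder("utf-8");

  // Read a "Name:number" header field; returns -1 if the field is missing.
  const readOffset = (name) => {
    const match = new RegExp("^" + name + ":(\\d+)", "m").exec(cfHtml);
    return match ? parseInt(match[1], 10) : -1;
  };

  // Decode the bytes between two offsets, or null if the offsets are unusable.
  const slice = (start, end) =>
    start >= 0 && end >= start ? decoder.decode(bytes.slice(start, end)) : null;

  return {
    html: slice(readOffset("StartHTML"), readOffset("EndHTML")),
    fragment: slice(readOffset("StartFragment"), readOffset("EndFragment")),
  };
}
```

The same idea should carry over to VBA: read the StartFragment and EndFragment numbers from the header and cut the byte range between them.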
It might also be worth thinking about whether there's any other way of solving your problem. E.g., will the browser always be Internet Explorer? Can you get what you need by walking the HTML tree using the COM object model?
EDIT: coonj has tried this now and says "the GetHTMLClipboard function seems to work with both Firefox and IE, and it doesn't look like it is throwing those headers in there"

VB6 has the Clipboard object, which allows you to get the clipboard data in different formats. VBA doesn't have this object, but there are Windows API calls you can use. You can see a sample implementation for VBA here.

Related

PrimeFaces Extensions CKEditor: attempts to set encoding to UTF-8 unsuccessful

Why I am using this editor:
In the past I used PrimeFaces p:editor which is however deprecated and lacks functions that the users desperately want. I cannot use the new PrimeFaces p:textEditor because of this: Primefaces textEditor: converting text to HTML with JavaScript not working.
What is it used for:
I am using pe:ckEditor from PrimeFaces Extensions in my program, where the user writes the content of an e-mail message in the editor. When the user clicks a Send button, the HTML is taken from the editor and sent via e-mail to a client.
What is the issue:
When using p:editor, I got the HTML with the JavaScript function saveHTML, and it worked perfectly even when the text contained Czech characters (ěščřžýáíéó). I did not even have to set the encoding or anything else.
Now, however, when the user writes "V případě dalších dotazů se na nás můžete obracet každý den na telefonním čísle", the resulting HTML contains the text like this: "V pÅípadÄ dalších dotazů se na nás můžete obracet každý den na telefonním Äísle". This is complete rubbish that the user obviously cannot send to a client.
My research:
EDIT: Based on some comments, I tried adding <meta charset="utf-8"> and <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, but that did not help. In pom.xml I also found <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>, so I do not think the problem is in the HTML page but in the settings of the editor itself.
So I figured that the encoding must be set specifically for the editor in its config. I finally figured out how to make the editor use a custom config, but nothing that I found on the Internet and added to the config worked for me:
config.language='cs';
And:
config.entities_latin = false;
And:
config.entities = false;
And:
config.basicEntities = false;
And all its combinations.
ANOTHER EDIT:
Based on some other comments here, I also installed OmniFaces and tried to solve this by CharacterEncodingFilter, but nothing changed and it is still not working.
I also found out that my problem seems to be very related to this issue: Unicode input retrieved via PrimeFaces input components become corrupted, but the accepted answer there gives 3 ways how to solve it, one is the CharacterEncodingFilter, other way is not applicable for Tomcat users (me) and the last "solution" seems to be reporting this to PrimeFaces Extensions developers (which I did: https://github.com/primefaces-extensions/primefaces-extensions.github.com/issues/756 ).
Please let me know if you know how to fix this or if there is any workaround.
PrimeFaces Extensions - version 7.0.2;
PrimeFaces - version 7.0.7
My colleague and I found out what the issue was, based on the test code that @melloware provided.
The original editor, p:editor, which we had been using and which we are trying to replace with pe:ckEditor, could provide us with its content as HTML only if we used the JavaScript function saveHTML.
But with pe:ckEditor, any time the user hit the Send button, whose onstart contained the saveHTML call, saveHTML corrupted the content. Once we removed saveHTML and took the pe:ckEditor content as it was (which is already HTML), it was fine, without corrupted characters.
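For anyone diagnosing similar corruption: pairs such as "Å" in place of "ř" and "Ä" in place of "ě" are what you would typically see when UTF-8 bytes are decoded with a single-byte encoding such as Latin-1. That this is exactly what saveHTML did here is an assumption, not something confirmed above. The sketch below assumes Node.js (it uses Buffer), and the helper names garble and repair are made up for illustration.

```javascript
// Simulate the suspected corruption: encode as UTF-8, decode as Latin-1.
// Assumes Node.js; garble/repair are hypothetical helper names.
function garble(text) {
  return Buffer.from(text, "utf8").toString("latin1");
}

// Reverse the mistake: re-encode as Latin-1 bytes and decode them as UTF-8.
function repair(garbled) {
  return Buffer.from(garbled, "latin1").toString("utf8");
}

// "ř" is the two UTF-8 bytes 0xC5 0x99; read as Latin-1, the first byte is "Å"
// and the second is an invisible control character.
console.log(JSON.stringify(garble("ř")));
console.log(repair(garble("V případě")));
```

If repair() restores your text, the content was most likely encoded correctly and then decoded with the wrong charset somewhere along the way.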

Remove tag &#8203 in WP theme [duplicate]

EDIT: You can see the issue here (look in source).
EDIT2: Interesting, it is not an issue in source. Only with the console (Firebug as well).
I have the following markup in a file called test.html:
​<!DOCTYPE html>
<html>
<head>
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
<h3>Test Harness</h3>
</body>
</html>
But in Chrome, I see:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
"​
"
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
<h3>Test Harness</h3>
</body>
</html>
It looks like &#8203; is a zero width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?
Thanks in advance.
It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”
What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.
The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.
It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
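If you would rather check the file itself than a hex viewer, the test is simple: look for the three bytes 0xE2 0x80 0x8B at the very start. Here is a minimal sketch; the function names are made up for illustration, and it assumes you already have the file contents as a Uint8Array (a Node.js Buffer also works).

```javascript
// UTF-8 encoding of U+200B ZERO WIDTH SPACE.
const ZWSP_BYTES = [0xe2, 0x80, 0x8b];

// True if the byte array begins with a UTF-8 encoded zero width space.
function startsWithZwsp(bytes) {
  return ZWSP_BYTES.every((b, i) => bytes[i] === b);
}

// Return a view of the bytes with any leading zero width spaces skipped.
function stripLeadingZwsp(bytes) {
  let offset = 0;
  while (startsWithZwsp(bytes.subarray(offset))) {
    offset += ZWSP_BYTES.length;
  }
  return bytes.subarray(offset);
}
```

This only handles the start of the document, which is the case described above. Stray U+200B characters elsewhere in the markup need a search and replace instead.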
I understand that there is a bug in SharePoint 2013 where the HTML editor adds these characters into your content.
I've been dealing with this for a bit and this is the solution I am using which seems to be working. I added this javascript into a file referenced by my masterpage.
// Elements whose contents are scanned for zero-width spaces
var elements = ["h1","h2","h3","h4","p","strong","label","span","a"];
function targetZWS(){
    for (var i = 0; i < elements.length; i++) {
        jQuery(elements[i]).each(function() {
            removeZWS(this);
        });
    }
}
// Strip every U+200B character from the element's HTML
function removeZWS(target) {
    jQuery(target).html(jQuery(target).html().replace(/\u200B/g,''));
}
/*load functions*/
$(document).ready(function() {
    // Queue targetZWS to run with SharePoint's body onload functions
    _spBodyOnLoadFunctionNames.push("targetZWS");
});
Links I looked into investigating this:
https://social.msdn.microsoft.com/Forums/sharepoint/en-US/23804eed-8f00-4b07-bc63-7662311a35a4/why-does-sharepoint-put-in-character-code-8203-in-a-richtext-field?forum=sharepointdevelopment
https://social.technet.microsoft.com/Forums/office/en-US/e87a82f0-1ab5-4aa7-bb7f-27403a7f46de/finding-8203-unicode-characters-in-my-source-code?forum=sharepointgeneral
http://www.sharepointpals.com/post/Removing-8203-in-RichTextHTML-field-Sharepoint
Try this script. It works for me
$( document ).ready(function() {
var abc = document.body.innerHTML;
var a = String(abc).replace(/\u200B/g,'');
document.body.innerHTML = a;
});
I have experienced this in a major project I was working on.
The trick is to just:
copy the whole code into Notepad,
save it as a text file,
close the file, then open it again and copy your code back into your IDE environment.
And voilà, it's gone!
I was able to remove these in Sublime by selecting the characters surrounding it and copy/pasting into Find and Replace.
In my case, the "&#8203;" character did not appear in the code editor (MS Code) and was visible only in the Elements tab in Chrome. What helped was deleting the tag after which the character appeared and retyping that tag by hand. Apparently the character had come along with a Ctrl+C / Ctrl+V while transferring the code.
This "&#8203;" HTML character reference is a zero-width space.
It is easy to find in the Inspect Elements section of Google Chrome. But when you try to remove it from your code, most major IDEs do not show it (at least not with my preferences).
I downloaded the Brackets text editor and opened my code in it. It shows the character as red dots. Just remove it and check that everything is working well.
I found this solution in a blog post: What is “&#8203;” HTML character? Why is being injected into my HTML?
Thank You for saving me hours.
I cannot find where it's being injected on my page. I'll investigate it more later, but for now, I just threw this in my page so I can keep working.
$(function(){
$('body').contents().eq(0).each(function(){
if(this.nodeName.toString()=='#text' && this.data.trim().charCodeAt(0)==8203){
$(this).remove();
}
});
});

ReactJS components for PDF export, Excel export and print

I'm building some ReactJS Table and Report components that will basically contain <table> data, some graphics (d3) and some textual data. I need to provide 3 buttons:
Export to PDF
Export to Excel
Print
Are there any reliable packages available for the tasks above using ReactJS? What is the approach to handle these requirements?
I would use a combination of the following JavaScript libraries:
React Csv is my favourite JavaScript library for working with csv. It excels at dynamic generation.
PDF Make is my favourite JavaScript library for generating PDFs.
Note: I would attach the files to this post, but there is currently no facility for this in StackOverflow.
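If you only need a file that Excel can open, the CSV part is small enough to sketch without a library. The function below is not the React Csv API; toCsv is a made-up name, and it assumes your table data is already an array of rows, each row an array of cell values.

```javascript
// Build CSV text from an array of rows (each row is an array of cell values).
// toCsv is a hypothetical helper, not part of React Csv.
function toCsv(rows) {
  const escapeCell = (value) => {
    const text = value === null || value === undefined ? "" : String(value);
    // Quote cells containing commas, quotes or line breaks; double inner quotes.
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  return rows.map((row) => row.map(escapeCell).join(",")).join("\r\n");
}
```

In a React component you could pass the result to a Blob and trigger a download from a button's click handler.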
react-pdf (published 16 days ago - not tested)
react-export-excel (published 8 months ago - not tested)
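For printing, call window.print() in your code.
Example:
Copy and paste this as an .html file (in React, the handler would be onClick={() => window.print()}; writing onClick={window.print()} would call print immediately during rendering instead of on click).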
<!Doctype html>
<html>
<body>
<p>Use the button below to print</p>
<button id="print" onclick="window.print()">Print</button>
</body>
</html>
Old question, but since I was looking for something similar, I ended up using:
PDF: jspdf (see specific issue for reactjs)
Excel: react-excel-workbook
Print: nothing yet, feel free to update the answer if you have.

WKHTMLTOPDF Dynamic Header on every page

I am trying to produce a PDF file from a large HTML file using the WKHTMLTOPDF library in Node. I need to be able to put some content in the header and footer on every page, but the header content changes on every page. For example, it has custom numbering in a format like BX008761, and the number should increment on every page.
The first page will be BX008761, the second page BX008762, the third BX008763, and so on.
I found a related thread:
WKHTMLTOPDF -- Is possible to display dynamic headers?
That thread states:
"you can feed --header-html almost anything :) Try the following to see my point:
wkhtmltopdf.exe --margin-top 30mm --header-html isitchristmas.com google.fi x.pdf
So isitchristmas.com could be www.yoursite.com/magical/ponies.php"
Is the source provided for the --header-html option requested for every page of the rendered PDF, or just once per PDF?
I appreciate your support. Thank you.
EDIT: I have tried a sample program and confirmed that the value provided for the --header-html option is processed on every page rendered within the PDF. I am using a remote service to return the HTML string as the response to the URL.
Now it displays the HTML string as is, instead of rendering it.
When the service returns the string below:
<html> <body> <span style="color:red" > 123 :: 0 :: 3000025 :: 634943551338828720</span> <body> <html>
then the header on every page shows that same string instead of displaying the text in red. How do I make wkhtmltopdf understand that the content it received from the service needs to be rendered as HTML?
I would appreciate it if anyone can suggest a workaround.
Thank you.
EDIT: I have used another workaround to return an HTML page for the header content. I used an HTTPHandler in ASP.NET to return a valid response, and this appears to have solved the core issue of having a dynamic header on every page.
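For the incrementing number itself, one option is to compute it inside the header page. The wkhtmltopdf documentation describes header and footer HTML documents receiving values such as page and topage as query-string arguments. The sketch below relies on that; pageLabel and its parameters are made-up names, and it is written in plain ES5 because the rendering engine in wkhtmltopdf is an old WebKit.

```javascript
// Build a label such as "BX008761" from the page number in the query string.
// pageLabel is a hypothetical helper; "page" is the argument wkhtmltopdf passes.
function pageLabel(search, prefix, firstNumber, width) {
  var page = 1; // fall back to the first page if no page argument is present
  var pairs = search.replace(/^\?/, "").split("&");
  for (var i = 0; i < pairs.length; i++) {
    var kv = pairs[i].split("=", 2);
    if (kv[0] === "page") {
      page = parseInt(decodeURIComponent(kv[1]), 10) || 1;
    }
  }
  // Left-pad the running number with zeros to the requested width.
  var digits = String(firstNumber + page - 1);
  while (digits.length < width) {
    digits = "0" + digits;
  }
  return prefix + digits;
}
```

In the header page you would call something like pageLabel(window.location.search, "BX", 8761, 6) and write the result into an element, so that page 1 shows BX008761 and page 2 shows BX008762. A server-side handler like the one in the edit above could do the same arithmetic from the page argument.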

Can I add a button to a CFGRID that lets a user export the grid to an XLSX file? How?

I'm a ColdFusion developer working on a reporting application to display information from a CFSTOREDPROC process. I've been able to get the data from my query to display correctly in a CFGRID, and I'm really happy with the display of the data. The grid saves a lot of time because it avoids using the CFOUTPUT tag and formatting the data in HTML for hundreds of reports.
All I would like to do is add a simple Disk Icon somewhere on the datagrid control that would save the contents of the datagrid and export it into an XLSX(2010) file that an end user could then manipulate in a spreadsheet program. This is important because the data needs to have a 'snapshot' at certain times of year saved.
Solutions Tried:
I looked into having a link from the report options page that would fire into a report_xls.cfm page but designing a page that catches all of the report options a second time seems dumb and would add thousands of CFM's to the website.
CFSPREADSHEET seems not to work for a variety of reasons. One is that the server seems to constantly fight me with the 'write' function in this tag. Another is that I don't know how to make the javascript work for this button to get the output that I want.
I also looked into doing this as a Javascript button that would fire based on the data entered. Although the data from a CFSTOREDPROC will display correctly if I use a CFOUTPUT block, CFGRID seems to have a hard time with all output styles except HTML. This has caused some difficulty with these solutions because the application doesn't spit out a neat HTML table but instead sends a javascript page section.
Raymond Camden's blog contains an entry Exporting from CFGRID that we used in our project.
The example in the article exports to PDF, but it is rather simple to modify the download.cfm file to export to Excel files as well:
You modify the file to generate the <table>...</table> HTML from his example in a <cfsavecontent variable="exportList"> tag, so that the #exportList# variable contains the table that will be shown in the spreadsheet.
Next we have a URL parameter mode that determines whether it is exported to PDF or Excel.
So the end of our download.cfm looks like the following:
<cfif url.mode EQ "PDF">
    <cfheader name="Content-Disposition" value="inline; filename=report.pdf">
    <cfdocument format="pdf" orientation="landscape">
        <cfoutput>#exportList#</cfoutput>
    </cfdocument>
<cfelse>
    <cfcontent type="application/vnd.ms-excel">
    <cfheader name="Content-Disposition" value="attachment; filename=report.xls">
    <cfoutput>#exportList#</cfoutput>
</cfif>

Resources