Gmail is trimming html email content. How to avoid the issue? - gmail

Gmail introduced a trimming feature in emails for "better readability". This causes a lot of pain for me, as I have a notification system for email, where I send some html email messages to users. Basically email looks like this:
divs and styling
Object alert in Project by User
tables and tr/td
User Action on Object in Project
/tables and tr/td
/divs and styling
link
footer
To group all emails in one conversation, first email has subject, subsequent emails have Re: subject.
Active users can receive significant amounts of emails like this, but due to "better readability" feature, ALL of the email content (starting from second email) is suppressed.
I am looking for advice - maybe I should redesign my html, or gmail has some anti-suppression code, or just a hack to go around this issue.
Issue from users perspective is described here: http://www.google.com/support/forum/p/gmail/thread?tid=756b83fa60ca1df7&hl=en

I had the trimming problem occurring on a table of an HTML newsletter.
It was very important that the entire table display because it was the
#1 content our client wanted to communicate. Here's the fix, or at least here's how we solved our problem. We eliminated any repetition.
So for this table, the lines in between each row, Gmail was seeing the
lines as repetitive. So I altered the pixel width by 1 px every other
line, which eliminated the repetition and fixed our problem. So that
said, look for repetition, and try to remove it. OR in some cases, you
might have to add type (in white) to create the variation.
Source.
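A minimal sketch of the "remove repetition" idea in JavaScript, assuming the rows are generated from an array and separated by thin spacer rows. The helper name renderRows, the 600px base width, and the border color are made up for illustration and are not from the original newsletter:

```javascript
// Hypothetical helper: builds table rows whose separator rows alternate
// between two widths so that no two consecutive separators are identical.
// Note: cell text is inserted as-is; escape it first if it is user input.
function renderRows(items, baseWidth = 600) {
  return items
    .map((text, i) => {
      // Alternate the separator width by 1px on every other row.
      const width = i % 2 === 0 ? baseWidth : baseWidth - 1;
      return (
        `<tr><td>${text}</td></tr>\n` +
        `<tr><td width="${width}" height="1" style="font-size:0;line-height:0;border-top:1px solid #cccccc;">&nbsp;</td></tr>`
      );
    })
    .join("\n");
}

console.log(renderRows(["First row", "Second row", "Third row"]));
```

Whether a 1px difference is enough to stop the trimming is only what the quoted author reports; it is worth sending yourself a test thread to confirm.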
PS: This is a bit unrelated, but I stumbled upon this question while looking for a way to disable the content trimming and keep the conversation view at the same time. I didn't find anything, so I developed a small extension for Chrome and Firefox.

It turns out that there is a very simple rule which causes this behaviour: Gmail will clip the email as soon as it sees the sender (From:) name in the body of the message, regardless of where this appears.
Solution: make sure that the From: name in your email is not used in the message body (except in the signature, which will probably get clipped!).
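If that rule holds, a pre-send check can flag messages that are likely to be clipped. This is only a sketch: the function name bodyContainsSenderName is hypothetical, the tag stripping is deliberately crude, and the case-insensitive comparison is a conservative assumption rather than something known about Gmail's matching:

```javascript
// Hypothetical pre-send check: returns true when the sender's display name
// appears anywhere in the visible text of the HTML body.
function bodyContainsSenderName(fromName, htmlBody) {
  const name = fromName.trim().toLowerCase();
  if (name === "") return false; // nothing to look for
  // Crude tag stripping; good enough for a sanity check, not for parsing HTML.
  const text = htmlBody.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ");
  return text.toLowerCase().includes(name);
}

const body = "<div><b>Project Bot</b> updated <i>Task 12</i></div>";
console.log(bodyContainsSenderName("Project Bot", body)); // true
console.log(bodyContainsSenderName("Notifier", body));    // false
```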

This is an awful bug in Gmail, if you're unlucky enough to get bitten by it.
In my case, it was "trimming" an entire message, in a clean thread. See an example here, noting that the "trimmed" content is expanded in the screen-shot.
I ultimately worked around Gmail's bug by removing the entire header you see in that example ("Awesome Home Swap"), including the border below it. I stopped short of actually trying to figure out what specifically was making Gmail confuse that header as a "signature" (though I suspect it could have been the border, implemented using CSS directive border-bottom:1px dotted grey to style the <td> element).

I just found a solution that worked wonderfully for me. Simply place a bunch of hidden unique images throughout your emails to provide uniqueness to parts of the email that aren't actually unique. I'm building my emails with React, so I have this Unique component that I use pretty much everywhere:
import * as React from "react"
// Returns a random numeric string so every rendered image has a different src.
function random() {
  return Math.round(Math.random() * 10000000).toString()
}
// Invisible image whose only purpose is to make the surrounding markup unique.
class Unique extends React.PureComponent {
  render() {
    return (
      <img style={Unique.style} src={`data:image/png;base64,${random()}`} />
    )
  }
  static style = {
    visibility: "hidden",
    display: "none",
    width: 0,
    height: 0,
    color: "transparent",
    background: "transparent",
  }
}
One thing I like about this solution is that it doesn't mess up the email preview text that would otherwise happen if you're using hidden text.

Add a double hyphen (--) before the collapsed part. I was able to wrap it in an element whose font color matches the background. Worked for me...

I looked at the notification emails that Google itself sends. They add the following code (a spacer.gif image) at the end of the body.
I guess this is the solution.
<img alt="" height="1" width="3" src="https://notifications.google.com/g/img/AD-FnEztup4OClDshQhMVXDbi6Oi0lSN-FgEY1jyW384aotccA.gif">
</td>
</tr>
</table>
</body>
</html>

Related

How to filter text after webscraping

So I'm trying to webscrape this website that provides novels for free, for example this page: https://www.wuxiaworld.com/novel/martial-world/mw-chapter-1
I'm trying to extract only the title and the body of the chapter. Finding the title is easy enough since it's in an h4, but the body of the chapter is not wrapped in any specific div tag, so I cannot just isolate it. I was wondering how I'd do this. The closest I've gotten to just having the text is this.
P.S. I'm new to webscraping, sorry if my question is unclear or stupid.
I tried to identify whether the body of text was under any exclusive div tag, but it wasn't, so I tried to select it via the closest div tag. This still returned a lot of useless and unwanted text.
Edit: @koro, there's more than one instance of fr-view being used, so it doesn't isolate the text. The fr-view class also appears before the chapter text.
I'm not versed in webscraping but upon reviewing the page source html I see that <div class="fr-view"> only precedes the body text on the novel pages. If you start the logging after the scraper identifies this line you should be able to stop at the very next <a href="/novel..... tag to only have the novel text included.
Some of the pages I see also include footnotes with some extra information, these include an <a href=#footnote....> tag, so if you would like to keep the footnotes included I would search for <a href=/novel...> and NOT <a href=...>
P.S. I only looked at 4 pages, and while they all appear to have the format that I've pointed out above, it's still possible that you may run into issues, but that's definitely a bridge you can cross when you get there!
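The start/stop idea from this answer can be sketched with plain string operations. This is an illustration only: the function name extractBetween and the sample markup are made up, and it takes the first occurrence of the start marker, so if fr-view appears more than once (as the question's edit says) the right occurrence would have to be selected first:

```javascript
// Returns the text between the first occurrence of startMarker and the next
// occurrence of endMarker, with tags stripped. Returns null if either marker
// is missing.
function extractBetween(html, startMarker, endMarker) {
  const start = html.indexOf(startMarker);
  if (start === -1) return null;
  const bodyStart = start + startMarker.length;
  const end = html.indexOf(endMarker, bodyStart);
  if (end === -1) return null;
  return html
    .slice(bodyStart, end)
    .replace(/<\/p>/gi, "\n") // keep paragraph breaks as newlines
    .replace(/<[^>]*>/g, "")  // drop the remaining tags
    .trim();
}

// Made-up sample that mimics the structure described in the answer.
const sample =
  '<h4>Chapter 1</h4><div class="fr-view"><p>First paragraph.</p>' +
  '<p>Second paragraph.</p><a href="/novel/example/next">Next</a></div>';

console.log(extractBetween(sample, '<div class="fr-view">', '<a href="/novel'));
```

Because the end marker is <a href="/novel rather than any <a href=, footnote links of the form <a href="#footnote...> stay inside the extracted text, as the answer suggests.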

How to destroy growl div component that hides other stuff

On my jsf page at some point I send a message to the growl component.
<p:growl id="growlLong" for="growlLong" showDetail="true" life="10000" sticky="false"/>
Once the 10 seconds are over, or the growl is dismissed by clicking the X, the element below the growl is not selectable. Inspecting the components on the page, it looks like the actual div stays there and blocks the content below it.
<?xml version="1.0" encoding="UTF-8"?>
<div class="ui-growl-item">
<div class="ui-growl-icon-close ui-icon ui-icon-closethick" style="display: none;" />
<span class="ui-growl-image ui-growl-image-info" />
<div class="ui-growl-message">
<span class="ui-growl-title">Success!</span>
<p>Configuration successfully saved.</p>
</div>
<div style="clear: both;" />
</div>
So, the question is: how do I make this go away and keep the content below usable?
Here is the screenshot of the issue, as seen with "inspect element", blue boxes are existing links, red box is the dismissed growl. Inside the blue box, we can't click the part that is covered by the red box.
This topic might be older but I just recently stumbled upon it:
The reason that the showcase is working but my version was not was that I gave the .ui-growl CSS class a height AND a width. In the showcase, the size of the container is only defined by its content and thus 0 if there are no items to display.
I moved my height definition to .ui-growl-item (which is more appropriate anyhow) and now it's working like a charm.
While it would be desirable to be able to tell growl not to leave the <div id="growlLong_container"> structure behind in the DOM, the simple solution is to just select it and remove it using your favorite method of manipulating the DOM.
The ID appears to be the ID you passed to growl: id="growlLong" + "_container". With a DOM ID it is a simple matter of selecting it and removing it.
Yes, it would be nice to be able to get growl to not leave it in there. However, there is a point of diminishing returns vs. the amount of effort you spend trying to find a solution. It appears to be well past the point where you should just use a hack and remove it. Make a note about it, move on. Leave a comment in the code explaining that this is why you are manipulating the DOM. A possibility other than removing the <div> is to adjust its z-index such that it is below that of the UI elements. Another possibility is to add display:none; to its style. Obviously, code it such that nothing goes wrong if the <div> is not there. Verify that the next use of growl still performs correctly.
Ask on the Growl discussion group. Submit it as a bug with growl. If a way surfaces to make growl not leave something like this in the DOM revisit the code and apply it.
As to removing it, if you have JavaScript available it is as simple as:
document.getElementById("growlLong_container").remove();
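The one-liner above throws if the element is not there, so a guarded version may be safer. This is a sketch, not part of PrimeFaces: the helper name removeGrowlContainer is hypothetical, and the "_container" suffix is taken from the observation above about how the ID appears to be composed:

```javascript
// Hypothetical helper: removes the growl container if it exists and reports
// whether anything was removed. `doc` defaults to the page's document, but
// any object with a compatible getElementById works (useful for testing).
function removeGrowlContainer(growlId, doc = globalThis.document) {
  if (!doc) return false; // not running in a browser and no document supplied
  const container = doc.getElementById(growlId + "_container");
  if (!container || !container.parentNode) {
    return false; // nothing to remove; the growl has not been rendered
  }
  container.parentNode.removeChild(container);
  return true;
}

// In the page: removeGrowlContainer("growlLong");
```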
To be more specific we really need more information about your code and the environment in which you are running.
A "solution" that should remove the <div>:
Hopefully you will receive an answer which allows the elements to be hidden/removed by growl. However, there does not appear to be one at present.
The following script checks every 250ms whether the <div id="growlLong_container"> has been added to the DOM. Once the <div> is found, the script waits 10s. If the <div> still exists after the 10s, it is removed. The script is a hack, but it should work.
You will need to place it such that it makes it onto the page, either enclosed in <script> tags (as it is here), or in a separate file with the first line (<script class="code" type="text/javascript">) and the last line (</script>) removed. If you use a separate file, you will need to include it in the same manner as you do jquery.js, foundation.js, foundation.topbar.js and foundation.tooltip.js.
<script class="code" type="text/javascript">
(function () {
    "use strict";
    const maxGrowlTime = 10000; //In milliseconds
    const checkFrequency = 250; //In milliseconds
    var intervalTimer=0;
    var foundGrowl=false;
    //Removes the growl container (if present) and resumes polling.
    function dismissGrowl() {
        var growlId;
        growlId = document.getElementById("growlLong_container");
        if(growlId) {
            growlId.parentNode.removeChild(growlId);
        }
        foundGrowl=false;
        setGrowlCheckInterval();
    }
    //Polls for the container; once found, schedules its removal.
    function checkForGrowl() {
        var growlId;
        if(foundGrowl) {
            return;
        }
        growlId = document.getElementById("growlLong_container");
        if(growlId) {
            foundGrowl=true;
            clearInterval(intervalTimer);
            setTimeout(dismissGrowl, maxGrowlTime );
        }
    }
    function setGrowlCheckInterval() {
        intervalTimer = setInterval(checkForGrowl, checkFrequency );
    }
    setGrowlCheckInterval();
})();
</script>
My hope is that you find an answer that does not require a hack such as this. However, this should solve your problem, at least to an extent. With the script, those controls stay blocked for between 10s and 10.25s, even if the user dismisses the growl early. With the two screenshots mentioned in the comments, it would probably be possible to change the script such that it detects when the user dismisses the growl and then removes the <div> immediately. This would make it more responsive to user input.
This solution assumes that the <div id="growlLong_container"> does not exist in the DOM prior to your issuing the <p:growl id="growlLong" and that it is not needed afterwards. This is very likely because the ID of the div appears to be composed of the ID you pass to growl.
Mainly, this issue looks like a bug or incompatibility issue between components.

Email Newsletter - Layout goes weird in gmail

I created a newsletter for MailChimp. All looks good when I put it in the mailchimp website.
However, when I send a test email to myself, some parts of the layout go wrong in Gmail.
Some content goes too far left and other content goes too far right, while the header and some parts at the bottom are in the right place. It looks fine on my phone.
My Newsletter Code: http://pastie.org/private/cwvosox7nzezqif7r3myoq
A picture of the problem i'm getting in Gmail: http://imgur.com/MPJNJq2
Can you please help me fix how it's displayed in Gmail?
Thank you!
It looks like the width of your page is bigger than gmail's div ...
kleinfreund's idea is also a good point.
Edit: I just re-read the question, and it looks like this is your own HTML. Your display problems are directly related to the styles being in your head.
The code that you are displaying above has all of its styles in the head, and Gmail does not recognize anything between the <head></head> tags.
You will need to move all of your CSS into inline styles.
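To illustrate what moving CSS into inline styles means, here is a deliberately naive sketch that swaps class attributes for style attributes using a lookup table. The function name inlineClassStyles, the class names, and the declarations are all made up; it handles only single-class attributes, and a dedicated CSS inliner tool is the better choice for a real newsletter:

```javascript
// Hypothetical, naive inliner: replaces class="name" attributes with
// style="..." using a lookup table of class name -> CSS declarations.
// It does not handle multiple classes, tag selectors, or existing styles.
function inlineClassStyles(html, stylesByClass) {
  return html.replace(/class="([^"]+)"/g, (match, className) => {
    const known = Object.prototype.hasOwnProperty.call(stylesByClass, className);
    // Leave the attribute alone if there is no rule for this class.
    return known ? `style="${stylesByClass[className]}"` : match;
  });
}

const styles = {
  header: "background-color:#333333;color:#ffffff;padding:10px;",
  content: "font-family:Arial,sans-serif;font-size:14px;",
};

const html =
  '<table width="600"><tr><td class="header">News</td></tr>' +
  '<tr><td class="content">Hello</td></tr></table>';

console.log(inlineClassStyles(html, styles));
```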

How can I hide certain text from search engines?

In my WordPress blog, I have "Posted ? days ago" on every post. I have 10 posts on my homepage. So according to most keyword analysis tools, "days ago" is a top keyword on my blog, but I don't want it to be. How can I hide those words from search engines?
I don't want to use JavaScript. I can easily use PHP and the $_SERVER variable, but I'm afraid I might get penalized for cloaking. Is there an HTML tag or an attribute like rel="nofollow" that I can use?
From "Is there any way to have search engines not index a certain section of a page?":
Supposedly you can add the class
robots-nocontent to elements on your
page, like this:
<div class="robots-nocontent">
<p>Ignore this stuff.</p>
</div>
Yahoo respects this, though I
don't know if other search engines
respect this. It appears Google is
not supporting this at this time.
I suspect if you load your content via
ajax you would get the same effect of
it not being present on the page.
and
There's no general way to do that and
personally I wouldn't bother with it.
Search engines are pretty good at
recognizing relevant content on a
page, and even though that content
might show up in the keywords that
search engines have found, it doesn't
mean that it would make the page
relevant for those keywords.
If you have a page about "Fish" and a
page about "Dogs" (that has the link
to the page about "Fish" somewhere in
the sidebar), search engines will
generally be able to recognize that
the page about "Fish" is much more
relevant for "Fish" than the page
about "Dogs" that mentions "Fish" in
the sidebar. It's possible that both
pages might be found at some point,
but generally given that mostly one
page from the site is shown in the
search results, that's not something
worth worrying about.
There's no need to be fancy with that,
and search engines are likely to just
get more confused if you try (eg if
you use JavaScript to hide the
content, you never know when search
engines will start to find that
content regardless). Similarly, using
iframes with robots.txt disallows or
AJAX will frequently degrade the
quality of your pages to users (slow
it down or make it less usable on a
variety of devices), so unless there
is a very, very strong & proven reason
that you need to do this, I would
strongly recommend not bothering with
it.
What I have found on wiki:
For Yandex:
<!--noindex-->Don't index this text.<!--/noindex-->
For Yahoo:
<div class="robots-nocontent">Don't index this text.</div>
For Google:
<!--googleoff: index--> Don't index this text.<!--googleon: index-->
Linksku, I'm fairly sure you shouldn't be worried about that particular piece of text. Our algorithms do a relatively good job detecting boilerplate text. As far as I can tell from your question, this text is boilerplate and we likely already know that.
As for detecting Googlebot and not serving this text to it, you're right, that would be cloaking and you should never do it. In this case, if you hide that text from us, we will also have a hard time detecting that it's boilerplate, and you would end up doing exactly what you're trying to avoid :)
I worked this out and posted it up at: http://www.scivillage.com/thread-2580.html
This should work, however more testing of it and feedback would be appreciated.
.x:before{
content:attr(title);
display:inline;
}
<ul>
<li><span class="x" title="Homepage"></span></li>
<li><span class="x" title="Contact" /></li>
</ul>
(I kept the class name short to reduce mark-up creep)
Search engines should ignore HTML tags with empty values when it comes to looking for keywords, which should mean that they ignore what is written in the title attribute. (The assumption is that the value is what's important; if it's empty, there is no point checking the attributes.)
It was suggested that the closing tag can be omitted in HTML5 due to reduced strictness; however, there are counter-suggestions that end tags are still required.
I'd suggest not using it directly on a (anchor) tags, since they can be used for sitemaps (using #), which means they would likely have the title spidered.
It is also possible that search engines might assume any title content is there to inflate keywords through hidden elements; however, I cannot confirm this.
To exclude specific text from Google search results you can add data-nosnippet attribute.
https://developers.google.com/search/reference/robots_meta_tag#data-nosnippet-attr
From google documentation
You can also prevent certain parts of the page text content from being shown in a snippet by using data-nosnippet.
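As a small illustration, the "Posted ? days ago" label from the question could be wrapped in a span carrying that attribute. The helper name postedAgo is hypothetical, and JavaScript is used here only to show the generated markup (the question's blog would emit it from PHP). Per the quoted documentation, this only keeps the text out of snippets:

```javascript
// Hypothetical helper: wraps the relative date label in a span marked with
// data-nosnippet so it is not shown in search result snippets.
function postedAgo(days) {
  const unit = days === 1 ? "day" : "days";
  return `<span data-nosnippet>Posted ${days} ${unit} ago</span>`;
}

console.log(postedAgo(3)); // <span data-nosnippet>Posted 3 days ago</span>
```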
HTML:
<div class="hasHiddenText">_</div>
It is important that you leave a non-whitespace character inside the element that carries the hidden text.
External CSS:
.hasHiddenText{
content: "Your hidden text here...";
/*This overwrites the default content of the div, but it isn't supported by all browsers.*/
}
.hasHiddenText::before{
content: " Your hidden text here...";
/*Places a hidden text above the div.*/
}
The "hidden text" pertains to content hidden to all search engines but visible to visitors.
You can also use line breaks and all sorts of Unicode characters by escaping them with a backslash followed by the hexadecimal code point (for example, \A for a line feed). To display line-break characters correctly, be sure to add the
white-space:pre-line;
property.

Web accessibility and h1-h6 headings - must all content be under these tags?

At the top of many pages in our web application we have error messages and notifications, 'Save' and other buttons, and then our h1 tag with the content title. When making a web application accessible, is it ever acceptable to have content above the top-level structure tag like we do here?
As a screen reader user, I don't like content above the main heading. Normally I navigate by headings, so I would miss the error message. A better solution is to output an h1 heading above the error message and leave the rest of your headings intact, giving you two h1 headings.
Yes (you can put stuff above them). The H simply means Heading. It's a question of what the heading relates to I guess.
My only caveat is that H2 shouldn't really be above H1, and H3 shouldn't be above H2. But I don't think it's an actual rule. Websites have menus, warnings, and notifications. It's acceptable to put them above the rest of your content. I don't see how it would affect accessibility as long as your content is ordered logically. Look at the page with CSS turned off. Does it look logical? That's the most important part of accessibility.
Although some people do go that extra mile and have the menu as the last item in the markup and use CSS to bring it back to the top. Personally, I find that solution counter productive. The menu is still important, it belongs at the top of the page.
Yes, just consider it is in that order that the user will get the information. So, if you just did an operation it sounds like a good idea to get any message related to it as the first thing. If it is a notification that appears on any page unrelated to what you are doing, I wouldn't put it above, as it might be a little weird.
Also you can use a text browser that doesn't use styles, it should look like a document with appropriate headers.
Heading tags are used to indicate the hierarchy of the content below it. You should only have one h1 tag and it should be the first content to appear on your page (this is usually the name of the site). Also, you shouldn't skip heading tags when drilling down through different tiers of content.
In your case, you can still use CSS to position items above the h1 tag as long as it is in the correct order in the html.
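The "don't skip heading levels" advice from this answer can be checked mechanically. The following is a rough sketch that reads heading levels from an HTML string with a regular expression; the function names headingLevels and findSkippedLevels are made up, and a real audit would use a proper HTML parser or an accessibility checker:

```javascript
// Extracts heading levels (1-6) from an HTML string in document order.
function headingLevels(html) {
  const levels = [];
  const pattern = /<h([1-6])\b/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    levels.push(Number(match[1]));
  }
  return levels;
}

// Reports positions where a heading jumps down more than one level,
// e.g. an h1 followed directly by an h3.
function findSkippedLevels(levels) {
  const problems = [];
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) {
      problems.push({ index: i, from: levels[i - 1], to: levels[i] });
    }
  }
  return problems;
}

const page = "<h1>Title</h1><h3>Details</h3><h2>Section</h2><h3>Sub</h3>";
console.log(headingLevels(page));                    // levels 1, 3, 2, 3
console.log(findSkippedLevels(headingLevels(page))); // one problem: h1 -> h3
```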
I assume the elements above the heading are used by JavaScript. In that case, it's preferable if they are created by JavaScript, not included in the source of the page.
To return to your original question, it is probably best that they be at the foot of the page. However, if they are hidden using the CSS "display: none;" or "visibility: hidden;" properties then they will not be seen by most (perhaps all?) screenreaders or by many other assistive technologies, and so should not be an issue. I've written a fairly detailed explanation of why accessibility technology ignores such elements.
Of course if somebody disables CSS things are going to look pretty messy. If there is content on the page that can be used even when CSS and/or JavaScript are disabled, then putting those elements at the bottom of the page will at least make things less cluttered.
