Has anyone seen this? In Google's search results I'm seeing "It appears the URL has been modified. Restore" under a bunch of my pages but that text does not exist on the page or in Google's cached version.
If you search for "It appears the URL has been modified. Restore" there are a bunch of sites affected. Can I safely assume this is a Google error?
After buying hosting and domain for my site, I uploaded the content.
Then I read about how Google finds and indexes websites and pages, went to Google Search Console, and registered my website property successfully.
But in the "pages" section I see that the page (my website has only 1 page, index.html) is labelled as "not indexed", with the reason of "Excluded by ‘noindex’ tag".
After a quick search online, the solution seems to be to remove the "noindex" tag from the page code, but that tag isn't anywhere in my code.
I haven't found anyone with the same problem so far; what could be the reason for this?
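For what it's worth, one thing worth checking (an assumption on my part, not something Search Console states for this case) is whether the noindex is being sent as an X-Robots-Tag HTTP response header rather than in the HTML; Google honours that header too, and it never shows up in the page source. A quick sketch to check it (the URL is a placeholder, substitute your own page):

import urllib.request

# Placeholder URL -- replace with the page flagged in Search Console.
url = "https://example.com/index.html"
with urllib.request.urlopen(url) as response:
    x_robots = response.headers.get("X-Robots-Tag")

if x_robots and "noindex" in x_robots.lower():
    print(f"Found 'X-Robots-Tag: {x_robots}' -- this header blocks indexing.")
else:
    print("No noindex header found on this response.")

If the header does show up, it is usually added by the hosting platform, a CDN, or the web server configuration rather than by the page itself.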
I've got a small Flask site for my old WoW guild and I have been unsuccessful in getting Google to read my sitemap.xml file. I was able to successfully verify my site using Google's Search Console and it seems to crawl it just fine, but when I go to submit my sitemap, it lists the status as "Couldn't fetch". When I click on that for more info, all it says is "Sitemap could not be read" (not helpful).
I originally used a sitemap generator website (forgot which one) to create the file and then added it to my route file like this:
# 'main' is the app's blueprint; send_from_directory and request come from Flask
from flask import send_from_directory, request

@main.route('/sitemap.xml')
def static_from_root():
    return send_from_directory(app.static_folder, request.path[1:])
If I navigated to www.mysite.us/sitemap.xml it would display the expected results, but Google was unable to fetch it.
I then changed things around and started using flask-sitemap to generate it like this:
# flask-sitemap generator: yields the endpoint name and its URL arguments
@ext.register_generator
def index():
    yield 'main.index', {}
This also works fine when I navigate directly to the file, but Google again does not like it.
I'm at a loss. There doesn't seem to be any way to get help from Google on this, and so far my interweb searches aren't turning up anything helpful.
For reference, here is the current sitemap link: www.renewedhope.us/sitemap.xml
I finally got it figured out. This seems to go against what Google was advising, but I submitted the sitemap as http://renewedhope.us/sitemap.xml and that finally worked.
From their documentation:
Use consistent, fully-qualified URLs. Google will crawl your URLs exactly as listed. For instance, if your site is at https://www.example.com/, don't specify a URL as https://example.com/ (missing www) or ./mypage.html (a relative URL).
I think that only applies to the sitemap document itself.
When submitting the sitemap to Google, I tried...
http://www.renewedhope.us/sitemap.xml
https://www.renewedhope.us/sitemap.xml
https://renewedhope.us/sitemap.xml
The only format that they were able to fetch the sitemap from was:
http://renewedhope.us/sitemap.xml
Hope this information might help someone else facing the same issue :)
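For anyone else debugging this, one possible explanation (my assumption, not something Google confirmed) is a redirect between the www/non-www and http/https variants, since Google fetches the sitemap URL exactly as submitted. A quick sketch (it assumes the requests package is installed) to see which of the variants above answers directly and which ones redirect:

import requests

# The variants below are the exact URLs tried above.
variants = [
    "http://www.renewedhope.us/sitemap.xml",
    "https://www.renewedhope.us/sitemap.xml",
    "https://renewedhope.us/sitemap.xml",
    "http://renewedhope.us/sitemap.xml",
]

for url in variants:
    r = requests.get(url, allow_redirects=False, timeout=10)
    # A 3xx status plus a Location header means this variant redirects.
    print(f"{url} -> {r.status_code} {r.headers.get('Location', '')}")

Whichever variant returns 200 without a redirect is the one to submit.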
Put this directive in your robots.txt file: Sitemap: https://domainname.com/sitemap.xml (use the full URL of your sitemap). Hope this will be helpful.
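Since the site in question is a Flask app, here is a minimal sketch of serving a robots.txt with that Sitemap line from the same app; the blueprint name main and the sitemap URL are assumptions carried over from the snippets above, not part of the original answer:

from flask import Response

@main.route('/robots.txt')
def robots_txt():
    # Plain-text robots.txt pointing crawlers at the sitemap.
    body = (
        "User-agent: *\n"
        "Allow: /\n"
        "Sitemap: http://renewedhope.us/sitemap.xml\n"
    )
    return Response(body, mimetype="text/plain")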
If I go to this url
http://sppp.rajasthan.gov.in/robots.txt
I get
User-Agent: *
Disallow:
Allow: /
That means crawlers are allowed to fully access the website and index everything. Why, then, does site:sppp.rajasthan.gov.in on Google search show me only a few pages, when the site contains lots of documents, including PDF files?
There could be a lot of reasons for that.
You don't need a robots.txt for blanket allowing crawling. Everything is allowed by default.
http://www.robotstxt.org/robotstxt.html doesn't allow blank Disallow lines:
Also, you may not have blank lines in a record, as they are used to delimit multiple records.
Check Google Webmaster Tools to see if some pages have been disallowed for crawling.
Submit a sitemap to Google.
Use "Fetch as Google" to see if Google can even see the site properly.
Try manually submitting a link through the Fetch as Google interface.
Looking closer at it.
Google doesn't know how to navigate some of the links on the site. Specifically, on http://sppp.rajasthan.gov.in/bidlist.php the bottom navigation uses onclick JavaScript that gets dynamically loaded and doesn't change the URL, so Google couldn't link to page 2 even if it wanted to.
From the bid list you can click into a detail view of each tender. These don't have public URLs, so Google has no way of linking into them.
The PDFs I looked at were image scans in Sanskrit put into PDF documents. While Google does OCR PDF documents (http://googlewebmastercentral.blogspot.sg/2011/09/pdfs-in-google-search-results.html), it's possible they can't do it with Sanskrit. You'd be more likely to find them if they contained proper text as opposed to images.
My original points remain though. Google should be able to find http://sppp.rajasthan.gov.in/sppp/upload/documents/5_GFAR.pdf which is on the http://sppp.rajasthan.gov.in/actrulesprocedures.php page. If you have a question about why a specific page might be missing, I'll try to answer it.
But basically the website does some bizarre, non-standard things, and this is exactly what you need a sitemap for. Contrary to popular belief, sitemaps are not for SEO; they're for when Google can't locate your pages.
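To make that concrete, a sitemap is just an XML file listing the URLs you want crawled. A minimal sketch like the one below is enough to hand Google the pages it can't discover by following links; the two URLs are only examples taken from the pages discussed above:

# Write a minimal sitemap.xml for pages crawlers cannot reach via links.
urls = [
    "http://sppp.rajasthan.gov.in/bidlist.php",
    "http://sppp.rajasthan.gov.in/sppp/upload/documents/5_GFAR.pdf",
]

entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)

Once the file is reachable on the site, submit it in Webmaster Tools (or reference it from robots.txt).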
I have a weird problem with Google when I search for my web address. The problem is that when I type, e.g., mydomain.com into Google search, the results are from alldomain.com, not from my website, and Google gives a link "Search instead for mydomain.com"; I need to click that link to get results for mydomain.com. What can I do to fix this problem?
If that's a new website and you don't have a Google Webmaster (Search Console) account, you should open one. It's important to register your website (to let Google know you have a new website) so that it'll start indexing it ASAP.
If it's an "old" website that suddenly stopped appearing in Google search results, you can read the answer I posted here.
If you want to search for results from one website you should search for:
SEARCH STRING site:mydomain.com
Google Webmaster Tools is reporting 403 errors for some folders on the website's server, for example:
http://www.philaletheians.co.uk/Study%20notes/
The folder isn't forbidden, so I don't understand why Google's crawler would be getting 403 errors for it.
How come the Google crawler is trying to browse the actual folders and not just going straight to the files in those folders? Is this something to do with robots.txt?
Make sure there is actually a page or document to serve when someone requests that URL. I've browsed through your site and could not find a link that points to http://www.philaletheians.co.uk/Study%20notes/
Also, it seems all the study notes are inside this "Study%20notes" directory, so that link will not work anyway. Check the "linked from" information in Google Webmaster Tools to find where this broken link comes from, and fix it.
Have you set the default document correctly in your web server? In Apache, this comes from the DirectoryIndex setting (which defaults to index.html). Also, in general it might be better to strip spaces etc. from your traversable directory names (the %20 you are seeing between "Study" and "notes" is a URL-encoded space character), so as to keep your URLs clean for your visitors and search engine bots.
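On that last point, a small sketch like this (run from the web root; the hyphen replacement is just a suggestion, not a requirement) will list directories whose names contain spaces, so they can be renamed and the links updated:

import os

# Walk the tree and flag directory names containing spaces.
for root, dirs, _files in os.walk("."):
    for d in dirs:
        if " " in d:
            old = os.path.join(root, d)
            new = os.path.join(root, d.replace(" ", "-"))
            print(f"{old}  ->  suggested rename: {new}")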