Requesting removal of stored website data from search engines

Greetings fellow developers,
I would like to ask for help regarding the following problem: is there a way to request removal of stored website data from search engines? Most of the links that show up when I search for my domain are old and point to pages that no longer exist.
What I've found from personal research regarding this question/problem:
From my personal research I have found that removal requests can be made individually to the well-known search engines such as Google, Yahoo and Bing, but this is not what I am looking for, since I am well aware that it would take a lot of time for the requests to be processed and the data to actually be removed. Also, I wasn't able to find such a removal-request page for the other search engines.
To be more precise/clear...
... I want to request this website-data removal from all (or at least most) search engines at once, so that when I upload my new website (to the same domain), only working, functional links (URLs) are displayed. Can this be achieved in any way and, if so, how? Also, how long would the removal take to complete?
Hope my question is clear enough, and any answer/help would be very much appreciated.

No, there is not a way to do this for all search engines at once. You will have to request it from each one individually. As for the smaller search engines, you can try to find contact information or customer support, but there is a chance they will ignore your request (heck, some crawlers ignore the robots.txt file and crawl your site anyway... it's just part of being on the web).
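One thing that does help across all engines, independent of any removal form, is making sure the dead URLs return a clear 404 or 410 status: crawlers that revisit them will eventually drop them from their indexes on their own. Below is a minimal sketch (Python with the requests library; the URLs are hypothetical placeholders for your own stale links) that checks whether the old pages really report themselves as gone:

```python
# Verify that removed pages return 404/410 so crawlers drop them on recrawl.
# The URLs below are placeholders for your own stale links.
import requests

OLD_URLS = [
    "https://example.com/old-page-1",
    "https://example.com/old-page-2",
]

for url in OLD_URLS:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    gone = resp.status_code in (404, 410)
    # 404/410 tells crawlers the page no longer exists; 200 means it stays indexed.
    print(f"{url} -> {resp.status_code} ({'gone' if gone else 'still served'})")
```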

Related

Is it possible to prevent modifications to the title of a Google Doc yet allow the contents of the Google Doc to be edited

My Problem
I want to be able to migrate my Google Docs to a regular website while maintaining the links I had created between my Google Docs. Frequently I link one Google Doc to another Google Doc. As a result, I have created something that is similar to a wiki. For example let’s suppose I had created two Google Docs: Google Doc #1 and Google Doc #2.
Subsequently let’s suppose I had created a link (a hyperlink) in Google Doc #1 to Google Doc #2. Of course that's an extremely simple example. Let’s make it more complex. Imagine I had created a couple of thousand Google Docs with many links (hyperlinks) between them.
Of course backing up those Google Docs would be trivial either by using Google Takeout or rsync. However, what would happen if I wanted to move those Google Docs to a regular website? Then the myriad hyperlinks I had created would fail to point to the documents on my regular website.
That is, on my regular website, if I were to click the link on the page that now holds the contents of Google Doc #1 (https://my_regular_website.com/google_doc_001), then instead of opening the page on my regular website that holds the contents of Google Doc #2 (https://my_regular_website.com/google_doc_002), the link would still point to the original Google Doc #2 (https://drive.google.com/drive/folders/google_doc_002).
My Technical Question
I read that, "You can use the 'contentRestrictions.readOnly' field on a 'file' resource to lock a file and prevent modifications to the title, uploading a new revision, and addition of comments." Source: Protect file content from modification
However, I would like to prevent modifications to the title of the file yet allow the contents of the file to be edited. For example, I might name a file something like "1cn2OX4U67mY925GzG80hRBYjpqq2conSi9xgYikgwIM", which is the unique portion of a Google Docs URL.
That way, on my regular website, by using a simple regex, I could “relink” documents that pointed to https://docs.google.com/document/d/1cn2OX4U67mY925GzG80hRBYjpqq2conSi9xgYikgwIM
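In other words, the relinking step might look roughly like this (a sketch only; the regex and the my_regular_website.com paths simply follow the naming scheme I described and are purely illustrative):

```python
# Rewrite Google Docs hyperlinks so they point at the migrated pages instead.
# Assumes each migrated page reuses the Google Doc ID as its path, as described above.
import re

DOC_LINK = re.compile(r"https://docs\.google\.com/document/d/([\w-]+)(?:/edit)?")

def relink(html: str) -> str:
    return DOC_LINK.sub(r"https://my_regular_website.com/\1", html)

sample = '<a href="https://docs.google.com/document/d/1cn2OX4U67mY925GzG80hRBYjpqq2conSi9xgYikgwIM/edit">Doc #2</a>'
print(relink(sample))
# -> <a href="https://my_regular_website.com/1cn2OX4U67mY925GzG80hRBYjpqq2conSi9xgYikgwIM">Doc #2</a>
```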
Final Thoughts
I like using Google Docs as, dare I say it, a word processor. Sometimes I use Google Docs to write essays. Sometimes I use Google Docs to create documentation. Sometimes I use Google Docs to collaborate with others (instead of emailing). Furthermore, I often use Google Docs’ outline format, styles, and voice typing.
Sure, I suppose I could use an actual wiki. But although I've tried many different wikis over the years, I never enjoyed using them. I found them to be clunky and overly simplistic. Furthermore, I didn't enjoy installing them and needing to back them up. At this point in time, I don't want to have to install and maintain any software on a VPS (virtual private server).
I checked the documentation you are referring to, and what you are trying to achieve is not possible. Making a document read-only prevents a new revision of the file from being created, so you cannot change the content, the title, or the comments.
At this time it is not possible to prevent only some modifications; it is all or none.
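For reference, applying that restriction is a single all-or-nothing flag; it looks roughly like this with the Drive API v3 Python client (the service setup and the reason string are placeholders):

```python
# Apply the all-or-nothing content restriction to a Drive file.
# Assumes an authenticated Drive v3 service built with google-api-python-client.
from googleapiclient.discovery import build

# service = build("drive", "v3", credentials=creds)  # credentials obtained elsewhere

def lock_file(service, file_id):
    # This locks the content, the title and new revisions together;
    # there is no finer-grained option to lock only the title.
    return service.files().update(
        fileId=file_id,
        body={"contentRestrictions": [{"readOnly": True, "reason": "Frozen for migration"}]},
    ).execute()
```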
Regards.

Is it possible to find all pages of a website? Like the "unlisted" YouTube links

I've created a private platform based on the YouTube "unlisted" method: as long as you have the link, you can access the content.
But now I have the following questions:
Is it possible for someone to find all the pages of my website? Because if that is possible, the whole private thing will just not work.
If it is possible, how come no one has managed to get the unlisted links from YouTube as well? Those are basically pages, after all, right?
These questions are based on the idea that, in the same way a search engine can display different pages of a website (which it can only do if those pages exist), the person who created the search engine could use the same method to find the unlisted links.
Maybe I am missing something, but this is my base theory, and based on it I've asked the questions above. Hopefully, someone can help me gain a deeper understanding of how "unlisted" links work.
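For context on the mechanics: a crawler can only discover a page that is linked from somewhere it already crawls or listed in a sitemap; it has no way to enumerate every possible URL. "Unlisted" schemes therefore put a long random token in the URL so it cannot be guessed. A minimal sketch of generating such a token (purely illustrative, not YouTube's actual scheme):

```python
# Generate an unguessable, URL-safe identifier for an "unlisted" page.
import secrets

def new_unlisted_path():
    # 16 random bytes ~ 128 bits of entropy; infeasible to guess or enumerate.
    return f"/p/{secrets.token_urlsafe(16)}"

print(new_unlisted_path())  # e.g. /p/Xh2mC1... (different on every call)
```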

Get search results from a website without actually "opening" the site

I'd like to code a website where you can find search results from many websites.
So my question is whether this scenario is possible and, if so, whether you have any suggestions on how I could do it.
Here is my workflow:
I search for something on my website. For example: "asdf"
My code then executes the search on the other website. For example:
https://www.google.ch/#q=asdf&safe=images
Some results will of course be shown there. But how can I take those results directly and display them on my website, without opening the other website?
I have to say that the websites I'm interested in don't have any API for this.
I probably wouldn't recommend scraping a web page directly in the client.
I'm not even sure you could do it easily without running into cross-domain policy problems anyway.
A solution like APIfy might help you do what you want:
http://apify.heroku.com/resources
Or you could still build your own server-side API "layer" for this particular website.
Keep in mind that scraping a web page is always a fragile process where the format can change at any moment.
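As a rough illustration of that server-side "layer" idea (a sketch only; the target URL and CSS selector are made up, the real markup will differ and can change at any time, and the target site's terms of service may forbid scraping):

```python
# A tiny server-side endpoint that fetches another site's results and returns
# only the extracted data, so the browser never talks to the other site directly.
import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/search")
def search():
    query = request.args.get("q", "")
    html = requests.get("https://example.com/search",
                        params={"q": query}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # The selector depends entirely on the target page's markup (placeholder here).
    results = [a.get_text(strip=True) for a in soup.select(".result a")]
    return jsonify(results=results)
```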

Add search feature to simple website without MySQL database

I have a simple HTML site with 100+ pages or so. I want to add a search bar at the top so the user can search the site. I know about Google Custom Search, but it shows ads unless you pay at least $100. Obviously I'd like ad-less search on my site for free if at all possible!
I've also heard about Lucene/Solr, but they do not actually crawl the site. For that I would apparently need Nutch.
Anyway, the site I have runs on a Microsoft IIS6 server, but I have basically no knowledge as to how Solr, Nutch, etc. get "installed" on the server.
Also: I'd like to point out that I do have a local copy of the site. Perhaps I could do one big initial Nutch "crawl" locally that would create an XML file for Solr? That would help me get "up and running", but it probably wouldn't be a good long-term solution.
...so should I just use Google Custom Search? Or is there a not-extremely-painful-to-implement alternative? The brain hurts, folks.
You did not mention how many search requests you need to handle, but if you use the JSON REST API of Google's Custom Search you get 100 search queries a day for free, and you can display the results on your page without any ads.
A simple example request can be found here.
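A sketch of such a request (the API key and search-engine ID are placeholders you obtain from the Google API console and the Custom Search control panel):

```python
# Query the Google Custom Search JSON API and return (title, link) pairs.
import requests

API_KEY = "YOUR_API_KEY"        # placeholder
CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder: the Custom Search engine restricted to your site

def site_search(query):
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return [(item["title"], item["link"]) for item in resp.json().get("items", [])]

print(site_search("example query"))
```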
Here is an easy way that works pretty well, although you may be looking for something more than this.
http://sitecomber.com/getsitecomber/
You can create code to paste into your site in about 2 minutes. It doesn't get easier than that. Search is powered by Google, but results are isolated to your website.
EDIT: This no longer works.

How to make my site look like the following image in search engine results

I was wondering whether it is possible to do the same thing myself, or whether search engines do it on their own.
I want to add some links, like here:
Google does this on its own, and all you can do is (afterwards) remove some of the links through Google Webmaster Tools.
They are commonly called sitelinks, and you can google for "How to get sitelinks Google SERP" and so forth - there are thousands of tips for helping Google along.
A clear navigational structure and internal link structure help, of course, as do consistent anchor texts.
As far as I know, Google automagically picks those up - there is no direct way to set them.
Make sure you have a proper sitemap, and then wait, I suppose.
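If you don't already have one, a minimal sitemap is easy to generate; here is a sketch (the domain and page list are placeholders):

```python
# Write a minimal sitemap.xml listing the site's pages.
PAGES = ["", "about", "products", "contact"]  # placeholder paths

entries = "\n".join(
    f"  <url><loc>https://example.com/{page}</loc></url>" for page in PAGES
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```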
Yes, Google will generate those links for you.
