Google is indexing my website's pages slowly; what is the problem?

How long does it take for Google to index all of a site's pages? Is it normal for it to take more than two months to index 75K out of 3.2M pages?
I submitted a sitemap to Google and set up robots.txt more than two months ago. In that period, Google's bots have indexed only 75 thousand of the 3.2 million pages. Is that normal?
If there is a problem, how can I tell?

Related

Google Vision product search indexing

I have a question regarding Google Vision product search.
I know the Product Search index of products is updated approximately every 30 minutes. Does indexTime reset to the default value "1970-01-01T00:00:00Z" on an unused ProductSet?
An indexTime of "1970-01-01T00:00:00Z" means that the ProductSet hasn't been indexed yet.
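For what it's worth, a minimal sketch of checking that with the Python client library (the project, location, and product set IDs below are placeholders for your own values):

```python
from google.cloud import vision

# Sketch: read a ProductSet's index_time to see whether it has been indexed.
# PROJECT_ID, LOCATION and PRODUCT_SET_ID are placeholders.
PROJECT_ID = "my-project"
LOCATION = "europe-west1"
PRODUCT_SET_ID = "my-product-set"

client = vision.ProductSearchClient()
name = client.product_set_path(
    project=PROJECT_ID, location=LOCATION, product_set=PRODUCT_SET_ID
)

product_set = client.get_product_set(name=name)
print("index_time:", product_set.index_time)
# If this prints the epoch (1970-01-01 00:00:00+00:00), the set has not been
# picked up by the ~30-minute indexing run yet; otherwise it shows the time
# of the last indexing run.
```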

Searching YouTube for videos with a specific range of views, e.g. between 9,000,000 and 11,000,000

First time posting.
I wanted to ask if anyone knows how I can search on YouTube for, let's say, music videos that have been viewed within a set range of view counts. Like the title says, for example, between 9 and 11 million times.
One reason I want to do this is that I want to find good music that I haven't heard before. The logic I'm working on is that the Got Talent type videos that get viewed millions of times are generally viewed that many times for one of two reasons: 1) they're amazing, or 2) they're embarrassingly horrible.
And though I don't think a song being popular will necessarily mean I'll like it, I'm hoping this method will be successful to some degree.
Another reason is to look for trailers for independent films, with similar logic to the above. With these movies, though, I think I only hear about them six months to a year after they've been released because they're flying under the radar.
If I were able to search for movie trailers with 'x' number of views, for example between 500,000 and a million, maybe I'd be able to find movies I'll like more quickly than waiting for them to get mentioned to me by a friend.
Any help would be greatly appreciated, as I've wanted to be able to perform these kinds of searches for a while now.
Thanks.
You will need to use the YouTube Data API v3.
I haven't written this exact request, but it looks like you can list videos and then filter by chart=mostPopular:
https://developers.google.com/youtube/v3/docs/videos/list
Perhaps a bit of background reading on the API would help too...
https://developers.google.com/youtube/v3/
First off, you would need the YouTube Data API. "v3" means nothing because it's simply the current version, like "Windows 10."
The API lets you get a video's view count, but it doesn't let you filter by a range like 9 million to 11 million.
YouTube's own search function is pretty sophisticated. For instance,
https://www.youtube.com/results?search_query=movie+trailer&search_sort=video_view_count&filters=month. This gives all results for "movie trailer" within the last month, sorted by view count. You can customize the URL, e.g. "week" instead of "month" would return only trailers from the last week, or "year", etc. Essentially this is a "Videos: List: MostPopular" query with a subject filter.
I have a few YouTube API scripts, and I hardly think it's worth the hassle to do it that way when YouTube's advanced search gets you 99% of the way there. If you did, you would need to do a Search:list query for a given subject (e.g. "movie trailer"), limited to a given time frame (e.g. the last month). Then, for each video ID, make a Videos:list query to get its view count. Then print them all, sorted by views.
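If you did want to script it, here is a rough sketch of that two-step flow with the Python client library (the API key, the subject, and the 9-11 million bounds are placeholders, and each search call costs quota):

```python
from googleapiclient.discovery import build

# Sketch of the two-step approach described above:
# 1) search.list for a subject, 2) videos.list to read each video's view count,
# 3) filter locally by a view-count range (the API has no such filter itself).
API_KEY = "YOUR_API_KEY"  # placeholder
youtube = build("youtube", "v3", developerKey=API_KEY)

search = youtube.search().list(
    q="movie trailer",      # the subject you care about
    part="id",
    type="video",
    order="viewCount",
    maxResults=50,
).execute()

video_ids = [item["id"]["videoId"] for item in search.get("items", [])]

stats = youtube.videos().list(
    part="snippet,statistics",
    id=",".join(video_ids),
).execute()

LOW, HIGH = 9_000_000, 11_000_000  # desired view-count range
for item in stats.get("items", []):
    views = int(item["statistics"].get("viewCount", 0))
    if LOW <= views <= HIGH:
        print(f"{views:>12,}  {item['snippet']['title']}")
```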

Counts of web search hits [closed]

I have a set of approximately 10 million search queries. The goal is to collect the number of hits returned by a search engine for each of them. For example, Google returns about 47,500,000 for the query "stackoverflow".
The problem is that:
1. The Google API is limited to 100 queries per day. This is far from useful for my task, since I would have to get a very large number of counts.
2. I used the Bing API, but it does not return an accurate number. Accurate in the sense of matching the number of hits shown in the Bing UI. Has anyone come across this issue before?
3. Issuing search queries to a search engine and parsing the HTML is one solution, but it runs into CAPTCHAs and does not scale to this number of queries.
All I care about is the number of hits, and I am open to any suggestion.
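For point 2, here is a minimal sketch of how the count can be read from the Bing Web Search API (shown against the current v7 endpoint and key header; totalEstimatedMatches is the figure that disagrees with the UI):

```python
import requests

# Sketch: read Bing's estimated hit count for one query via the Web Search
# API v7. BING_KEY is a placeholder for your subscription key; older API
# versions use different endpoints and response shapes.
BING_KEY = "YOUR_SUBSCRIPTION_KEY"
ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def estimated_hits(query: str) -> int:
    resp = requests.get(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": BING_KEY},
        params={"q": query, "count": 1},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # totalEstimatedMatches is only an estimate and often disagrees with the
    # number shown in the Bing UI.
    return data.get("webPages", {}).get("totalEstimatedMatches", 0)

print(estimated_hits("stackoverflow"))
```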
Well, I was really hoping that someone would answer this, since it's something I was also interested in finding out, but since it doesn't look like anyone will, I'll throw in these suggestions.
You could set up a series of proxies that change their IP every 100 requests so that you can query Google as seemingly different people (seems like a lot of work). Or you could download Wikipedia and write something to parse the data there, so that when you search a term you can see how many pages it appears in. Of course that is a much smaller dataset than the whole web, but it should get you started. Another possible data source is the Google n-grams data, which you can download and parse to see how many books and pages the search terms appear in. Maybe a combination of these methods could boost the accuracy for any given search term.
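As a rough sketch of the n-grams idea (the column layout is assumed to be the tab-separated ngram / year / match_count / volume_count format, and the file name below is just an example; check whichever version you download):

```python
import gzip

# Sketch: sum how often a term appears in a downloaded Google Books 1-gram file.
# Assumes the tab-separated layout
#   ngram <TAB> year <TAB> match_count <TAB> volume_count
# (the column layout varies between dataset versions, so verify your download).
def corpus_count(term: str, ngram_file: str) -> int:
    total = 0
    with gzip.open(ngram_file, "rt", encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if fields and fields[0].lower() == term.lower():
                total += int(fields[2])  # match_count summed over all years
    return total

# Example shard name is illustrative only.
print(corpus_count("stackoverflow", "googlebooks-eng-all-1gram-20120701-s.gz"))
```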
Certainly none of these methods are as good as just getting the Google page counts directly, but understandably that is data they don't want to give out for free.
I see this is a very old question, but I was trying to do the same thing, which brought me here. I'll add some info and my progress to date:
Firstly, the reason you get an estimate that can change wildly is because search engines use probabilistic algorithms to calculate relevance. This means that during a query they do not need to examine all possible matches in order to calculate the top N hits by relevance with a fair degree of confidence. That means that when the search concludes, for a large result set, the search engine actually doesn't know the total number of hits. It has seen a representative sample though, and it can use some statistics about the terms used in your query to set an upper limit on the possible number of hits. That's why you only get an estimate for large result sets. Running the query in such a way that you got an exact count would be much more computationally intensive.
The best I've been able to achieve is to refine the estimate by tricking the search engine into looking at more results. To do this you need to go to page 2 of the results and then modify the 'first' parameter in the URL to go much higher. Doing this may allow you to find the end of the result set (this worked for me last year, I'm fairly sure, although today it only worked up to the first few thousand results). Even if it doesn't allow you to get to the end of the result set, you will see that the estimate gets better as the query engine considers more hits.
I found Bing slightly easier to use in the above way, but I was still unable to get an exact count for the site I was considering. Google seems to be actively preventing this use of their engine, which isn't that surprising. Bing also seems to hit limits, although those looked more like defects.
For my use case I was able to get both search engines to fairly similar estimates (148k for Bing, 149k for Google) using the above technique. The highest hit count I was able to get from Google was 323, whereas Bing went up to 700 - both wildly inaccurate, but not surprising, since this is not their intended use of the product.
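To make the "go deeper into the results" trick concrete, here is a rough scraping sketch against Bing's first parameter; the URL, the regex, and the offsets are assumptions, and the page markup changes often:

```python
import re
import requests

# Sketch of the estimate-refinement trick described above: request result pages
# at increasing offsets via Bing's "first" parameter and read the count string
# from the page. The regex and offsets are placeholders and will need tweaking.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def bing_estimate(query: str, offset: int):
    resp = requests.get(
        "https://www.bing.com/search",
        params={"q": query, "first": offset},
        headers=HEADERS,
        timeout=10,
    )
    # The count usually appears as something like "1-10 of 148,000 results".
    match = re.search(r"of ([\d.,]+) results", resp.text)
    return match.group(1) if match else None

for offset in (1, 200, 1000, 5000):
    print(offset, bing_estimate("site:example.com", offset))
```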
If you want to do it for your own site you can use the search engine's webmaster tools to view indexed page count. For other sites I think you'd need to use the search engine API (at some cost).

Automating the Rapidshare piracy file take-down process

I found a new search engine that speeds up finding pirated files on Rapidshare. How could I automate a tool that finds our product using this engine and outputs the list of Rapidshare URLs, which will then be sent to abuse#rapidshare.com?
search engine:
http://rapidlibrary.com/
(note: the CAPTCHA image appears just once there)
Here is a nice script that could perhaps do this pretty easily:
http://www.nasser.me/ubiquity/rapidsharecom-link-checker/
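If you end up rolling your own, a rough sketch of the scraping side might look like this (the rapidlibrary.com query format is a guess, and its CAPTCHA is not handled here):

```python
import re
import requests

# Rough sketch: fetch a search-results page for a product name and pull out
# every rapidshare.com link, ready to paste into an abuse report.
# The query-string format for rapidlibrary.com is a guess; adjust it after
# inspecting the site, and note the CAPTCHA step is not handled here.
PRODUCT = "Our Product Name"  # placeholder

resp = requests.get(
    "http://rapidlibrary.com/index.php",
    params={"q": PRODUCT},
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=10,
)

links = sorted(set(re.findall(r"https?://rapidshare\.com/files/\S+", resp.text)))
for link in links:
    print(link)
```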
I thought about this in the past, and being a "TV show pirate" myself, it kind of annoys me that free torrent sites like The Pirate Bay and Mininova are being taken down while other, not-so-free sites like Rapidshare, Megaupload and so on host the files and continue to make millions out of piracy.
The marketing model of those sites is viral: the more a user spreads his link, the more points he receives and the less he has to pay for his "subscription" in the future, so it's reasonable to assume that those same links will be well spread over the Internet.
I would just search and scrape all the major warez forums out there for a week or two, and after that a web search should find all the remaining blogs and sites that still point to the pirated file.

Does PageRank mean anything?

Is it a measure of anything that a developer or even a manager can look at and get meaning from? I know at one time it was all about PageRank 7, 8, 9, and 10. But is it still a valid measure of anything? If so, what can you learn from a PageRank?
Note that I'm assuming that you have other measurements that you can analyze.
PageRank is specific to Google and is a trademarked proprietary algorithm.
There are many variables in the formulas used by Google, but PageRank is primarily affected by the number of links pointing to the page, the number of internal links pointing to the page within the site and the number of pages in the site.
One thing you must consider is that it's specific to a web page, not to a web site, so you need to optimize every page.
Google sends Googlebot, its indexing robot, to spider your website; the bot is instructed not to crawl your site too deeply unless the site has a reasonable amount of PR (PageRank).
From what I have experienced, PageRank is an indicator of how many sites have recently linked to your site, but it is not necessarily connected to your position on Google, for example.
There were times when we increased our marketing and other sites linked to us, and the PageRank rose a bit.
I think the factors behind any SERP position change too much to put all your faith in one of them. PageRank was very important, and still is to some degree, but how much is a question I can't answer.
Every link you send out on a page passes some of the page's PageRank to where the link is pointing. The more links, the less PageRank is passed on to each. Use rel="nofollow" in your links to focus PageRank flow in a more controlled manner.
The PageRank algorithm produces a probability distribution representing the likelihood that a person randomly clicking on links will arrive at any particular page. It is a relatively good approximation of the importance of a web page.
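To make that concrete, here is a tiny power-iteration sketch of that random-surfer model on a made-up four-page link graph (the 0.85 damping factor is the commonly quoted value):

```python
# Tiny PageRank sketch: the random-surfer model computed by power iteration
# on a made-up four-page link graph. Pages that many other pages link to end
# up with a larger share of the probability distribution.
links = {
    "home":     ["about", "products", "blog"],
    "about":    ["home"],
    "products": ["home", "about"],
    "blog":     ["home", "products"],
}

DAMPING = 0.85  # probability the surfer follows a link rather than jumping anywhere
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}   # start with a uniform distribution

for _ in range(50):                          # iterate until the values settle
    new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = DAMPING * rank[page] / len(outgoing)   # each link passes a share
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

for page, value in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {value:.3f}")
```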
