Excluded Crawled - currently not indexed - search

I have been dealing with this issue for the past few days. Some of the site's pages are not being indexed by Google.
For instance, I have been working on this site, https://tradefills.com, and most of its pages are excluded from indexing. I have checked each page and could not find any issue.
I have some other sites as well, and I am facing the same issue on another one.

I have the same problem: I submitted the URL almost a month ago, but it is still not indexed. I found that the docs say
and we cannot predict or guarantee when these URLs will be displayed.

Related

catalog search page not working magento 2

Hi, when I search on my Magento 2 site for a product that doesn't exist, it works fine and shows this message:
Your search returned no results.
But when the product exists in the system, it shows a page like this
Can anyone please tell me what the issue is? I have already checked the exception and log files, and nothing is there.
I found the solution: my theme was causing this issue, in this file
app/design/frontend/Meigee/versatile/Magento_Catalog/templates/product/list/toolbar/filter.phtml
I hope it is helpful for others as well.

Incremental crawling in Nutch

I'm new to Nutch and am doing a POC with Nutch 1.9. I am only trying to crawl my own site to set up a search on it. I find that the first crawl I do only crawls one page. The second crawls 40 pages, the third 300. The increments then shrink, and it crawls around 400 pages overall. Does anyone know why it doesn't just do the full crawl of the website on the first run? I used the Nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial) and am running the script as per section 3.5.
I'm also finding that even with multiple runs it doesn't crawl the whole site anyway: GSA brings back over 900 pages for the same site, while Nutch brings back 400.
Thanks kindly
Jason
To my knowledge,
Nutch crawls the known links, collects the inlinks and outlinks from the pages it fetched, and then adds those links to the db for the next crawl. That is why Nutch didn't crawl all the pages in a single run.
Incremental crawling means crawling only new or updated pages and leaving the unmodified pages alone.
Nutch crawls only a limited number of pages because of your configuration settings. Change them to crawl all pages. See here
If you want to build a search for one website, take a look at Aperture. It will crawl the whole website in a single run, and it provides incremental support.
Why don't you use the Nutch mailing list? You'd get a larger audience and quicker answers from fellow Nutch users.
What value are you setting for the number of rounds when using the crawl script? Setting it to 1 means that you won't go further than the URLs in the seed list. Use a large value to crawl the whole site in a single call to the script.
The difference in the total number of URLs could be down to the max outlinks per page param, as Kumar suggested, but it could also be due to the URL filtering.
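To make the point about rounds concrete, here is a minimal sketch in Python of a round-based crawl. It is a toy model, not Nutch's actual code: the link graph and the crawl function are made up for illustration, and it ignores per-round limits, URL filters and fetch intervals. Each round fetches only the URLs already known, and the outlinks found in that round wait for the next one.

```python
from typing import Dict, List, Set

# Hypothetical link graph (page -> outlinks); not taken from any real site.
LINKS: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [],
    "/a2": ["/deep"],
    "/b1": [],
    "/deep": [],
}


def crawl(seeds: List[str], rounds: int) -> List[int]:
    """Return the cumulative number of fetched pages after each round."""
    known: Set[str] = set(seeds)   # URLs recorded in the (toy) crawl db
    fetched: Set[str] = set()
    totals: List[int] = []
    for _ in range(rounds):
        # Only URLs known before this round starts are fetched in it.
        frontier = sorted(known - fetched)
        if not frontier:
            break  # nothing left to fetch, stop early
        for url in frontier:
            fetched.add(url)
            # Outlinks are recorded now but fetched in a later round.
            known.update(LINKS.get(url, []))
        totals.append(len(fetched))
    return totals


if __name__ == "__main__":
    print(crawl(["/"], 1))   # a single round stops at the seed list
    print(crawl(["/"], 10))  # more rounds reach deeper pages
```

With one round the toy crawler fetches only the seed page. With a generous number of rounds it keeps going until no unfetched URL is left, which is the pattern of growing and then shrinking increments described in the question.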

sharepoint 2010 search not returning results

I have searched unsuccessfully for a number of days to try and resolve an issue with the search not returning results in SharePoint 2010. Basically the search has successfully crawled the content and indexed the results, but the search is returning no results on our site.
The codebase is the same as on other servers that do return results, so we are confident it's not a coding issue but a SharePoint issue. We can also reach the search queryex webservice.
I was wondering if anyone had any suggestions on possible settings / things to check to try and kickstart this search!
This is my first question on stackoverflow, so please advise if I haven't added enough detail.
I'm fairly new to SO myself, but I'll help if I can! Quick couple of questions:
Is the search returning no results for all user types, including admin?
Is there an error returned when you submit a search or does it just say "no results"?
Are you using any custom web parts for displaying the search results, or is it OOB?
Do you have access to the ULS logs perchance? There may be further information there.
Are you using search scopes at all?
What domain account are you using for the search app pool? There's some info here about making sure you have the right type of identity - SharePoint 2010 search crawling but not displaying results
(Apologies for posting a comment-as-an-answer, but since my rep is below 50 I can't yet post comments on your question - still wanted to help though.)

google index - will google index my logs?

I have some txt log files where I print out some important activities for my site.
These files ARE NOT referenced from any link within my site, so I am the only one who knows the URL
(they contain the current date in the filename, so I have one for each day).
Question: will Google index these kinds of files?
I think Google indexes only the pages whose URLs appear on the site.
Can you confirm my assumption? I just do not want others to find the link from Google etc. :)
In theory they shouldn't. If they aren't linked from anywhere they shouldn't be able to find them. However I'm not sure if stuff can make its way into the index by virtue of having the google toolbar installed. Definitely I've had some unexpected stuff turn up in search engines. The only safe way would be to password protect the folder.
Google cannot index pages that it doesn't know exist, so it won't index these unless someone submits the URLs to Google or places them on some website.
If you want to be sure, just disallow indexing for the files (in /robots.txt).
Best practice is to use the robots.txt to prevent the google crawler from indexing files you don't want to show up.
This description from Google Webmaster Tools is very helpful and leads you through the process of creating such a file:
https://support.google.com/webmasters/answer/6062608
edit: As was pointed out in the comments, there is no guarantee that the robots.txt is honored, so password-protecting the folders is also a good idea.
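As a rough illustration of the robots.txt suggestion, the sketch below uses Python's standard urllib.robotparser to check what a rule like this would tell a well-behaved crawler. The /logs/ directory and the example.com URLs are assumptions made up for the example, not details from the question. Keep in mind that listing a path in robots.txt also makes that path public, and that it is only a request to crawlers, so it does not replace password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /logs/ directory name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A dated log file under the disallowed directory.
    print(is_allowed("https://example.com/logs/2024-01-01.txt"))
    # An ordinary page outside that directory.
    print(is_allowed("https://example.com/index.html"))
```

The first check reports that the log file is off limits to compliant crawlers, and the second that normal pages are unaffected.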

Google Custom Search not indexing Dynamic Pages

I am trying to use Google Custom Search to provide search capabilities to an informational site.
About the site:
Content is generated dynamically
URL Access to content is search engine friendly (i.e. site.com/Info/3/4/45)
Sitemap (based on RSS feed) submitted and accepted by web master tools. It notes that no pages were indexed.
Annotations successfully submitted based on the RSS feed
Problem:
There are no results for any keywords that appear on the pages that were submitted.
Questions:
Why is Google not indexing the submitted pages?
What could I be doing wrong?
Custom Search with basic settings is essentially the same thing as a standard search with site:your.website. Does a standard search give you the expected results?
Note that Google doesn't index pages immediately. It takes some time. Check if your site is already indexed.
Yeah it took about 2 weeks for Google to pick up all my pages after I submitted a site map. But you should see a few pages indexed after a couple days.
