How to crawl a website based on algolia-AJAX? - python-3.x

I am trying to crawl the listings on this website via scrapy: https://www.hipflat.com/search/rent/condo_y/TH.BM_r1/any_r2/any_p/any_b/any_a/any_w/any_i/100.560155,13.737171_c/16_z/list_v
However, I am stuck with the navigation. At the bottom of the page the "next page" links show up, but as far as I can tell, they call an external site (Algolia) via a JavaScript query.
What would be the easiest way to make the navigation crawlable via scrapy?

The next-page link is present in the page. You can get it with response.css("[rel='next']::attr(href)").get(), which gives you the URL of the next page for pagination. Then simply issue a GET request for it with response.follow(url, callback=self.parse).
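The selector and response.follow() together do two things: pick the href of the element marked rel="next", then resolve it against the current page URL (response.follow accepts relative URLs). A minimal standard-library sketch of those two steps, handy for checking a saved page outside Scrapy, could look like this. The helper names and the /search/rent/page/2 path in the usage below are made up for illustration, not taken from hipflat.com.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Remembers the href of the first tag whose rel contains "next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Matches rel="next" and also multi-token values such as rel="next nofollow"
        rel_tokens = (attributes.get("rel") or "").split()
        if self.next_href is None and "next" in rel_tokens and attributes.get("href"):
            self.next_href = attributes["href"]


def find_next_page(html, page_url):
    """Return the absolute URL of the rel="next" link, or None if there is none."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # Resolve a relative href against the page URL, as response.follow() does
    return urljoin(page_url, parser.next_href)
```

For example, find_next_page('<link rel="next" href="/search/rent/page/2">', "https://www.hipflat.com/search/rent/") returns "https://www.hipflat.com/search/rent/page/2". Inside a Scrapy callback you would keep using the selector and response.follow() from the answer and yield the resulting request.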

Related

Is it possible to change proxy without restarting Selenium webdriver?

I am trying to scrape a website that renders all its data with JS. The data is split into pages of tables, but you can only reach a given page via the search box or by clicking an arrow to move to the next page. It is impossible to access a specific page by URL.
I need to change the proxy on each page. If I restart the webdriver, I have to repeat all the searches to reach, e.g., page 124102. This is very time-consuming as well as computationally intensive.
Could anyone help me with this?

Normal link redirects to friendly URL

Is it possible in any way to treat link1 as prodcuct/mobile/android/xy in the address bar? I mean that clicking 'link1' should show the SEO-friendly URL in the address bar.
Thanks in advance
It's not possible as things stand, because your product pages are generated from a PHP database; if you list a page as "prodcuct/mobile/android/xy", Googlebot will report a crawl error (page not found) in your Google Webmaster Tools.
So my suggestion is to write PHP code so that whenever a new product page is created in your backend, the product name is automatically used as the page name. Then your site's links can be indexed really fast.
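The suggestion boils down to deriving the page name (a URL "slug") from the product name. Here is a minimal sketch of that step, written in Python for illustration; the same logic ports directly to PHP. slugify and product_path are hypothetical helpers, and the category layout is an assumption loosely modelled on the mobile/android/xy example.

```python
import re
import unicodedata


def slugify(product_name):
    """Turn a product name into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_name = (
        unicodedata.normalize("NFKD", product_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", ascii_name).strip("-")
    return slug.lower()


def product_path(category_parts, product_name):
    """Build a path such as mobile/android/<slug> (hypothetical layout)."""
    return "/".join([*category_parts, slugify(product_name)])
```

Storing the slug alongside the product when it is created means the friendly URL can be looked up directly instead of being rebuilt on every request.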

Home page is not displaying as the first result in Google search

We have enabled our website in Google Analytics, and we have also used meta tags inside our JSP file.
Our website's home page does not show up as the first search result in Google; some subpages are shown instead.
How can we get our homepage to show instead?
Use a sitemap.
You can find more info here: Sitemaps.org.
In a sitemap, you can set the priority of each page to tell search engines which pages you consider most important, such as your homepage relative to your other pages.
Keep in mind, though, you also have to have actual, relevant content on your homepage for it to show above others.
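Per the sitemaps.org protocol, a sitemap is an XML <urlset> of <url> entries, each with a <loc> and an optional <priority> between 0.0 and 1.0. A minimal sketch that generates one with Python's standard library follows; build_sitemap is a hypothetical helper and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (UTF-8 bytes) for an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        # The protocol only accepts priorities from 0.0 to 1.0 (default 0.5)
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {url}: {priority}")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    # Include the <?xml ...?> declaration at the top of the document
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Usage: write build_sitemap([("https://www.example.com/", 1.0), ("https://www.example.com/about", 0.5)]) to a sitemap.xml file at the site root. The priority is only a relative hint about your own pages; it does not by itself guarantee a ranking order.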
There can be multiple reasons. Since, as you say, only the homepage isn't showing, the most likely issue is that it isn't indexed.
Another reason may be that your page is password-protected
OR
that your page has "noindex" tags
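To check for the last cause, you can look for a robots meta tag in the page's HTML and for an X-Robots-Tag response header. Here is a rough sketch using only the standard library; has_noindex is a hypothetical helper, and fetching the page (e.g. with urllib.request) is left out so the check can run on saved HTML.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html, headers=None):
    """True if the meta tags or the X-Robots-Tag header contain noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # headers is a plain dict here, so the key must be spelled exactly "X-Robots-Tag";
    # the substring test also catches forms like "googlebot: noindex"
    header_value = (headers or {}).get("X-Robots-Tag", "").lower()
    return "noindex" in parser.directives or "noindex" in header_value
```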

If a page is not linked to the main website, can search engines find it?

I want to put a secret page in my website (www.mywebsite.com). The page URL is "www.mywebsite.com/mysecretpage".
If there is no clickable link to this secret page in the home page (www.mywebsite.com), can search engines still find it?
If you want to hide from a web crawler: http://www.robotstxt.org/robotstxt.html
A web crawler collects links and follows them. So if you're not linking to the page, and no one else is, it won't be found by any search engine.
But you can't be sure that someone looking for your page won't find it. If the data must stay secret, use a script of some kind that grants access only to those who should have it.
Here is a more useful link : http://www.seomoz.org/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines
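A robots.txt rule, as described at the robotstxt.org link above, asks well-behaved crawlers to skip a path, and Python's urllib.robotparser shows how a compliant crawler would interpret it. The file contents below are a hypothetical example for www.mywebsite.com, and crawler_may_fetch is a hypothetical helper. Keep in mind that robots.txt is itself publicly readable, so listing the secret path there also advertises it; it is a crawling hint, not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to skip the secret page
ROBOTS_TXT = """\
User-agent: *
Disallow: /mysecretpage
"""


def crawler_may_fetch(url, user_agent="*", robots_txt=ROBOTS_TXT):
    """Return whether a robots.txt-compliant crawler would fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```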
No. A web spider crawls based on links from previous pages. If no page links to it, search engines won't be able to find it.

Keep website url constant when navigating to another page?

I want my site's address bar to keep the same address when I go to subpages; it should show my index.html even when I open subpages.
For example, if I open www.xyz.com and navigate to any page, it should still show www.xyz.com.
I heard this can be done with .htaccess. Is it possible?
You really should think about why you want this, because this way of working has a couple of drawbacks:
Users can't see they are on a different page
Users can't bookmark your pages for fast access
Users can't share links with each other
Search engines may have trouble spidering your site
But basically, there are two main ways to do this:
Use frames. Put the page into a frame, and have all the links stay in this frame.
Use Javascript. Have each page "load" into the current page, using AJAX.
