How to parse object as HTML using mechanize? - python-3.x

I am very new to mechanize, and to Python as well. I am trying to write a script that automatically fills in a form with my own data. After a long search on the web I found mechanize, which looks like what I need.
I am able to retrieve the web page whose fields I need to fill in, but the page I get back is not clean HTML. It contains literal \n sequences, like this:
\n<input type="text" name="jcaptcha" value="" id="appEntry_jcaptcha" style="width: 240px;"/></div> </div>\n </div>\n\n <div>\n <input type="hidden" name="statusType" value="NEWLICENSE" id="appEntry_statusType"/>\n <div align="center" id="wwctrl_confirmBox">
I got this from the following code:
import mechanize
br= mechanize.Browser()
dotm = 'http://example.com'
br.set_handle_robots(False)
br.open(dotm)
br.select_form(nr=0)
br['someField'] = ['value']
br['someField'] = ['someValue']
response = br.submit()
print(response.read())
I got this web page by calling br.submit() on a previous page. The form that I actually want to fill in is on 'response'. So how can I turn the messy output containing \n into clean HTML, so that I can select the input fields there and fill them with my own data? It would be great if you could show an example of selecting fields on that page and filling them in.
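A likely explanation, assuming the script runs on Python 3, is that response.read() returns bytes, and print() on a bytes object shows line breaks as the two characters \n. The HTML itself is probably fine. The sketch below uses only the standard library and a shortened copy of the markup from the question; the InputLister class is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
from html.parser import HTMLParser


class InputLister(HTMLParser):
    """Collect (name, type) for every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            self.inputs.append((attributes.get("name"), attributes.get("type")))


# Stand-in for response.read(): bytes, shortened from the question's output.
raw = (
    b'\n<input type="text" name="jcaptcha" value="" id="appEntry_jcaptcha"/>'
    b'\n <input type="hidden" name="statusType" value="NEWLICENSE"'
    b' id="appEntry_statusType"/>\n'
)

print(raw)                  # bytes repr: line breaks appear as \n
html = raw.decode("utf-8")  # assumption: the page is UTF-8 encoded
print(html)                 # real line breaks, ordinary HTML

lister = InputLister()
lister.feed(html)
lister.close()
print(lister.inputs)
```

With mechanize itself, the browser object should already be positioned on the page returned by br.submit(), so calling br.select_form(...) again and assigning fields the same way as in the question is expected to work there. That behaviour is worth confirming against the mechanize documentation.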

Related

Multiple conflicting route parameter formats in Razor Pages

I'm making simple CRUD forms based on the tutorials for Razor Pages MVVM - https://learn.microsoft.com/en-us/aspnet/core/tutorials/razor-pages/?view=aspnetcore-2.1
The issue is that the elements on the Index page use different formats for the route parameter, and I end up with URLs like /StockIndexMonths/2?StockIndexId=1
where both /2 and StockIndexId=1 are the same parameter.
The select list uses ?StockIndexId=1
The Create New link uses /1, and when returning to the Index, /1 is used.
If I use the select list again, I get both: /1?StockIndexId=2
Can anyone tell me the preferred way to force the same parameter format to be used? I'm trying to keep Razor Pages doing its 'magic'.
Index.cshtml
@page "{StockIndexId?}"
@model Investments.Pages.StockIndexMonths.IndexModel
@{
    ViewData["Title"] = "Index";
}
<h2>Index</h2>
<form>
    <select asp-for="StockIndexId" asp-items="Model.StockIndexNameSelect" onchange="this.form.submit();"></select>
</form>
<a asp-page="Create" asp-route-StockIndexId="@Model.StockIndexId">Create New</a>
<table class="table">
...

Alter the form tag so that it uses the POST method:
<form method="post">
Currently, because the method is not specified, GET is used by default, which appends form values to the URL as query string values. That is why you see what you are seeing.

Value doesn't show up when retrieved using requests.get

I'm using Python to scrape a retailer's website for its HTML. I am looking for the data and attributes of their air conditioning products, such as energy efficiency, constant or variable type, and so on. So I used requests.get(), and afterwards I plan to filter the data using regex or bs4.
import requests

file_number = 0
for portal in portals:
    item = requests.get(portal)
    file_number += 1
    # zfill is a str method, so convert the counter before padding it
    file_name = "blah" + str(file_number).zfill(4) + ".txt"
    # the with block closes the file even if the write fails
    with open(file_name, "w", encoding="utf8") as file:
        file.write(item.text)
I could retrieve all the HTML pages from the set() I've compiled. However, the product price is missing. This piece of information is present if I go to the page and right-click --> inspect.
The example below is just one instance of the differences. The two files are the same, except that all references to the prices are omitted. (Just a wild guess: the price could appear slightly differently depending on who is shopping, which is why they're stored separately somehow.)
I'd also be glad to hear any suggestions for improving the code. I'm brand new to Python!
requests.get() version of info
<div class="p-price">
<strong class="J-p-32965125681"></strong> <span>X <span class="J-buy-num"></span></span>
</div>
vs
right-click --> inspect version of info
<div class="p-price">
<strong class="J-p-32965125681">￥3499.00</strong> <span>X <span class="J-buy-num"></span></span>
</div>
Thank you so much!
By the way, as a disclaimer, the robots.txt says:
User-agent: *
Disallow: /?*
And I'm not crawling any page that has "?" in its URL, so...

Web scraping is tricky!
At first glance, the values seem to be added via JavaScript. In that case, you'll need to use a headless browser or an extension to scrape the DOM after the page finishes loading, rather than the skeleton HTML page served by the site.
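One way to confirm this before reaching for a headless browser is to check whether the price element is simply empty in the HTML that requests.get() returned. The sketch below uses the two snippets from the question and the standard-library html.parser; PriceFinder and find_prices are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class PriceFinder(HTMLParser):
    """Collect the text of <strong> tags inside <div class="p-price">."""

    def __init__(self):
        super().__init__()
        self.in_price_div = False
        self.in_strong = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "p-price" in classes:
            self.in_price_div = True
        elif tag == "strong" and self.in_price_div:
            self.in_strong = True
            self.prices.append("")

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False
        elif tag == "div":
            self.in_price_div = False

    def handle_data(self, data):
        if self.in_strong:
            self.prices[-1] += data.strip()


def find_prices(html):
    """Return one string per price element; '' means the element is empty."""
    finder = PriceFinder()
    finder.feed(html)
    finder.close()
    return finder.prices


static_html = (
    '<div class="p-price">'
    '<strong class="J-p-32965125681"></strong> '
    '<span>X <span class="J-buy-num"></span></span></div>'
)
rendered_html = (
    '<div class="p-price">'
    '<strong class="J-p-32965125681">￥3499.00</strong> '
    '<span>X <span class="J-buy-num"></span></span></div>'
)

print(find_prices(static_html))    # empty string: value not in the static HTML
print(find_prices(rendered_html))  # the price as seen in the browser inspector
```

An empty string for the static page, next to a filled value in the inspector, is consistent with the price being written in by a script after the page loads.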

Trouble handling hidden element with selenium

Every time I run this code, I have trouble reaching the target page. The site requires POST request parameters to be filled in to reach the page I am after. With a GET request everything works until the code reaches the "Var4" parameter. Inspecting the element, I can see that it is marked as hidden. If I leave the hidden parameter blank, the site redirects to another location. Satisfying this requirement to get to the target page is beyond my ability. Any suggestion will be appreciated.
from selenium import webdriver
driver = webdriver.Chrome(r"C:\Users\ar\Desktop\Chromedriver\chromedriver.exe")
driver.get('https://www.infocomm.org/cps/rde/xchg/infocomm/hs.xsl/memberdirectory.htm')
Var1='Professional Services Providers'
Var2='AUSTRALIA'
Var3='0'
Var4='1'
driver.find_element_by_xpath('//select[@name="mas_type"]').send_keys(Var1)
driver.find_element_by_xpath('//select[@name="mas_cntr"]').send_keys(Var2)
driver.find_element_by_xpath('//input[@name="OtherCriteria"]').send_keys(Var3)
driver.find_element_by_xpath('//input[@name="DoMemberSearch"]').send_keys(Var4)
driver.find_element_by_xpath('//input[@type="submit"]').click()
The hidden element that "Var4" is meant for:
<form name="searchform" id="searchform" action="memberdirectory.htm" method="post" onsubmit="return Checkform();">
<input type="hidden" id="DoMemberSearch" name="DoMemberSearch" value="1">
<div class="login block-type-a block">

As a workaround, you can try executing JavaScript with Selenium.
For example, to unhide the element:
driver.execute_script("document.getElementById('DoMemberSearch').type = 'text';")
or to set the value directly:
driver.execute_script("document.getElementById('DoMemberSearch').value = '%s';" % Var4)

You cannot send keys to a hidden element. What you can do is use JavaScript to set the value,
probably something like this:
driver.execute_script("document.getElementById('DoMemberSearch').value='1'")
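A variation on the snippets above is to pass the element id and the value as script arguments instead of formatting them into the JavaScript string, which avoids quoting mistakes. Selenium's execute_script accepts extra positional arguments and exposes them to the script as arguments[0], arguments[1], and so on. The helper name set_value_by_id is hypothetical.

```python
def set_value_by_id(driver, element_id, value):
    """Set an input's value via JavaScript, passing the data as script arguments."""
    driver.execute_script(
        "document.getElementById(arguments[0]).value = arguments[1];",
        element_id,
        value,
    )


# Usage with the driver and variables from the question (not run here):
# set_value_by_id(driver, "DoMemberSearch", Var4)
```

In the markup shown in the question the hidden input already has value="1", so this may only be needed if the site expects a different value.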

Dynamic database search result on a website

This question is about website programming. My primary languages are C++ and C#, and I don't know much about web development, beyond understanding HTML and CSS. That's why I'm looking for a relatively simple solution, preferably something out of the box. I don't have any experience with JavaScript, but for the sake of this project I'm willing to learn it if necessary.
Let's say I have a database, where each entry is about a book. It contains the fields: title, author(s) and publication date.
I would like to create a simple website with a search box that has this dynamic result feature, so that you get suggestions after you type in a few letters. All those suggestions, as well as search results, need to be based purely on the database.
This could be a static website or one based on any Content Management System. I'm familiar with Joomla, but was unable to find an out-of-the-box component that does just that. All the search modules I found search the entire website, and I only need to search the database.

I can probably help you with how to implement this feature. It is usually called an autocomplete menu.
First, decide on the minimum number of characters needed to populate the autocomplete menu, for example 2.
Using JavaScript, write a keyup event handler. Once the character count reaches that minimum, send an AJAX request to the server.
The server should process this request, run the database search, and return a JSON, XML, or plain-text response to the client.
The client parses that response into a JavaScript object, builds the HTML for the autocomplete menu from the data, and renders it into the DOM so that it is displayed just below the search text box.
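The server-side step can be sketched as follows, assuming a SQLite table named books with title, author, and published columns (the fields come from the question; the table name, column names, and sample rows are made up for illustration). The suggest function is a hypothetical handler body that a web framework route would call with the typed text and return as the AJAX response.

```python
import json
import sqlite3

MIN_CHARS = 2  # assumed threshold, matching the "for example 2" above


def suggest(conn, typed, limit=5):
    """Return a JSON array of book titles that start with the typed text."""
    typed = typed.strip()
    if len(typed) < MIN_CHARS:
        return json.dumps([])
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        typed.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM books "
        "WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return json.dumps([title for (title,) in rows])


# In-memory database with sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, published TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Dune", "Frank Herbert", "1965"),
        ("Dubliners", "James Joyce", "1914"),
        ("Emma", "Jane Austen", "1815"),
    ],
)

print(suggest(conn, "du"))  # titles starting with "du", as a JSON array
print(suggest(conn, "d"))   # below the minimum length: empty JSON array
```

The query uses a bound parameter rather than string formatting, so the typed text cannot alter the SQL. SQLite's LIKE is case-insensitive for ASCII letters by default, which suits a suggestion box.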
Now, if you want to display the first result inside the text box as you type, here is a method I can suggest, similar to the Google search box.
Place a label or span just below the input text box. Using CSS, make its position exactly match the position of the text box, and make sure the text in the input and in the label starts at the same point. Make the label's text color lighter than the input's font color. The font size and font family of the label should match the input text field's style. Then use JavaScript to display the first or best-matching text inside the label. Sample code is below:
<!DOCTYPE html>
<html>
<head>
<style>
label {
    position: relative;
}
body, input {
    font-family: 'verdana';
    font-size: 12px;
}
</style>
</head>
<body>
<form action="demo_form.asp">
    <input type="text" name="fname" placeholder="Matched Text" /><br>
    <label> Matched Text</label><br>
</form>
</body>
</html>

How to extract the hover box information in watir and print it to a STDOUT

I need to print the content of a hover box to stdout. I tried it as shown below, but it didn't work for me.
data = $browser.div(:class => "homeSectionLabel textWidget",:text => /Pool A/ ).hover
print "Data #{data} \n"
The other problem is that I have another widget called Pool B with the same class name. How do I access that hover information?
<div class="widgetContainer poolContainer">
<div class="healthBadge healthUnknown" style="top: -5px; left: -5px;"></div>
<div class="homeSectionLabel textWidget">Pool A</div>
<div class="perfDisplay homePoolPerf">
</div>
<div class="homePoolVolText textWidget">9 Volumes, 0 Snapshots</div>
<div class="spaceMeterContainer poolMeter" style="width: 265px; height: 20px;">
</div>
<table class="tableWidget homeTiers" cellspacing="0" cellpadding="0" border="0">
</table>
</div>
Any help is really appreciated.
Thanks!
Aditya

This is not much of an answer at the moment, but what I have to say won't fit in a comment.
The 'content', as in the text within a div, is normally accessed with the .text method.
'Tooltip' text can be done in a number of ways: it could be via alt attributes, it could be via JavaScript triggered by an 'onmouseover' event, or it could be CSS driven, usually via the :hover pseudo-class.
If a div is merely changing its display property or location so that it becomes visible to the user, then all you need to do is figure out how to locate that div and get the .text from it:
mydata = browser.div(:how => 'what').text
If the content of the div (or some other container) is changing as a result of the mouseover/hover, then you need to simulate the action, wait a brief bit to allow client side code to run, and THEN get the .text from the container that was changed.
Without seeing a page that has the code working on it, it is hard to tell which is the case, although given that I see nothing like 'onmouseover' in the code you supplied, my first bet would be on this being CSS driven.
The code you have above is returning the result of the div object executing the .hover method, and that is going to be nil as far as I know since that method causes something to happen, but does NOT return a value.
Is the 'Pool A' the text you are trying to capture, or is it what you mouse_over to cause the other text to become visible to the user? If it is what you mouseover, then have you searched the HTML to see if you can find the text that appears in some other div?
If you just need to get the text from every div of a given class, then try something like this
browser.divs(:class => "homeSectionLabel textWidget").each do |div|
puts div.text
end

Based on the most recent comment, this will gather the class names from all of the divs on the page and print them to the console.
$browser.divs.each do |div|
puts div.class
end
Replace "puts div.class" with a file directive if you want it in a file. Any output here is simple Ruby.
