Number of Search Results from Excel - excel

Given a column of strings, I would like to find the number of search results from a website (e.g. sciencedirect.com) for each string. An existing answer, Number of Google Results from Excel, works well for Google.
Unfortunately, that solution builds the search URL from the string, i.e. the URL of a Google search for "example" contains the word "example". The sites I want to use do not do this.
A search for "example" returns the URL http://www.sciencedirect.com/science?_ob=ArticleListURL&_method=list&_ArticleListID=1860967815&_sort=r&_st=13&view=c&_acct=C000053194&_version=1&_urlVersion=0&_userid=1495569&md5=0ef30742e917da15236ef1824058a1db&searchtype=a
How can I achieve the same result with this type of search engine?

You'll have to examine the form that is submitted when you click the Search button. The form sends a GET request that contains the search terms, but you are then redirected to the results list page, whose URL no longer contains the search terms.
I have successfully searched for "corpus" with this URL:
http://www.sciencedirect.com/science?_ob=QuickSearchURL&_method=submitForm&_acct=C000228598&_origin=home&_zone=qSearch&md5=61ce8901b141d527683913a240486ac4&qs_all=corpus
Note that what you'll have to do is:
1. download the start page http://www.sciencedirect.com/
2. extract the hidden fields from the search form
3. assemble the search URL from the hidden fields
4. add your search term to the search URL in the qs_all field
5. send a GET request with the search URL
6. follow the redirection
Except for qs_all, all the other fields in this URL come from the form as hidden fields.
This is the source of the corresponding form as I downloaded it (before sending the "corpus" search request):
<form name="qkSrch" method="get" target="_top" action="/science" style="margin:0px;">
<input type="hidden" name="_ob" value="QuickSearchURL">
<input type="hidden" name="_method" value="submitForm">
<input type="hidden" name="_acct" value="C000228598">
<input type="hidden" name="_origin" value="home">
<input type="hidden" name="_zone" value="qSearch">
<input type="hidden" name="md5" value="61ce8901b141d527683913a240486ac4">
<table border="0" width="100%" cellpadding="0" cellspacing="0" style="margin: 0;">
<tbody><tr valign="middle">
<!-- Code related for toggling labels -->
<td align="right"><label for="qs_all" id="fieldLabel"> All fields</label></td>
<td align="left"><input class="textbox qsinput xpstyle" type="text" name="qs_all" id="qs_all" value="" size="30" maxlength="450" title="For example: heart attack AND behavior?" tabindex="1"></td>
<td align="right"><label for="qs_author"> Author</label></td>
<td align="left" colspan="5"><input class="textbox qsinput xpstyle" type="text" name="qs_author" id="qs_author" value="" size="33" maxlength="450" title="e.g. J S Smith or John Smith or Smith JS" tabindex="2" style="_width:100%"></td>
<td nowrap="nowrap">
</td><td></td><td></td>
<td align="right" nowrap="nowrap" width="90%" valign="middle">
Advanced search
</td>
</tr>
<tr>
<td align="right"><label for="qs_title"> Journal/Book title</label></td>
<td align="left"><input class="textbox qsinput xpstyle" type="text" id="qs_title" name="qs_title" value="" size="30" maxlength="450" title="For example: journal of molecular biology" tabindex="3"></td>
<td align="right" class="toggleQukSrch2"><label for="qs_vol" id="volField"> Volume</label></td>
<td align="left" class="toggleQukSrch"><input class="textbox qsinput xpstyle" type="text" name="qs_vol" id="qs_vol" value="" size="3" maxlength="10" style="width:30px;" tabindex="4"></td>
<td align="right" class="toggleQukSrch2"><label for="qs_issue" id="issueField"> Issue</label></td>
<td align="left" class="toggleQukSrch"><input class="textbox qsinput xpstyle" type="text" name="qs_issue" id="qs_issue" value="" size="3" maxlength="10" style="width:30px" tabindex="5"></td>
<td align="right" class="toggleQukSrch2"><label for="qs_pages" id="pageField"> Page</label></td>
<td align="right" class="toggleQukSrch"><input class="textbox qsinput xpstyle" type="text" name="qs_pages" id="qs_pages" value="" size="3" maxlength="10" title="For example: 14-27" style="width:30px" tabindex="6"></td>
<td align="right" nowrap="nowrap">
<input class="button" id="submit_search" type="Submit" alt="Submit Quick Search" title="Submit Quick Search" value="Search ScienceDirect" tabindex="8" name="sdSearch">
</td>
<td align="right" nowrap="nowrap" colspan="8" valign="bottom">
<a class="icon_qmarkHelpsci_dir" href="/science?_ob=HelpURL&_file=qs_tips.htm&_acct=C000228598&_version=1&_urlVersion=0&_userid=10&md5=2bd779305b31602341744eaa786e2f0a" target="sdhelp" onmouseover="window.status='Help is Available';return true" onmouseout="window.status='';return true" onclick="var helpWin;helpWin=window.open('/science?_ob=HelpURL&_file=qs_tips.htm&_acct=C000228598&_version=1&_urlVersion=0&_userid=10&md5=2bd779305b31602341744eaa786e2f0a','sdhelp','scrollbars=yes,resizable=yes,directories=no,toolbar=no,menubar=no,status=no,width=760,height=570');helpWin.focus();return false" tabindex="9" style="font-size:0.92em;padding-right:0;">Search tips</a>
</td>
</tr>
</tbody></table>
</form>
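Steps 2 to 4 above (extract the hidden fields, assemble the URL, append qs_all) can be sketched with Python's standard library instead of VBA. This is a minimal sketch, assuming the start page contains a form named qkSrch like the one shown. The class HiddenFieldCollector and the function build_search_url are hypothetical names for illustration. The hidden values are read from whatever page you feed in rather than hard-coded.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlencode, urljoin


class HiddenFieldCollector(HTMLParser):
    """Records the action and the hidden <input> fields of one named <form>."""

    def __init__(self, form_name: str) -> None:
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action: Optional[str] = None
        self.fields: List[Tuple[str, str]] = []  # (name, value) in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and attributes.get("name") == self.form_name:
            self.in_form = True
            self.action = attributes.get("action") or ""
        elif self.in_form and tag == "input":
            # Only hidden inputs are copied; the visible qs_all box is added later.
            if (attributes.get("type") or "").lower() == "hidden":
                self.fields.append((attributes.get("name") or "",
                                    attributes.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_search_url(start_page_html: str, base_url: str, term: str,
                     form_name: str = "qkSrch", query_field: str = "qs_all") -> str:
    """Hidden fields of the form plus the search term, as a GET URL for its action."""
    collector = HiddenFieldCollector(form_name)
    collector.feed(start_page_html)
    collector.close()
    if collector.action is None:
        raise ValueError(f"form {form_name!r} not found")
    params = collector.fields + [(query_field, term)]
    return urljoin(base_url, collector.action) + "?" + urlencode(params)
```

Fed the form above, this produces a URL with the same field order as the "corpus" URL in the answer (_ob, _method, _acct, _origin, _zone, md5, qs_all). For steps 1, 5 and 6, you could download the page with urllib.request.urlopen(...).read() and request the returned URL the same way. The decoding of the bytes is up to you. urlopen follows HTTP redirects by default.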
EDIT
Continued: how to extract the number of results from the results page.
Your assumption is right: you'll have to change other parts of the code too, namely the part that extracts the number of results.
Let's stick with the previous example.
When searching for "corpus", you will find this line in the source of the results page:
<input type="hidden" name="TOTAL_PAGES" value="2836">
You'll want to extract 2836, so search for the text <input type="hidden" name="TOTAL_PAGES" value=" and take everything after it up to the closing quote.
I am not going to tell you how to code it in VBA, but it is basic string manipulation, so I hope you can handle it.
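Here is the string search just described, as a minimal Python sketch. extract_total_pages is a hypothetical helper. It assumes the marker appears exactly as quoted above, with the same attribute order and double quotes, so any change to the page markup would break it.

```python
from typing import Optional


def extract_total_pages(result_html: str) -> Optional[int]:
    """Return the number after the TOTAL_PAGES marker, or None if it is absent."""
    marker = '<input type="hidden" name="TOTAL_PAGES" value="'
    start = result_html.find(marker)
    if start == -1:
        return None
    start += len(marker)                 # first character of the value
    end = result_html.find('"', start)   # the closing quote
    if end == -1:
        return None
    value = result_html[start:end]
    return int(value) if value.isdigit() else None
```

In VBA the same two searches map onto InStr (to find the marker and the closing quote) and Mid (to cut out the text between them).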

Related

Selenium Python select and click multiple checkbox elements by input type

I'm trying to select and click multiple checkboxes on a page. They don't have a class, ID or names that match; all they have in common is their input type (checkbox). I'm able to select individual checkboxes using XPATH, but it's not practical, as there are at least 100 checkboxes on the page.
Here's a section of the HTML. I've been testing this code on an offline version of the site.
<body>
<tbody>
<tr>
<td class="borderBot"><b>Activity</b></td>
<td class="borderBot">
<table cellpadding="3" cellspacing="3" width="100%">
<tbody>
<tr>
<td width="33%">
<input type="checkbox" name="competency1Activity" value="1" />
Plan (ie Interpreted diag etc)
</td>
<td width="33%">
<input type="checkbox" name="competency1Activity" value="2" />
Carry Out (ie conducted work)
</td>
<td width="33%">
<input type="checkbox" name="competency1Activity" value="4" />
Complete (ie Compliance etc)
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td class="borderBot"><b>Supervision</b></td>
<td class="borderBot">
<table cellpadding="3" cellspacing="3" width="100%">
<tbody>
<tr>
<td width="33%">
<input
type="checkbox"
name="competency1Supervision"
value="1"
/>
Direct
</td>
<td width="33%">
<input
type="checkbox"
name="competency1Supervision"
value="2"
/>
General
</td>
<td width="33%">
<input
type="checkbox"
name="competency1Supervision"
value="4"
/>
Broad
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td class="borderBot"><b>Support</b></td>
<td class="borderBot">
<table cellpadding="3" cellspacing="3" width="100%">
<tbody>
<tr>
<td width="33%">
<input type="checkbox" name="competency1Support" value="1" />
Constant
</td>
<td width="33%">
<input type="checkbox" name="competency1Support" value="2" />
Intermittent
</td>
<td width="33%">
<input type="checkbox" name="competency1Support" value="4" />
Minimal
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td class="borderBot"><b>Materials</b></td>
<td class="borderBot">
<table cellpadding="3" cellspacing="3" width="100%">
<tbody>
<tr>
<td width="50%">
<input type="checkbox" name="competency1Extended" value="1" />
Insulation failure
</td>
<td width="50%">
<input type="checkbox" name="competency1Extended" value="2" />
Incorrect connections
</td>
</tr>
<tr>
<td width="50%">
<input type="checkbox" name="competency1Extended" value="4" />
Circuits-wiring; eg. open short
</td>
<td width="50%">
<input type="checkbox" name="competency1Extended" value="8" />
Unsafe condition
</td>
</tr>
<tr>
<td width="50%">
<input
type="checkbox"
name="competency1Extended"
value="16"
/>
Apparatus/component failure
</td>
<td width="50%">
<input
type="checkbox"
name="competency1Extended"
value="32"
/>
Related mechanical failure
</td>
</tr>
<tr>
<td width="50%">
<input
type="checkbox"
name="competency1Extended"
value="64"
/>
Read/interpret drawings/plans
</td>
<td width="50%">
<input
type="checkbox"
name="competency1Extended"
value="128"
/>
Other elec app and circuit faults
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<input
type="hidden"
id="competency1ExtendedCount"
name="competency1ExtendedCount"
value="8"
/>
</tbody>
</body>
Try 1 - this selects and clicks the first checkbox but none of the others
checkboxes = driver.find_element(By.CSS_SELECTOR, "input[type='checkbox']").click()
Try 2 - I thought this would work, but I can't get the syntax right
checkboxes = driver.find_element(By.CSS_SELECTOR, "input[type='checkbox']")
for checkbox in checkboxes:
    checkboxes.click()
    time.sleep(.3)
Try 3 - able to select first checkbox (of 3) with this name
checkboxes = driver.find_element("name", "competency1Activity").click()
find_element() returns only the first matching element, whereas you need to identify all of them, so you need to use the find_elements() method.
Solution
You can identify all the checkboxes, store the elements in a list and then click on them one by one, using either of the following locator strategies:
Using CSS_SELECTOR:
from selenium.webdriver.common.by import By
checkboxes = driver.find_elements(By.CSS_SELECTOR, "td input[type='checkbox']")
for checkbox in checkboxes:
    checkbox.click()
Using XPATH:
from selenium.webdriver.common.by import By
# XPath attribute tests use @, not #
checkboxes = driver.find_elements(By.XPATH, "//td//input[@type='checkbox']")
for checkbox in checkboxes:
    checkbox.click()
Your option 2 was nearly correct; however, it should have used find_elements() rather than find_element(), since find_element() returns a single WebElement whereas find_elements() returns a list of elements. The loop body should also call checkbox.click(), not checkboxes.click().
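Code:
import time
from selenium.webdriver.common.by import By

# find_elements() returns a list, so each checkbox can be clicked in turn
checkboxes = driver.find_elements(By.CSS_SELECTOR, "input[type='checkbox']")
for checkbox in checkboxes:
    checkbox.click()
    time.sleep(0.5)  # short pause between clicks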
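Clicking a checkbox toggles it. If some boxes are already ticked, for example when the script is re-run, a plain loop unticks them. Here is a small variation, assuming you want every box to end up checked, which consults WebElement.is_selected() before clicking. tick_all and CheckboxLike are hypothetical names. The helper is typed against a Protocol that needs only is_selected() and click(), which keeps it independent of a live browser.

```python
from typing import Iterable, Protocol


class CheckboxLike(Protocol):
    """The two WebElement methods the helper relies on."""

    def is_selected(self) -> bool: ...

    def click(self) -> None: ...


def tick_all(checkboxes: Iterable[CheckboxLike]) -> int:
    """Click each checkbox that is not ticked yet; return how many were clicked."""
    clicked = 0
    for box in checkboxes:
        if not box.is_selected():  # skip boxes that are already checked
            box.click()
            clicked += 1
    return clicked
```

With Selenium this would be called as tick_all(driver.find_elements(By.CSS_SELECTOR, "input[type='checkbox']")).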

Objects in IE are not getting identified by VBA

I am trying to automate the IE browser using VBA. There are nearly 25 links on the page, but when I use the getElementsByTagName("a") function, VBA identifies only 2 links from the page. I am using IE 8 and the MS Office 2007 package.
Has anyone faced the same issue before? Kindly help.
<form name="form1" action="/insurance/gs/servlet/API.gs.pol.nb.action.NewBizAction" method=post>
<input type="hidden" name='operId' value="">
<input type="hidden" name="policyId" value = "309865">
<input type="hidden" name='isCustomer' value="1">
<input type="hidden" name='indDeletePartyId' value="">
<input type="hidden" name='orgDeletePartyId' value="">
<input type="hidden" name='deletePartyId' value="">
<input type="hidden" name='roleType' value="">
<input type="hidden" name='submissionCustId' value="0">
<input type="hidden" name='newPartyId' value="null">
<input value="API.gs.pol.nb.action.NewBizAction?operId=LoaderCustInfo&policyId=309865" type="hidden" name="fromUrl" >
<table cellpadding="1" cellspacing="1" class="table_frame">
<tr>
<td class="body">
<div class="main">
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="table_data">
<tr>
<td>
<table border="0" cellspacing="0" cellpadding="0" class="table_heading1">
<tr>
<td class="table_heading1_tdleft table_heading1_tdpic"></td>
<td class="table_heading1_td">
<table border="0" cellspacing="0" cellpadding="0" class="table_heading1_1">
<tr>
<td class="arrow_td"><img src="/insurance/icp/ifoundation/html/foundation/ui31/images/common/arrow1.gif" /></td>
<td width="49%" nowrap="nowrap" class="font_heading1">
Customer List
</td>
<td width="49%" align="right" >
<a class="a2" href="#" onClick="javascript:if(cheBefSelCust()){loadIndiCust()}">Individual</a>
<a class="a2" href="#" onClick="javascript:if(cheBefSelCust()){loadOrganCust()}">Company</a>
</td>
</tr>
</table>
</td>
<td class="table_heading1_tdright table_heading1_tdpic"></td>
</tr>
</table>
In the above HTML page, I want to select the link "Individual".

How to get input field embedded in table using page-object accessor

Given the HTML snippet below, how can I, using page-object field accessors, get the second input field (the one with id="333:")?
I can't use the id to identify the field as it's auto-generated.
Using page-object accessors I can get the embedded table and the correct cell but the cell object is a PageObject::Elements::TableCell which doesn't seem to have any methods that allow access to the embedded span and input field.
I can get the input field using native Watir accessors - i.e., browser.div( ... etc ...) but would prefer to use just page-object accessors if possible.
If that's not possible, I'll live with native Watir. Thanks.
<div class="modalContent">
<table role="presentation" id="324:_layoutTbl" class="axial">
<tbody>
<tr>
<td>
<input id="326:" name="" value="" onchange="juic.fire("326:","_change",event);" onfocus="juic.fire("326:","_focus",event);" onblur="juic.fire("326:","_blur",event);" type="text">
</td>
</tr>
<tr>
<td class="sfTH">
<label for="332:">Users:</label>
</td>
<td>
<table class="noborder" role="presentation">
<tbody>
<tr>
<th>
<label id="336:" class="rcmFormLabel">Hiring Manager</label>
<label id="337:" for="333:" class="rcmFormLabe">Users, Hiring Manager</label>
</th>
<td><span class="autocompspan ">
<input class="autocompinput" id="333:" name="333:" onfocus="juic.fire("333:","_focus",event);" size="20" value="Bill Murray" role="combobox" type="text">
<input value="111111" id="333:_hidden" name="333:_hidden" type="hidden">
</span>
<div id="335:" style="display:none"><img alt="" class="globalFloatLeft">
<div id="335:_error" style="color:#ff0000">undefined is required</div>
<div style="clear:both;"></div>
</div>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
After more experimenting, I found the answer to my question. I declared the table accessor, e.g. div(:my_table, :class => 'modalContent').table.table, and then in the code, to get the input field in the first row, I did my_text_field = my_table_element[0].text_field_element, which returned the PageObject::TextField object in column 2 of row 1.

Td innerText element VBA

I have this code:
<tbody id="frm:r:0:s:tbody_element">
<tr>
<td>
<img id="frm:r:0:s:0:img_2" src="/example/img/ic_small_min.gif"
onclick="Expand(getID(this.id,'f'))" style="cursor: pointer;" />
- My Items
<span id="frm:r:0:s:0:f" style="DISPLAY: none;">
<input type="checkbox" name="frm:r:0:s:0:bcb" id="frm:r:0:s:0:bcb"
value="true" onclick="checkAll(this.form, this)" />
<table id="frm:r:0:s:0:mcb">
<tr><td><label><input type="checkbox" name="frm:r:0:s:0:mcb"
value="H321" /> List</label></td>
</tr>
<tr><td>
<label><input type="checkbox" name="frm:r:0:s:0:mcb"
value="H318" /> Edit</label></td>
</tr>
<tr><td><label><input type="checkbox" name="frm:r:0:s:0:mcb"
value="H310" /> Delete</label>
</td></tr>
But when I call getElementById("frm:r:0:s:tbody_element").innerText, I get:
My items List Edit Delete
instead of only:
My items
Why? I want to get only "My Items", not all the text inside the element.
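' Assumes doc is the loaded HTMLDocument
Set tb = doc.getElementById("frm:r:0:s:tbody_element")
' Child nodes of the first <td>, not its whole innerText
Set els = tb.getElementsByTagName("td")(0).ChildNodes
' ChildNodes(1) is the text node after the <img>; NodeValue returns only that node's text
Debug.Print els(1).NodeValue ' "- My Items"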
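In the question, innerText also returned the text of nested elements (the labels inside the span), which is why List, Edit and Delete came along. Reading a single text node's NodeValue, as the answer does, avoids that. Here is the same idea as a standard-library Python sketch. It uses html.parser, not the IE DOM, and keeps only text that sits directly inside the first <td>, ignoring text inside nested elements. FirstTdOwnText and own_text_of_first_td are hypothetical names. The sketch assumes reasonably well-formed markup, since unlike a browser it does not repair unclosed tags.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class FirstTdOwnText(HTMLParser):
    """Collects only the text nodes that are direct children of the first <td>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = None  # depth inside the first <td>; None = not reached yet
        self.done = False  # True once that <td> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth is None:
            if tag == "td":
                self.depth = 0
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth is None or tag in VOID_TAGS:
            return
        if self.depth == 0:
            if tag == "td":
                self.done = True
        else:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and not self.done:
            self.parts.append(data)


def own_text_of_first_td(html: str) -> str:
    parser = FirstTdOwnText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())  # collapse whitespace
```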

How do I get accurate entry paths in Expression Engine search results?

I am working to implement advanced search on a site and need some help getting more accurate paths to entries in the search results page.
I am using a modified advanced search form:
{exp:search:advanced_form result_page="search/advanced_results"}
<fieldset class="fieldset">
<legend>{lang:search_by_keyword}</legend>
<input type="text" class="input" maxlength="100" size="40" name="keywords" style="width:100%;" />
<div class="default">
<select name="search_in">
<option value="titles" selected="selected">{lang:search_in_titles}</option>
<option value="entries" selected="selected">{lang:search_in_entries}</option>
</select>
</div>
<div class="default">
<select name="where">
<option value="exact" selected="selected">{lang:exact_phrase_match}</option>
<option value="any">{lang:search_any_words}</option>
<option value="all" >{lang:search_all_words}</option>
<option value="word" >{lang:search_exact_word}</option>
</select>
</div>
</fieldset>
<div class="defaultBold">{lang:channels}</div>
<select id="channel_id" name='channel_id[]' class='multiselect' size='15' multiple='multiple' onchange='changemenu(this.selectedIndex);'>
{channel_names}
</select>
<div class="defaultBold">{lang:categories}</div>
<select name='cat_id[]' size='18' class='multiselect' multiple='multiple'>
<option value='all' selected="selected">{lang:any_category}</option>
</select>
<div class='searchSubmit'>
<input type='submit' value='Search' class='submit' />
</div>
{/exp:search:advanced_form}
</body>
and the standard search results code:
<table border="0" cellpadding="6" cellspacing="1" width="100%">
<tr>
<th>{lang:title}</th>
<th>{lang:excerpt}</th>
<th>{lang:author}</th>
<th>{lang:date}</th>
<th>{lang:total_comments}</th>
<th>{lang:recent_comments}</th>
</tr>
{exp:search:search_results switch="resultRowOne|resultRowTwo"}
<tr class="{switch}">
<td width="30%" valign="top"><b>{title}</b></td>
<td width="30%" valign="top">{excerpt}</td>
<td width="10%" valign="top">{author}</td>
<td width="10%" valign="top">{entry_date format="%m/%d/%y"}</td>
<td width="10%" valign="top">{comment_total}</td>
<td width="10%" valign="top">{recent_comment_date format="%m/%d/%y"}</td>
</tr>
{/exp:search:search_results}
</table>
The only problem is that {auto_path} is anything but accurate: it does not link to the entry and basically tries to piggyback off the home page. Is there a way to get more accurate paths? I know Google Search can do it.
Thanks!
Admin → Channel Administration → Channels → Edit Preferences → Path Settings. There you enter the base URL for your auto_path or id_auto_path.
So entering /news/entry/ would yield /news/entry/my-new-url-title in your search results.
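The composition in the answer's example is plain concatenation of the channel's base path and the entry's URL title. The Python sketch below only shows which part comes from the Path Settings preference and which from the entry. auto_path here is a hypothetical helper, not ExpressionEngine code, and the slash normalisation is an assumption about how you'd want it to behave.

```python
def auto_path(base_path: str, url_title: str) -> str:
    """Hypothetical illustration: base path from Path Settings + the entry's URL title."""
    # Normalise slashes so "/news/entry" and "/news/entry/" give the same result.
    return base_path.rstrip("/") + "/" + url_title.strip("/")
```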
The {auto_path} variable in the Search Results tag is automatically determined by the Search Results URL preference setting for the channel in Channel Management.
You can find this preference in the Control Panel at Admin > Channel Administration > Channel Preferences.
