Extract year contents from html code and save them as dataframe - python-3.x

Given a section of html source code named li as follows:
[<li>Project construction cycle</li>,
<li>
Start date: 2019...
Completion date: 2021... <a class="login-btn" href="javascript:">Click to view details</a>
</li>,
<li>Preliminary preparation progress</li>,
<li>
The project has been completed by... <a class="login-btn" href="javascript:">Click to view details</a>
</li>,
<li>Progress in design work</li>,
<li>
The project design has... <a class="login-btn" href="javascript:">Click to view details</a>
</li>,
<li>Procurement of equipment</li>,
<li>
The project equipment... <a class="login-btn" href="javascript:">Click to view details</a>
</li>,
<li>Project construction progress</li>,
<li>
The project is in... <a class="login-btn" href="javascript:">Click to view details</a>
</li>]
How could we extract Start date and Completion date and convert them to a dataframe?
PS: I convert it to dataframe because I need to concatenate it with other columns.
The expected result:
Start date Completion date
0 2019 2021
Thanks.
Updates:
li = str(li)
s = re.compile('Start date:[0-9]{4}').findall(li)
df1 = pd.DataFrame([x.split(':') for x in s]).set_index(0).T
e = re.compile('Completion date:[0-9]{4}').findall(li)
df2 = pd.DataFrame([x.split(':') for x in e]).set_index(0).T
# df = pd.concat([df1, df2], axis = 1)
New update:
rmktxt2 = soup.find("table", attrs={"id":"mse_new"}).find("ul", attrs={"class":"rmktxt2"})
li = rmktxt2.find_all("li")
li = str(li)
li = " ".join(li.split())
regex = r"(Start date:\d{4}|Completion date:\d{4})"
data = re.findall(regex, li)
df = pd.DataFrame([x.split(':') for x in data]).set_index(0).T
print(df)
Out:
0 Start date Completion date
1 2019 2021
Now, how can I make the index start at 0 on the row containing 2019 and 2021?
Updates:
regex = r"Start date:(\d{4}).*Completion date:(\d{4})"
data = re.findall(regex, li)[0]
out = {}  # dict holding the single row (assumed; not shown in the original snippet)
out['Start date'] = data[0]
out['Completion date'] = data[1]
df = pd.DataFrame([out])
Out:
Start date Completion date
0 2019 2021

You may try:
(Start date: \d{4}|Completion date: \d{4})
Explanation of the above regex:
( ... ) - A single capturing group that wraps both alternatives.
Start date: \d{4} - Matches Start date: literally, followed by a space and exactly 4 digits.
| - Represents alternation.
Completion date: \d{4} - Matches Completion date: literally, followed by a space and exactly 4 digits.
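A minimal sketch of how this kind of pattern could be applied in Python. The sample string is taken from the question; the names `PATTERN` and `extract_years` are hypothetical, and the pattern is a variant that captures the label and the year separately and tolerates optional whitespace after the colon.

```python
import re

# Sample text copied from the question's <li> content.
html_text = (
    '<li>Project construction cycle</li>, <li> Start date: 2019... '
    'Completion date: 2021... <a class="login-btn" href="javascript:">'
    'Click to view details</a> </li>'
)

# Group 1 is the label, group 2 is the four-digit year.
PATTERN = re.compile(r"(Start date|Completion date):\s*(\d{4})")


def extract_years(text):
    """Return a dict mapping each label found in the text to its year."""
    return {label: year for label, year in PATTERN.findall(text)}


row = extract_years(html_text)
print(row)  # {'Start date': '2019', 'Completion date': '2021'}
```

Passing the result to `pd.DataFrame([row])` gives a single row with index 0 and the two labels as columns, which is the layout the question asks for.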

Related

Asking Dropdown for temperature input with selenium in python

Dear all on Stack Overflow,
I need your help handling the body temperature input field.
I need to fill in this field with a random value between 35.8 and 36.5.
This is the inspected element:
<input data-val="true" data-val-number="The field BodyTemperature must be a number." data-val-range="The field BodyTemperature must be between 33 and 43." data-val-range-max="43" data-val-range-min="33" data-val-required="The BodyTemperature field is required." id="BodyTemperature" max="43" min="33" name="BodyTemperature" step="0.1" type="text" value="36.1" data-role="numerictextbox" role="spinbutton" aria-valuemin="33" aria-valuemax="43" class="k-input" aria-valuenow="36.1" aria-disabled="false" style="display: none;">
And this is the code I tried:
input_value = [36.10, 36.20, 36.30, 36.50, 35.80, 35.90, 35.80]
value = random.choice(input_value)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='BodyTemperature']"))).send_keys(str(value))
I hope Stack Overflow can help me.
Please help me!
For decimal values you have to use double. Please try the code below; I have checked that it gives a random number between 35.8 and 36.5.
double min = 35.8;
double max = 36.5;
double diff = max - min;
double randomValue = min + Math.random() * diff;
System.out.println(String.format("%.3g%n", randomValue));
Result: values such as 36.1, 36.2, 35.9
Let us know if it doesn't work for you.
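The question uses Python while the snippet above is Java, so here is a rough Python equivalent. It is only a sketch: `random_body_temperature` is a hypothetical helper name, and it assumes the field should receive a value on a 0.1 step, as the `step="0.1"` attribute in the question suggests.

```python
import random


def random_body_temperature(low=35.8, high=36.5):
    """Return a temperature between low and high as a one-decimal string."""
    # Pick an integer number of tenths so the result lands exactly on a 0.1 step.
    tenths = random.randint(round(low * 10), round(high * 10))
    return f"{tenths / 10:.1f}"


value = random_body_temperature()
print(value)
```

The returned string can then be passed to `send_keys` in place of the value chosen from the fixed list.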

Amazon Scraper Bot - VBA/Selenium - Data Scrape Issue

I'm in the process of migrating some VBA code that scrapes data from Amazon from Internet Explorer to Selenium.
The code enters a search term and scrapes items such as ASIN, Selling Price, # of Reviews, etc. from the search results. I can get all the items except Ratings.
The Ratings can be found in two places in each product's hierarchy, both contained inside a span element.
<span aria-label="4.3 out of 5 stars">
<span class="a-icon-alt">4.3 out of 5 stars</span>
Sub AMZ_Scraper()
    SearchURL = "https://www.amazon.ca/s?k=BIKE+LED+LIGHTS+REAR+FRONT&ref=nb_sb_noss_2"
    'Start Edge and Navigate to URL
    Dim Browser As New WebDriver
    Dim Keys As New Selenium.Keys
    Browser.Start "edge"
    Browser.Get SearchURL
    'Find top element
    Dim Elements As Selenium.WebElements
    Set Elements = Browser.FindElementsByCss(".s-result-item")
    For Each Element In Elements
        Asin = Element.Attribute("data-asin")
        ProductName = Element.FindElementByClass("a-size-base-plus").Text
        Reviews = Element.FindElementByClass("a-size-base").Text
        Rating1 = Element.FindElementByXPath("//div [@class = 'a-row a-size-small']").FindElementByCss("Span").Attribute("aria-label")
        Rating2 = Element.FindElementByXPath("//div [@class = 'a-row a-size-small']").FindElementByXPath("//span [@class = 'a-icon-alt']").Text
    Next
End Sub
The Rating1 code works, but it extracts the same rating (4.4 out of 5 Stars) for every product.
The Rating2 code does not raise an error, but it does not extract any data either.
How do I extract the value from the element that holds the rating?
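As a side note on the rating text itself (not on the element lookup, which is the open part of the question): once a label such as "4.3 out of 5 stars" has been read, the numeric value can be pulled out with a small pattern. This is an illustrative Python sketch; `parse_rating` is a hypothetical helper, and it assumes the label always follows the "N out of 5 stars" wording shown above.

```python
import re

# Captures the leading number in labels like "4.3 out of 5 stars".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?) out of 5 stars")


def parse_rating(label):
    """Return the rating as a float, or None when the label does not match."""
    match = RATING_RE.search(label or "")
    return float(match.group(1)) if match else None


print(parse_rating("4.3 out of 5 stars"))  # 4.3
```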

Formatting Excel cell from number to text in rails

I have built an application that can import records from CSV and Excel files, using the roo gem. The records are added successfully, but when importing from Excel, .0 is appended to every numeric field. I don't want this because I have fields like enrollment_no, roll_no, and contact_no, and 23 becomes 23.0. I have already converted these fields to varchar in the database, and now I want to format the Excel cells from number to text, which would solve my problem. How can I format an Excel cell from number to string using Rails?
Here is my code for importing the file:
student.rb :
def self.import(file, current_organization_id)
  spreadsheet = open_spreadsheet(file)
  header = spreadsheet.row(1)
  (2..spreadsheet.last_row).each do |i|
    row = Hash[[header, spreadsheet.row(i)].transpose]
    record = Student.find_by(:organization_id => current_organization_id,:enrollment_no => row["enrollment_no"]) || new
    record.organization_id= current_organization_id
    record.attributes = row.to_hash.slice(*row.to_hash.keys)
    record.save!
  end
end
def self.open_spreadsheet(file)
  case File.extname(file.original_filename)
  when ".csv" then Roo::CSV.new(file.path)
  when ".xls" then Roo::Excel.new(file.path)
  when ".xlsx" then Roo::Excelx.new(file.path)
  else raise "Unknown file type: #{file.original_filename}"
  end
end
students_controller.rb :
def import
  Student.import(params[:file], session[:current_organization_id])
  #puts #session[:current_organization_id].inspect
  redirect_to students_path, notice: "Record imported Successfully."
end
new.html.erb :
<%= form_tag import_students_path, multipart: true do %>
<%= file_field_tag :file , :required=> true%> <br/>
<%= submit_tag "Import" , :class => "btn btn-primary btn-block" %>
<% end %>
I am doing something similar in my application, but the import is made easier by importing only from CSV.
Cell type seems to be a pretty common problem in Roo, and there are a few suggested workarounds that use a regex or add a character to the cell.
My solution would be much simpler:
# student.rb
COLUMNS_TO_STRING = ["organization_id", "enrollment_no", "contact_no"] # and so on
def self.import(file, current_organization_id)
  spreadsheet = open_spreadsheet(file)
  header = spreadsheet.row(1)
  (2..spreadsheet.last_row).each do |i|
    row = Hash[[header, spreadsheet.row(i)].transpose]
    row = clean_for row, COLUMNS_TO_STRING
    record = Student.find_by(:organization_id => current_organization_id,:enrollment_no => row["enrollment_no"]) || new
    record.organization_id= current_organization_id
    record.attributes = row.to_hash.slice(*row.to_hash.keys)
    record.save!
  end
end
def self.clean_for row_as_hash, string_columns_array
  row_as_hash.each do |key, value|
    if string_columns_array.include?(key)
      row_as_hash[key] = value.to_i.to_s
    end
  end
end
def self.open_spreadsheet(file)
  case File.extname(file.original_filename)
  when ".csv" then Roo::CSV.new(file.path)
  when ".xls" then Roo::Excel.new(file.path)
  when ".xlsx" then Roo::Excelx.new(file.path)
  else raise "Unknown file type: #{file.original_filename}"
  end
end
In short: get the index of the columns you want to format differently, convert the imported value from float to integer, then convert the integer to a string.
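The float to integer to string conversion described above is language-independent; the sketch below restates it in Python purely for illustration (the answer's own code is the Ruby above). The names `COLUMNS_TO_STRING` and `clean_row` are hypothetical, and unlike the Ruby version this sketch leaves a value untouched when it is not a whole-number float.

```python
# Columns that should be stored as text rather than as floats (assumed names).
COLUMNS_TO_STRING = {"enrollment_no", "roll_no", "contact_no"}


def clean_row(row, string_columns=COLUMNS_TO_STRING):
    """Return a copy of row with whole-number floats in the given columns as strings."""
    cleaned = dict(row)
    for key in string_columns:
        value = cleaned.get(key)
        # 23.0 -> 23 -> "23"; anything else is left as it was imported.
        if isinstance(value, float) and value.is_integer():
            cleaned[key] = str(int(value))
    return cleaned


print(clean_row({"enrollment_no": 23.0, "name": "Asha"}))
# {'enrollment_no': '23', 'name': 'Asha'}
```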

How to insert new tr every third iteration in Jade

I'm new to Node.js and Jade.
I searched for solutions without success (maybe I asked the wrong questions on Google, I don't know).
I want to create table rows in an each loop in Jade. After every 3rd td I want to insert a new tr. Normally this is quite simple, but with Jade I can't achieve it.
My Jade file:
table
  thead
    tr
      td Header
  tbody
    each item, i in items
      if (i % 3 === 0)
        tr
      td
        a(href="#{baseUrl}/admin.html?id=#{item.id}")
I know that something is wrong with my if statement. I tried many configurations without luck. I'm sure it is quite an easy issue.
Thanks in advance for help!
EDIT
Based on @Laurent Perrin's answer I modified my code a little. Now it creates a tr, then 3 td elements, and then a new tr, so it's a little closer...
New Jade
if (i % 3 === 0)
  tr
td: a(href="#{baseUrl}/admin.html?id=#{item.id}") dsdsd #{i}
Generated HTML
<tr></tr>
<td>0</td>
<td>1</td>
<td>2</td>
<tr></tr>
EDIT: this code should do what you want, but it's not very elegant:
table
  thead
    tr: td Header
  tbody
    - for(var i = 0, nbRows = items.length/3; i < nbRows; i++) {
      tr
        if items[3*i]
          td: a(href="#{baseUrl}/admin.html?id=#{items[3*i].id}")
        if items[3*i + 1]
          td: a(href="#{baseUrl}/admin.html?id=#{items[3*i + 1].id}")
        if items[3*i + 2]
          td: a(href="#{baseUrl}/admin.html?id=#{items[3*i + 2].id}")
    - }
What you could do instead is tweak your model to make it more Jade-friendly, by grouping items by rows:
function getRows(items) {
  return items.reduce(function (prev, item, i) {
    if(i % 3 === 0)
      prev.push([item]);
    else
      prev[prev.length - 1].push(item);
    return prev;
  }, []);
}
This will turn:
[{id:1},{id:2},{id:3},{id:4},{id:5}]
into:
[
[{id:1},{id:2},{id:3}],
[{id:4},{id:5}]
]
Then your jade code becomes much simpler:
table
  thead
    tr: td Header
  tbody
    each row in rows
      tr
        each item in row
          td: a(href="#{baseUrl}/admin.html?id=#{item.id}")
An example with Jade + Bootstrap: every 4 elements (columns) start a new row, and the columns go inside the row.
```
- var i = 0
- var itens_per_line = 4
each order in viewBag.orders
- if (i % itens_per_line === 0 || i === 0) {
.row
- }
.col-md-3.column
p #{order.number}
- i++
```
Here is what I did to convert a single array (e.g. ['1','2','3','4']) into two values per row; it could be adjusted for 3.
(mixins are templates in Jade/Pug)
mixin mInput
  div.form-group.col-md-6
    p=oval
- var valcounter = 0
- var row = [];
each val in JSON.parse(formvalues)
  if(valcounter % 2 === 0)
    - var col = [];
    - col.push(val)
  else
    - col.push(val)
    - row.push(col)
  - valcounter++
each orow in row
  div.row
    each oval in orow
      +mInput

Drop down list box with dynamic month and year in Date Prompt in Cognos

I want to add the current month, and the previous two months to a prompt, for a user to select.
For example, if this month is Nov 2008, the drop-down list box should show the following:
112008
102008
092008
How can I do this?
<asp:DropDownList ID="DropDownList1" runat="server">
</asp:DropDownList>
for (int i = 0; i < 3; i++)
{
    // "MMyyyy" yields values such as 112008, matching the format in the question
    ListItem item = new ListItem(string.Format("{0:MMyyyy}", DateTime.Now.AddMonths(-i)));
    DropDownList1.Items.Add(item);
}
Try this :)
You could also create a query subject with SQL like this Oracle example:
SELECT to_char(add_months(SYSDATE, -1 * LEVEL + 1), 'MMYYYY') AS mon
FROM dual
CONNECT BY rownum < 4
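To make the expected values concrete, here is a small Python sketch of the same month arithmetic. It is independent of Cognos and only illustrates the logic; `recent_month_labels` is a hypothetical name.

```python
from datetime import date


def recent_month_labels(today, count=3):
    """Return MMYYYY labels for the current month and the preceding ones."""
    labels = []
    year, month = today.year, today.month
    for _ in range(count):
        labels.append(f"{month:02d}{year}")
        # Step back one month, rolling over to December of the previous year.
        month -= 1
        if month == 0:
            month = 12
            year -= 1
    return labels


print(recent_month_labels(date(2008, 11, 15)))  # ['112008', '102008', '092008']
```

The year rollover is the part most easily missed: for January 2009 the list should continue with 122008 and 112008.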
