I am trying to pull the time posted, the text, and the number of likes for each post on this LinkedIn company page: https://www.linkedin.com/company/excella-consulting/
The data points all print to the console correctly, but I still get a "list index out of range" error, and I'm confused.
The len of my variable likes_container is 0, yet the correct number of likes per post still prints to the console. I'm new to Python and not sure how to resolve this.
I'm using Selenium to pull the HTML. My credentials are in a separate parameters file, so I adjusted the LinkedIn credentials/URL in the code below.
Thanks
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import parameters
import csv
from parsel import Selector
import string
import re
import subprocess
# starts Chrome; no path is passed, so chromedriver must be discoverable (e.g. on PATH)
driver = webdriver.Chrome()
# driver.get method() will navigate to a page given by the URL address
driver.get('https://www.linkedin.com/uas/login?trk=guest_homepage-basic_nav-header-signin')
# locate email form by id
username = driver.find_element_by_id('username')
# send_keys() to simulate key strokes
username.send_keys('email@email.com')
# locate password form by id
password = driver.find_element_by_id('password')
# send_keys() to simulate key strokes
password.send_keys('#######')
# locate submit button by_class_name
#log_in_button = driver.find_element_by_class_name('btn__primary--large from__button--floating')
# locate submit button by_xpath
log_in_button = driver.find_element_by_xpath('//*[@type="submit"]')
# .click() to mimic button click
log_in_button.click()
# search_query = driver.find_element_by_xpath("//input[@type='text']")
# # send_keys() to simulate the search text key strokes
# search_query.send_keys('Excella Consulting')
# # .send_keys() to simulate the return key
# search_query.send_keys(Keys.ENTER)
# navigate directly to the company page
driver.get('https://www.linkedin.com/company/excella-consulting/')
SCROLL_PAUSE_TIME = 2.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
company_page = driver.page_source
from bs4 import BeautifulSoup as bs
linkedin_soup = bs(company_page.encode("utf-8"), "html")
linkedin_soup.prettify()
containers = linkedin_soup.findAll("div",{"class":"occludable-update ember-view"})
container = containers[0].find("div","feed-shared-update-v2 relative full-height feed-shared-update-v2--e2e feed-shared-update--chat-ui feed-shared-update-v2--minimal-padding Elevation-2dp ember-view")
timestr = time.strftime("%Y%m%d-%H%M%S")
filename = "Excella_LinkedIn_Posts_" + timestr + ".csv"
f = open(filename, "w")
headers = "Posted_Date, Text, Likes\n"
f.write(headers)
for container in containers:
    posted_container = container.findAll("div",{"class": "feed-shared-text-view white-space-pre-wrap break-words ember-view"})
    posted_date = posted_container[0].find("span",{"class": "ember-view"})
    box_container = container.findAll("div",{"class": "feed-shared-update-v2__description-wrapper"})
    new_box = box_container[0].find("span",{"aria-hidden": "false"})
    likes_container = container.findAll("ul", class_="social-details-social-counts--justified social-details-social-counts ember-view")
    new_likes = likes_container[0].find("span", {"class": "v-align-middle social-details-social-counts__reactions-count"})
    print("Posted_Date: " + posted_date.text.strip())
    print("Text: " + new_box.text.strip())
    try:
        print("Likes: " + new_likes.text.strip())
    except:
        pass
    f.write(posted_date.text.strip() + "," + new_box.text.strip().replace(",","|") + "," + new_likes.text.strip() + "," + "\n")
f.close()
I was expecting to be able to write each post to a row in a CSV. Here are the print results:
Posted_Date: 6d
Text: Congratulations to Excellian Trent Hone on his award from the U.S. Naval Institute for his book 'Learning War.' Named the 2018 Author of the Year, we are proud to recognize his hard work and dedication.
Likes: 41
Posted_Date: 6d
Text: Want to hear more from our team of hashtag#Agile experts and trainers? Sign up for our monthly Training Newsletter to get industry insight, tips and tricks, and a list of our featured upcoming classes! Sign up today to receive our May newsletter tomorrow: hashtag#AgileTraining hashtag#ExcellaTraining
Likes: 3
Posted_Date: 23h
Text: Today we bring you hashtag#Agile2019 speaker #Dane Weber, a Lead Consultant at Excella! Dane will be presenting 'Undercover Scrum Master' on August 5th at 3:45pm. More details to come: hashtag#scrummaster hashtag#speaker
Likes: 2
but I got this error underneath the print results:
IndexError Traceback (most recent call last)
<ipython-input-26-b452542829fd> in <module>
117 new_box = box_container[0].find("span",{"aria-hidden": "false"})
118 likes_container = container.findAll("ul", class_="social-details-social-counts--justified social-details-social-counts ember-view")
--> 119 new_likes = likes_container[0].find("span", {"class": "v-align-middle social-details-social-counts__reactions-count"})
120 # try:
121 # new_likes
IndexError: list index out of range
EDIT: Here's the HTML for a container, can provide more if needed:
<div class="occludable-update ember-view" id="ember2840"> <div class="feed-shared-update-v2 relative full-height feed-shared-update--chat-ui feed-shared-update-v2--minimal-padding Elevation-2dp ember-view" data-id="urn:li:activity:6490973066489843713" id="ember2993"> <div class="display-flex feed-shared-actor display-flex feed-shared-actor--with-control-menu ember-view" id="ember2994"><a class="feed-shared-actor__container-link display-flex flex-grow-1 app-aware-link ember-view" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="actor_container" href="https://www.linkedin.com/company/163739/?miniCompanyUrn=urn%3Ali%3Afs_miniCompany%3A163739" id="ember2995" target="_self"> <div class="feed-shared-actor__image" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="actor_picture" data-ember-action="" data-ember-action-2996="2996">
<span class="js-feed-shared-actor__avatar" data-entity-hovercard-id="urn:li:fs_miniCompany:163739">
<div class="feed-shared-actor__avatar ivm-image-view-model ember-view" id="ember2997"> <div class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view" id="ember2998"><!-- --> <img alt="Excella" class="lazy-image ivm-view-attr__img--centered feed-shared-actor__avatar-image EntityPhoto-square-3"/>
</div>
</div>
</span>
</div>
<div class="feed-shared-actor__meta" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="actor" data-ember-action="" data-ember-action-2999="2999">
<h3 class="feed-shared-actor__title t-12 t-black--light t-normal">
<span class="feed-shared-actor__name t-14 t-black t-bold hoverable-link-text" data-entity-hovercard-id="urn:li:fs_miniCompany:163739">
<span dir="ltr">Excella</span>
</span>
<!-- --> </h3>
<span class="feed-shared-actor__description t-12 t-black--light t-normal">
<div class="truncate feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3000"><span aria-hidden="false">4,338 followers</span><!-- --></div>
</span>
<span class="feed-shared-actor__sub-description t-12 t-black--light t-normal">
<div class="feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3003"><span aria-hidden="false"><span class="ember-view" id="ember3006"><span>3mo</span></span></span><!-- --></div>
</span>
</div>
</a>
<button aria-expanded="false" aria-label="See more about Excella" class="entity-hovercard__a11y-trigger" data-entity-hovercard-id="urn:li:fs_miniCompany:163739" data-entity-hovercard-trigger="click"></button>
<!-- --></div>
<div class="feed-shared-update-v2__control-menu absolute text-align-right feed-shared-control-menu ember-view" id="ember3008"><artdeco-dropdown class="ember-view" id="ember3009"><artdeco-dropdown-trigger aria-expanded="false" class="feed-shared-control-menu__trigger artdeco-button artdeco-button--tertiary artdeco-button--muted artdeco-button--1 artdeco-button--circle ember-view" id="ember3010" placement="bottom" role="button" tabindex="0"> <li-icon aria-label="Open control menu" class="artdeco-button__icon" role="img" type="ellipsis-horizontal-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M2,10H6v4H2V10Zm8,4h4V10H10v4Zm8-4v4h4V10H18Z" style="fill: currentColor"></path></svg></li-icon>
<!-- --></artdeco-dropdown-trigger><artdeco-dropdown-content aria-hidden="true" arrow-dir="right" class="feed-shared-control-menu__content artdeco-dropdown-with-arrow ember-view" data-dropdown="" id="ember3011" justification="right" placement="bottom" tabindex="-1"> <ul>
<li class="option-share-via">
<artdeco-dropdown-item class="tap-target display-flex align-items-center ember-view" data-dropdown="" id="ember3013" role="button" tabindex="0"> <li-icon aria-hidden="true" class="flex-shrink-zero mr2" type="link-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M17.29,3a3.7,3.7,0,0,0-2.62,1.09L12.09,6.67A3.7,3.7,0,0,0,11,9.29a3.65,3.65,0,0,0,.52,1.86l-0.37.37a3.66,3.66,0,0,0-4.48.56L4.09,14.67a3.71,3.71,0,1,0,5.24,5.24l2.59-2.59A3.7,3.7,0,0,0,13,14.71a3.65,3.65,0,0,0-.52-1.86l0.37-.37a3.66,3.66,0,0,0,4.48-.57l2.59-2.59A3.71,3.71,0,0,0,17.29,3ZM11.13,14.71a1.82,1.82,0,0,1-.54,1.3L8,18.59A1.83,1.83,0,0,1,5.41,16L8,13.41a1.79,1.79,0,0,1,1.74-.48L8.28,14.4A0.94,0.94,0,0,0,9.6,15.73l1.46-1.46A1.82,1.82,0,0,1,11.13,14.71ZM18.59,8L16,10.59a1.79,1.79,0,0,1-1.74.48L15.73,9.6A0.94,0.94,0,0,0,14.4,8.27L12.94,9.74A1.79,1.79,0,0,1,13.41,8L16,5.41A1.83,1.83,0,0,1,18.59,8Z" style="fill: currentColor"></path></svg></li-icon>
<div class="feed-text-description flex-grow-1 text-align-left">
<span class="feed-shared-control-menu__headline t-14 t-black t-bold">Copy link to post</span>
<span class="feed-shared-control-menu__sub-headline t-12 t-black t-black--light"></span>
</div>
</artdeco-dropdown-item> </li>
<li class="option-unfollow-company">
<artdeco-dropdown-item class="tap-target display-flex align-items-center ember-view" data-dropdown="" id="ember3015" role="button" tabindex="0"> <li-icon aria-hidden="true" class="flex-shrink-zero mr2" type="block-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M12,2C6.5,2,2,6.5,2,12c0,5.5,4.5,10,10,10c5.5,0,10-4.5,10-10C22,6.5,17.5,2,12,2zM3.9,12c0-4.5,3.6-8.1,8.1-8.1c1.9,0,3.7,0.7,5,1.8L5.6,17C4.5,15.7,3.9,13.9,3.9,12zM12,20.1c-1.9,0-3.7-0.7-5-1.8L18.4,7c1.1,1.4,1.8,3.1,1.8,5C20.1,16.5,16.5,20.1,12,20.1z" style="fill: currentColor"></path></svg></li-icon>
<div class="feed-text-description flex-grow-1 text-align-left">
<span class="feed-shared-control-menu__headline t-14 t-black t-bold">Unfollow Excella</span>
<span class="feed-shared-control-menu__sub-headline t-12 t-black t-black--light">Stop seeing posts from Excella</span>
</div>
</artdeco-dropdown-item> </li>
<li class="option-report">
<artdeco-dropdown-item class="tap-target display-flex align-items-center ember-view" data-dropdown="" id="ember3017" role="button" tabindex="0"> <li-icon aria-hidden="true" class="flex-shrink-zero mr2" type="flag-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M13.82,5L14,4a1,1,0,0,0-1-1H5V2H3V22H5V15H9.18L9,16a1,1,0,0,0,1,1h8.87L21,5H13.82ZM5,13V5h6.94l-1.41,8H5Zm12.35,2h-6.3l1.42-8h6.29Z" style="fill: currentColor"></path></svg></li-icon>
<div class="feed-text-description flex-grow-1 text-align-left">
<span class="feed-shared-control-menu__headline t-14 t-black t-bold">Report this post</span>
<span class="feed-shared-control-menu__sub-headline t-12 t-black t-black--light">This post is offensive or the account is hacked</span>
</div>
</artdeco-dropdown-item> </li>
</ul>
</artdeco-dropdown-content></artdeco-dropdown>
<!-- -->
<!-- --></div>
<!-- -->
<div class="feed-shared-update-v2__description-wrapper"><div class="feed-shared-update-v2__description feed-shared-inline-show-more-text ember-view" id="ember3018"><div class="feed-shared-update-v2__commentary feed-shared-text ember-view" dir="ltr" id="ember3019"> <div class="feed-shared-text__text-view feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3020"><span aria-hidden="false"><span class="ember-view" id="ember3023"><span>We can't get enough of </span></span><a class="tap-target feed-shared-text-view__mention ember-view" href="/company/10793171/" id="ember3027" target="_self"><span data-entity-hovercard-id="urn:li:fs_miniCompany:10793171" data-entity-type="MINI_COMPANY">HUNGRY Marketplace</span></a><button aria-expanded="false" aria-label="See more about HUNGRY Marketplace" class="entity-hovercard__a11y-trigger" data-entity-hovercard-id="urn:li:fs_miniCompany:10793171" data-entity-hovercard-trigger="click"></button><span class="ember-view" id="ember3031"><span>!</span></span></span><!-- --></div>
</div><!-- --></div></div>
<!-- -->
<!-- --> <div class="Elevation-0dp feed-shared-update-v2__update-content-wrapper feed-shared-mini-update-v2 feed-shared-mini-update-v2--with-context feed-shared-mini-update-v2--minimal-padding ember-view" id="ember3033"><div>
<div class="feed-shared-actor display-flex ember-view" id="ember3034"><a class="feed-shared-actor__container-link display-flex flex-grow-1 app-aware-link ember-view" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="original_share_actor_container" href="https://www.linkedin.com/company/10793171/?miniCompanyUrn=urn%3Ali%3Afs_miniCompany%3A10793171" id="ember3035" target="_self"> <div class="feed-shared-actor__image" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="original_share_actor_picture" data-ember-action="" data-ember-action-3036="3036">
<span class="js-feed-shared-actor__avatar" data-entity-hovercard-id="urn:li:fs_miniCompany:10793171">
<div class="feed-shared-actor__avatar ivm-image-view-model ember-view" id="ember3037"> <div class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view" id="ember3038"><!-- --> <img alt="HUNGRY" class="lazy-image ivm-view-attr__img--centered feed-shared-actor__avatar-image EntityPhoto-square-2"/>
</div>
</div>
</span>
</div>
<div class="feed-shared-actor__meta" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="original_share_actor" data-ember-action="" data-ember-action-3039="3039">
<h3 class="feed-shared-actor__title t-12 t-black--light t-normal">
<span class="feed-shared-actor__name t-14 t-black t-bold hoverable-link-text" data-entity-hovercard-id="urn:li:fs_miniCompany:10793171">
<span dir="ltr">HUNGRY</span>
</span>
<!-- --> </h3>
<span class="feed-shared-actor__description t-12 t-black--light t-normal">
<div class="truncate feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3040"><span aria-hidden="false">637 followers</span><!-- --></div>
</span>
<span class="feed-shared-actor__sub-description t-12 t-black--light t-normal">
<div class="feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3043"><span aria-hidden="false"><span class="ember-view" id="ember3046"><span>4mo</span></span></span><!-- --></div>
</span>
</div>
</a>
<button aria-expanded="false" aria-label="See more about HUNGRY" class="entity-hovercard__a11y-trigger" data-entity-hovercard-id="urn:li:fs_miniCompany:10793171" data-entity-hovercard-trigger="click"></button>
<button aria-label="Follow" aria-pressed="false" class="feed-shared-actor__follow-button feed-shared-update-v2__follow-button artdeco-button artdeco-button--tertiary follow ember-view" data-control-name="actor_follow_toggle" id="ember3048"> <li-icon aria-hidden="true" class="artdeco-button__icon" size="small" type="plus-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="small-icon" d="M14,9H9v5H7V9H2V7H7V2H9V7h5V9Z" style="fill-opacity: 1"></path></svg></li-icon>
<span aria-hidden="true">Follow</span>
</button>
</div>
<a class="tap-target feed-shared-mini-update-v2__link-to-details-page ember-view" href="/feed/update/activity:6490609780434956289/" id="ember3049"><div class="feed-shared-inline-show-more-text--minimal-padding feed-shared-inline-show-more-text ember-view" id="ember3050"> <div class="feed-shared-update-v2__commentary feed-shared-text ember-view" dir="ltr" id="ember3051"> <div class="feed-shared-text__text-view feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember3052"><span aria-hidden="false"><span class="ember-view" id="ember3055"><span>⭐⭐⭐⭐⭐
HUNGRY already? --> </span></span><a class="feed-shared-text-view__hyperlink ember-view" href="http://tryhungry.com" id="ember3059" rel="noopener noreferrer" target="_blank">tryhungry.com</a></span><!-- --></div>
</div>
<!-- --></div></a>
<!-- -->
<div class="feed-shared-mini-update-v2__reshared-content feed-shared-image feed-shared-image--single-image ember-view" id="ember3061"><div class="relative">
<div class="feed-shared-image__container" style="padding-top: 52.36%;">
<a aria-describedby="feed-shared-image-ember3061" class="feed-shared-image__image-link app-aware-link ember-view" data-control-id="g5vpPsibn6Sa8fKqcxeTxg==" data-control-name="update_image" href="#" id="ember3062"> <div class="ivm-image-view-model ember-view" id="ember3063"> <div class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--expanded ivm-view-attr__img-wrapper--use-img-tag ember-view" id="ember3064"><!-- --> <img alt="No alternative text description for this image" class="lazy-image ivm-view-attr__img--centered feed-shared-image__image feed-shared-image__image--constrained"/>
</div>
</div>
</a> </div>
<span class="visually-hidden" id="feed-shared-image-ember3061">
Activate link to view larger image.
</span>
<!-- --></div>
<div class="ember-view" id="ember3065"><!-- --></div></div>
<!-- --></div>
</div>
<!-- -->
<!-- --> <div class="ember-view" id="ember3066"><div class="ember-view" id="ember3067"><!-- --></div></div>
<div class="feed-shared-social-actions feed-shared-social-action-bar social-detail-base-social-actions feed-shared-social-action-bar--minimal-padding ember-view" id="ember3068"> <span class="reactions-react-button ember-view" id="ember3069"><!-- -->
<button aria-label="React to post" aria-pressed="false" class="react-button__trigger artdeco-button artdeco-button--muted artdeco-button--4 artdeco-button--tertiary ember-view" id="ember3070"><!-- -->
<span class="artdeco-button__text">
<li-icon aria-hidden="true" class="artdeco-button__icon" type="like-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M17.51,11L15.36,8a14.81,14.81,0,0,1-2.25-5.29L12.74,1H10.5A2.5,2.5,0,0,0,8,3.5V4.08a9,9,0,0,0,.32,2.39L9,9H4.66A2.61,2.61,0,0,0,2,11.4a2.48,2.48,0,0,0,.39,1.43,2.48,2.48,0,0,0,.69,3.39,2.46,2.46,0,0,0,1.45,2.92,2.47,2.47,0,0,0,0,.36A2.5,2.5,0,0,0,7,22h4.52a8,8,0,0,0,1.94-.24l3-.76H21V11H17.51ZM19,19H16.88l-3.41.82A6,6,0,0,1,12,20H7a0.9,0.9,0,0,1-.9-0.89s0-.07,0-0.14l0.15-1-1-.4a0.9,0.9,0,0,1-.55-0.83,0.93,0.93,0,0,1,0-.22L5,15.57,4.27,15a0.9,0.9,0,0,1-.39-0.74A0.88,0.88,0,0,1,4,13.82l0.46-.72L4,12.38a0.88,0.88,0,0,1-.14-0.51,1,1,0,0,1,1-.87H11.5L10.2,6.3a9,9,0,0,1-.33-2.37V3.38a0.5,0.5,0,0,1,.5-0.5h0.95a17.82,17.82,0,0,0,2.52,6.22L16.6,13H19v6Z" style="fill: currentColor"></path></svg></li-icon>
<span class="artdeco-button__text react-button__text react-button__text--">
Like
</span>
</span></button>
<!-- --></span>
<span class="comment ember-view" id="ember3071"><button aria-label="Comment on {:actorName} post" class="artdeco-button artdeco-button--muted artdeco-button--4 artdeco-button--tertiary ember-view" data-control-name="comment" id="ember3072"> <li-icon aria-hidden="true" class="artdeco-button__icon" type="speech-bubble-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M18,10H6V9H18v1Zm4-5V22l-5-4H3a1,1,0,0,1-1-1V5A1,1,0,0,1,3,4H21A1,1,0,0,1,22,5ZM20,6H4V16H17.7L20,17.84V6Zm-4,6H8v1h8V12Z" style="fill: currentColor"></path></svg></li-icon>
<span class="artdeco-button__text">
Comment
</span></button>
</span>
<span class="feed-shared-social-action-bar__action-btn feed-shared-social-action-bar__reshare-button reshare-button button reshare social-action-btn ember-view" data-control-name="reshare" id="ember3073"><button aria-label="Share {:actorFullName} post" class="artdeco-button artdeco-button--muted artdeco-button--4 artdeco-button--tertiary ember-view" data-control-name="reshare" id="ember3074"> <li-icon aria-hidden="true" class="artdeco-button__icon" type="share-linkedin-icon"><svg class="artdeco-icon" focusable="false" height="24px" preserveaspectratio="xMinYMin meet" viewbox="0 0 24 24" width="24px" x="0" y="0"><path class="large-icon" d="M24,12h0a1.18,1.18,0,0,0-.36-0.84L14,2V8H11A10,10,0,0,0,1,18v4H2.87A6.11,6.11,0,0,1,9,16h5v6l9.63-9.14A1.18,1.18,0,0,0,24,12s0,0,0,0h0Zm-8,5.54V14H9a8.15,8.15,0,0,0-6,2.84A8,8,0,0,1,11,10h5V6.48L21.81,12Z" style="fill: currentColor"></path></svg></li-icon>
<span class="artdeco-button__text">
Share
</span></button>
<!-- --></span>
<!-- -->
</div><!-- --><!-- --> <div class="feed-shared-update-v2__comments-container display-flex flex-column">
<!-- --><!-- --><!-- --> <div class="feed-shared-first-prompt-block ember-view" id="ember3075"> <p class="t-12 t-black t-normal">Be the first to react</p>
</div>
</div>
<!-- -->
</div>
</div>
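Judging from the container HTML above, that particular post appears to have no reactions yet (it shows "Be the first to react") and contains no social-details-social-counts list at all. If so, findAll returns an empty list for that post and likes_container[0] raises the IndexError, while the earlier posts print normally because the loop only fails once it reaches such a post. A minimal sketch of one way to guard against this, assuming that diagnosis is right; the sample markup and the extract_likes helper are hypothetical and heavily trimmed:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: one post with a reactions count, one without.
SAMPLE_HTML = """
<div class="occludable-update ember-view">
  <ul class="social-details-social-counts ember-view">
    <li><span class="v-align-middle social-details-social-counts__reactions-count">41</span></li>
  </ul>
</div>
<div class="occludable-update ember-view">
  <p class="t-12 t-black t-normal">Be the first to react</p>
</div>
"""


def extract_likes(container):
    """Return the reactions count as text, or "0" when the post has none."""
    # select_one returns None instead of raising when nothing matches.
    span = container.select_one("span.social-details-social-counts__reactions-count")
    return span.get_text(strip=True) if span is not None else "0"


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
containers = soup.find_all("div", class_="occludable-update")
likes = [extract_likes(c) for c in containers]
print(likes)
```

Checking for None before touching the result keeps the loop running for posts without reactions, so every post can still be written to its own CSV row.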
I have a page source with div tags like the example page source below. I would like to scrape all the urls like the example below and save them in a list.
Example url:
/model-airplane-kits-s/2379.htm
from:
<a data-control-id="aP52Q/QyTbqArQOpbKv4EQ==" data-control-name="A_jobssearch_job_result_click" href="/model-airplane-kits-s/2379.htm" id="ember1513" class="job-card-search__link-wrapper js-focusable-card disabled ember-view">
I’ve tried using the code below to scrape the urls from the href. I’m trying to use the span class to filter for only the div tags that contain job-card-search__easy-airplane. The code doesn’t return any urls, just an empty list. I’m new to BeautifulSoup and Selenium. If anyone could point out what my issue is and suggest a solution, I would be grateful, especially if you could also explain how I need to search the tree structure of the HTML.
Code:
soup = BeautifulSoup(driver.page_source)
tags = soup.find_all('a', {'class': 'job-card-search__easy-airplane', 'href': True})
urls = [t['href'] for t in tags]
page source:
<div data-control-name="A_jobssearch_job_result_click" data-job-id="urn:li:fs_normalized_jobPosting:1175863492" tabindex="0" role="button" id="ember1507" class="job-card-search--two-pane jobs-search-results__list--card--viewport-tracking-1 job-card-search job-card-search--column job-card-search job-card-search--clickable job-card-search--outline-default ember-view"><artdeco-entity-lockup size="4" id="ember1508" class="artdeco-entity-lockup--size-4 artdeco-entity-lockup ember-view"><figure id="ember1509" class="artdeco-entity-lockup__image artdeco-entity-lockup__image--type-square ember-view" type="square"><a data-control-id="aP52Q/QyTbqArQOpbKv4EQ==" data-control-name="A_jobssearch_job_result_click" tabindex="-1" href="/jobs/view/1175863492/?eBP=CwEAAAFqIxiBlPhCqFcaiXqaLT8ZCYXTIftwHuk7g59oqTz7fLS2Usfj45gbPf53raGy8aX-F7FvqLIf3MJgOTHo3Ugkxh6sCVhZlkZRMQH3gDk8lSE_wujH7Mz9tU8Upy0ZIWHS9wbUErl6g8Z8C2-z1YCW85y0eMG57HHPJnWYYbtoCS9Wh_NGgMmlglzGytFLwYgXEu56gDUcWhRkT_AHODGr3-ZLjO6FcpctLngpJnHm4r2dmo9F8AUfP3HYWjOK-pToyQlStkfh0IcKMce2jIuCxe3Wgc90v7HF7kEItq-WdL1IdbnHbvN9gPBrSubLfU_pPqmwGRoTmMlPygTbXERDrw4&recommendedFlavor=SCHOOL_RECRUIT&refId=fd370713-e20e-4b02-9676-9009d8e52d34&trk=d_flagship3_search_srp_jobs" id="ember1510" class="job-card-search__link-wrapper js-focusable-card disabled ember-view"> <img class="lazy-image loaded job-card-search__logo-image" title="Ancestry" alt="Ancestry logo" height="64" width="64" src="https://media.licdn.com/dms/image/C560BAQFzwmebdgodyw/company-logo_100_100/0?e=1563408000&v=beta&t=Xr94FzOXIsd2wULd8cHG7Lr8nppKm0wWGCph-_N4YMk">
</a>
</figure>
<artdeco-entity-lockup-content id="ember1511" class="job-card-search__content-wrapper artdeco-entity-lockup__content ember-view"><h3 id="ember1512" class="job-card-search__title artdeco-entity-lockup__title ember-view"><a data-control-id="aP52Q/QyTbqArQOpbKv4EQ==" data-control-name="A_jobssearch_job_result_click" href="/model-airplane-kits-s/2379.htm" id="ember1513" class="job-card-search__link-wrapper js-focusable-card disabled ember-view"> Data Scientist - Search
<span class="job-card-search__promoted-tag-separator"> </span><span class="job-card-search__promoted-tag">Promoted</span>
</a>
</h3>
<h4 id="ember1514" class="job-card-search__company-name t-14 t-black artdeco-entity-lockup__subtitle ember-view"><a data-control-id="aP52Q/QyTbqArQOpbKv4EQ==" data-control-name="job_card_company_link" href="/company/397181/" id="ember1515" class="job-card-search__company-name-link ember-view"> Ancestry
</a></h4>
<h5 id="ember1516" class="job-card-search__location artdeco-entity-lockup__caption ember-view"><!---->
<li-icon aria-hidden="true" type="map-marker-icon" class="job-card-search__exact-location-icon" size="small"><svg viewBox="0 0 24 24" width="24px" height="24px" x="0" y="0" preserveAspectRatio="xMinYMin meet" class="artdeco-icon" focusable="false"><path d="M8,4a2,2,0,1,0,2,2A2,2,0,0,0,8,4ZM8,7.13A1.13,1.13,0,1,1,9.13,6,1.13,1.13,0,0,1,8,7.13ZM8,1A5,5,0,0,0,3,6a5.37,5.37,0,0,0,.41,2S5.91,13,7.22,15.52A0.86,0.86,0,0,0,8,16H8a0.86,0.86,0,0,0,.78-0.48C10.09,13,12.59,8,12.59,8A5.37,5.37,0,0,0,13,6,5,5,0,0,0,8,1Zm2.88,6.24L8,12.92,5.12,7.24A3.49,3.49,0,0,1,4.88,6a3.13,3.13,0,0,1,6.25,0A3.49,3.49,0,0,1,10.88,7.24Z" class="small-icon" style="fill-opacity: 1"></path></svg></li-icon>
San Francisco, CA, US
</h5>
<!----></artdeco-entity-lockup-content>
</artdeco-entity-lockup>
<!---->
<div class="job-card-search__body">
<p class="job-card-search__description-snippet">
Combining the rich information in family trees and historical records with the genetic details revea...
<span class="job-card-search__source-domain">jobs.smartrecruiters.com</span>
</p>
<div class="job-card-search__job-flavors-container job-flavors">
<div id="ember1517" class="job-flavors__flavor job-flavors__flavor--school-recruit ember-view"><a data-control-name="jobdetails_sharedconnections" href="/search/results/people/?facetCurrentCompany=397181&facetSchool=17816&origin=JOB_PAGE_CANNED_SEARCH" id="ember1518" class="search-s-shared-connections__link job-flavors__link link-without-visited-state ember-view"> <div class="job-flavors__logo-container">
<img class="lazy-image loaded job-flavors__logo-image" title="California Polytechnic State University-San Luis Obispo" alt="California Polytechnic State University-San Luis Obispo" src="https://media.licdn.com/dms/image/C560BAQERJB5dSuJ9Ow/company-logo_100_100/0?e=1563408000&v=beta&t=qIVll2vKhp3fUGa1FYyqjduYZkuuo-ApJ-Jiur-j1sY">
</div>
<div class="job-flavors__label">
5 alumni work here
</div>
</a></div>
</div>
<!---->
<ul class="job-card-search__footer mt1 t-12 t-black--light list-style-none">
<li class="job-card-search__footer-item">
<time class="job-card-search__time-badge job-card-search__time-badge--new" datetime="2019-04-15">
6 hours ago
</time>
</li>
<li class="job-card-search__footer-item">
<span class="job-card-search__easy-airplane">
<li-icon aria-hidden="true" type="linkedin-inbug-color-icon" class="mr1" size="small"><svg viewBox="0 0 24 24" width="24px" height="24px" x="0" y="0" preserveAspectRatio="xMinYMin meet" class="artdeco-icon" focusable="false"><g class="small-icon" style="fill-opacity: 1">
<path d="M13.75,1H2.25A1.25,1.25,0,0,0,1,2.25v11.5A1.25,1.25,0,0,0,2.25,15h11.5A1.25,1.25,0,0,0,15,13.75V2.25A1.25,1.25,0,0,0,13.75,1Z" style="fill: #0073b1"></path>
<path d="M4,2.68A1.36,1.36,0,0,0,2.69,4,1.36,1.36,0,0,0,4,5.31,1.36,1.36,0,0,0,5.31,4,1.36,1.36,0,0,0,4,2.68Z" style="fill: #fff"></path>
<rect x="3" y="6" width="2" height="7" style="fill: #fff"></rect>
<path d="M10.25,5.88a3,3,0,0,0-2.31,1H7.88V6H6v7H8V10c0-1.17.48-2,1.62-2,.91,0,1.38.66,1.38,2v3h2V8.88C13,7,12.21,5.88,10.25,5.88Z" style="fill: #fff"></path>
</g></svg></li-icon>
Easy airplane
</span>
</li>
</ul>
</div>
</div>
Seeing more of the HTML would help, but for the top HTML you could perhaps filter on the class of the a tag. Note that you are currently applying the class name of the target span to an a tag, and the span with that class in the bottom HTML does not have a child a tag, so nothing matches. The class on the a tag is:
job-card-search__link-wrapper
I note, however, that the element does not appear in your longer html segment at the bottom.
For the top version:
links = [item['href'] for item in soup.select('a.job-card-search__link-wrapper')]
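If only the cards that carry the job-card-search__easy-airplane span are wanted, one option is to start from the span, walk up to the enclosing card, and then search back down for the link. This is only a sketch: the sample markup is a hypothetical, trimmed-down version of the page source in the question, and the h3 in the selector is an assumption that the title link (rather than the logo link) is the one wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down cards modelled on the page source in the question.
SAMPLE_HTML = """
<div class="job-card-search ember-view">
  <h3><a class="job-card-search__link-wrapper" href="/model-airplane-kits-s/2379.htm">Data Scientist</a></h3>
  <ul><li><span class="job-card-search__easy-airplane">Easy airplane</span></li></ul>
</div>
<div class="job-card-search ember-view">
  <h3><a class="job-card-search__link-wrapper" href="/other/1.htm">Other</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
urls = []
for span in soup.select("span.job-card-search__easy-airplane"):
    # Walk up from the span to the card div that contains it.
    card = span.find_parent("div", class_="job-card-search")
    if card is None:
        continue
    # Then search downwards inside that card for the title link.
    link = card.select_one("h3 a.job-card-search__link-wrapper[href]")
    if link is not None:
        urls.append(link["href"])
print(urls)
```

The second card has no matching span, so its link is skipped.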
The a tags in both the top and bottom HTML do share an attribute you could use instead:
[data-control-name]
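As a rough illustration of that attribute selector, here is a small sketch on hypothetical sample markup (only the attribute names and values are taken from the question). The bare form matches any a tag that has the attribute; pinning the value narrows it to the job-result links.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two links carry data-control-name, one does not.
SAMPLE_HTML = """
<a data-control-name="A_jobssearch_job_result_click" href="/model-airplane-kits-s/2379.htm">Job</a>
<a data-control-name="job_card_company_link" href="/company/397181/">Company</a>
<a href="/no-attribute/">Plain link</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [data-control-name] matches any a tag carrying the attribute, whatever its value.
all_tagged = [a["href"] for a in soup.select("a[data-control-name][href]")]

# Pinning the attribute value keeps only the job-result links.
job_links = [
    a["href"]
    for a in soup.select('a[data-control-name="A_jobssearch_job_result_click"][href]')
]
print(all_tagged)
print(job_links)
```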