I'm trying to pass an argument into a function, but without success.
The purpose of this function is to return an XML tag.
This code doesn't work:
from bs4 import BeautifulSoup

def xmlTag(message):
    conf = open('timeLimit.conf').read().lower()
    for config in conf.splitlines():
        if config in conf.splitlines():
            data = BeautifulSoup(conf, "lxml")
            tag = data.message
            print(tag['msg'])
            break

xmlTag("fun2")
If I put fun2 in place of the "message" variable, as in "tag = data.fun2", the code works.
Please help.
What am I doing wrong?
Try doing:
...
tag = getattr(data, message)
...
getattr is the way to retrieve an attribute from an object when you have its name in a variable.
(Your code has some other issues as well: that break statement, where it is, ensures your loop will terminate on the first iteration, for example.)
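To illustrate why data.message fails while getattr(data, message) works, here is a minimal sketch. It does not use BeautifulSoup itself: TagLookup is a hypothetical stand-in class that, like a parsed document, resolves attribute access to a tag of that name, and the tag contents are made up for the example.

```python
class TagLookup:
    """Toy stand-in for a parsed document: attribute access looks up a tag by name."""

    def __init__(self, tags):
        self._tags = tags

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        return self._tags.get(name)


data = TagLookup({"fun2": {"msg": "hello"}})  # hypothetical tag contents
message = "fun2"

# Dot notation uses the literal name "message", not the variable's value.
literal = data.message

# getattr uses the string stored in the variable, so it looks up "fun2".
dynamic = getattr(data, message)
```

Here literal is None because no tag is called "message", while dynamic is the entry for "fun2". That mirrors what happens in the question: data.message asks for a tag literally named message.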
1) My response body comes in JSON format.
2) For certain reasons, I converted that JSON body to normal text using the code below, and it works as expected:
import groovy.json.*
String js = vars.get("cAccountDetails")
def data = new JsonSlurper().parseText(js)
log.info("the value is "+ data)
vars.putObject('data', data)
3) This code converts the JSON to normal text and stores it in a variable called "data".
4) So my response is stored in the "data" variable.
5) From "data", how can I extract **specific data** using Groovy code or some other code?
import java.util.regex.*
import java.util.regex.Matcher
import java.util.regex.Pattern
def matches = (data =~ '{accountDetails=\\[(.*)\\],')
vars.putObject('matches', matches)
The above code is meant for correlation (the "matches" variable should store the extracted value),
but it is not working. How can I fix this issue?
Thanks in advance!!
We cannot help you unless you share the value of your cAccountDetails variable and indicate what you need to extract from it.
At first glance, your regular expression should look a little different, i.e.
def matches = (data =~ /accountDetails=\[(.*)\],/)
More information:
Apache Groovy - Find Operator
Apache Groovy - Why and How You Should Use It
So I am quite new to mocking. I think I need to mock two functions.
Function under test
def get_text_from_pdf(next_pdfs_path):
    # TODO test this function
    """
    pulls all text from a PDF document and returns as a string.

    Parameters:
        next_pdfs_path (str): file path use r"" around path.

    Returns:
        text (str): string of text
    """
    if os.path.isfile(next_pdfs_path):  # check file is a real file/filepath
        try:
            text = ''
            with fitz.open(next_pdfs_path) as doc:  # using PyMuPDF
                for page in doc:
                    text += page.getText()
            return text
        except (RuntimeError, IOError):
            pass
    pass
Test code, first try
from unittest import mock

@mock.patch("content_production.fitz.open", return_value='fake_file.csv', autospec=True)
def test_get_text_from_pdf(mock_fitz_open):
    assert cp.get_text_from_pdf('fake_file.csv') == 'fake_file.csv'
error
E AssertionError: assert None == 'fake_file.csv'
E + where None = <function get_text_from_pdf at 0x00000245EDF8CAF0>('fake_file.csv')
E + where <function get_text_from_pdf at 0x00000245EDF8CAF0> = cp.get_text_from_pdf
Do I need to mock both fitz.open and os.path.isfile? How could that be done if yes?
EDIT
Following njriasan's feedback, I have tried this:
@mock.patch("content_production.os.path.isfile", return_value=True, autospec=True)
@mock.patch("content_production.fitz.Page.getText")
@mock.patch("content_production.fitz.open")
def test_get_text_from_pdf(mock_fitz_open, mock_path_isfile, mock_gettext):
    mock_fitz_open.return_value.__enter__.return_value = 'test'
    assert cp.get_text_from_pdf('test') == 'test'
But now getting this error.
> text += page.getText()
E AttributeError: 'str' object has no attribute 'getText'
I think there are a couple of issues with what you are doing. The first problem I see is that you are mocking the wrong function. By mocking fitz.open(next_pdfs_path) you are still expecting:
    for page in doc:
        text += page.getText()
to execute properly. I'd suggest that you wrap this entire with statement and the text accumulation in a helper function and then mock that. If the file path doesn't actually exist on your system, then you will also need to mock os.path.isfile. I believe that can be done by adding a second decorator (I don't think there is any limit on how many you can stack).
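As an alternative to extracting a helper, one can also keep the function unchanged and make the mocked fitz.open yield fake page objects that have a getText method. The following is only a sketch under assumptions: fitz here is a stand-in namespace so the example runs without PyMuPDF installed, and the function is a trimmed copy of the one in the question. In the real test you would patch "content_production.fitz.open" and "content_production.os.path.isfile" instead.

```python
import os
import types
from unittest import mock

# Stand-in for the PyMuPDF module so this sketch runs without it installed.
fitz = types.SimpleNamespace(open=None)


def get_text_from_pdf(next_pdfs_path):
    """Trimmed copy of the function under test."""
    if os.path.isfile(next_pdfs_path):
        try:
            text = ''
            with fitz.open(next_pdfs_path) as doc:
                for page in doc:
                    text += page.getText()
            return text
        except (RuntimeError, IOError):
            pass


def test_get_text_from_pdf():
    # Each fake page answers getText() with a fixed string.
    pages = [mock.Mock(), mock.Mock()]
    pages[0].getText.return_value = 'Hello '
    pages[1].getText.return_value = 'world'

    with mock.patch('os.path.isfile', return_value=True), \
            mock.patch.object(fitz, 'open') as mock_open:
        # "with fitz.open(...) as doc" binds doc to __enter__'s return value,
        # so that is where the iterable of pages has to go.
        mock_open.return_value.__enter__.return_value = pages
        assert get_text_from_pdf('fake.pdf') == 'Hello world'
        mock_open.assert_called_once_with('fake.pdf')
```

The AttributeError in the edit above comes from setting __enter__.return_value to the string 'test': iterating over a string yields characters, and a str has no getText. Note also that stacked patch decorators pass their mocks bottom-up, so the parameter order of the test function has to follow the decorators from the bottom to the top.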
I am in the middle of learning Scrapy right now and am building a simple scraper for a real estate site. With this code I am trying to scrape all of the URLs for the real estate listings of a specific city. I have run into the following error with my code: "Cannot mix str and non-str arguments".
I believe I have isolated my problem to the following part of my code:
props = response.xpath('//div[@class = "address ellipsis"]/a/@href').extract()
If I use the extract_first() function instead of the extract function in the props xpath assignment, the code kind of works. It grabs the first link for the property on each page. However, this ultimately is not what I want. I believe I have the xpath call correct as the code runs if I use the extract_first() method.
Can someone explain what I am doing wrong here? I have listed my full code below
import scrapy
from scrapy.http import Request


class AdvancedSpider(scrapy.Spider):
    name = 'advanced'
    allowed_domains = ['www.realtor.com']
    start_urls = ['http://www.realtor.com/realestateandhomes-search/Houston_TX/']

    def parse(self, response):
        props = response.xpath('//div[@class = "address ellipsis"]/a/@href').extract()
        for prop in props:
            absolute_url = response.urljoin(props)
            yield Request(absolute_url, callback=self.parse_props)

        next_page_url = response.xpath('//a[@class = "next"]/@href').extract_first()
        absolute_next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(absolute_next_page_url)

    def parse_props(self, response):
        pass
Please let me know if I can clarify anything.
You are passing props, a list of strings, to response.urljoin(), but you meant prop instead:
for prop in props:
    absolute_url = response.urljoin(prop)
Alecxe is right; it was a simple oversight in the name of the loop variable. You can also use the following notation:
for prop in response.xpath('//div[@class = "address ellipsis"]/a/@href').extract():
    yield scrapy.Request(response.urljoin(prop), callback=self.parse_props)
It's cleaner, and you avoid binding the intermediate "absolute_url" variable on each iteration. On a larger scale, that might save you a little memory.
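The error message itself can be reproduced with the standard library alone, which may help show where it comes from. This is a sketch, not Scrapy code: it calls urllib.parse.urljoin directly, and the listing paths in props are made-up placeholders.

```python
from urllib.parse import urljoin

base = "http://www.realtor.com/realestateandhomes-search/Houston_TX/"
props = ["/detail/1", "/detail/2"]  # hypothetical extracted hrefs

# Passing the whole list mixes a str (base) with a non-str (the list).
try:
    urljoin(base, props)
except TypeError as exc:
    error = str(exc)

# Passing one string at a time works as intended.
absolute_urls = [urljoin(base, prop) for prop in props]
```

This also explains why extract_first() appeared to work: it returns a single string, so the join receives two str arguments.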
I have a function for newspaper3k which extracts a summary for a given URL:
def article_summary(row):
    url = row
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    text = article.summary
    return text
I have a pandas dataframe with a column named url:
url
https://www.xyssss.com/dddd
https://www.sbkaksbk.com/shshshs
https://www.ascbackkkc.com/asbbs
............
............
There is another function, main_code(), which runs perfectly fine and inside which I'm using article_summary. I want to combine both functions, article_summary and main_code(), into one function, final_code.
Here is my code. The first function:
def article_summary(row):
    url = row
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    text = article.summary
    return text
Here is the second function:
def main_code():
    article_data['article'] = article_data['url'].apply(article_summary)
    return article_data['article']
Then I tried:
def final_code():
    article_summary()
    main_code()
But final_code() does not give any output; it fails with TypeError: article_summary() missing 1 required positional argument: 'row'
Are those the actual URLs you're using? If so, they seem to be causing an ArticleException; I tested your code with some Wikipedia pages and it works.
On that note, are you working with just one df? If not, it's probably a good idea to pass it as a variable to the function.
-----------------------------------Edit after comments----------------------------------------------------------------------
I think a tutorial on Python functions will be beneficial. That said, in regards to your specific question, calling a function the way you described it will make it run twice, which is not needed in this case. As I said earlier, you should pass the df as an argument to the function, here is a tutorial on global vs local variables and how to use them.
The error you're getting is because you should pass an argument 'row' to the function article_summary (please see functions tutorial).
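A minimal sketch of the idea, under stated assumptions: article_summary below is a stand-in that just builds a string (the real one would use newspaper3k), the URLs are placeholders, and a plain list takes the place of the dataframe column so the example runs with the standard library only.

```python
def article_summary(url):
    # Stand-in for the newspaper3k-based function from the question.
    return "summary of " + url


def final_code(urls, summarize=article_summary):
    # The data is passed in as an argument, and each URL is handed
    # to the summarizer explicitly instead of calling it with nothing.
    return [summarize(url) for url in urls]


summaries = final_code(["https://example.com/a", "https://example.com/b"])

# Calling the function with no argument reproduces the TypeError.
try:
    article_summary()
except TypeError as exc:
    error_message = str(exc)
```

With a dataframe, the body of final_code would presumably be the apply call from main_code, with the dataframe received as a parameter rather than read from a global.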
OK, so I have two files, filename1.py and filename2.py, and they both have a function with the same name, funB. A third file, process.py, has a function that calls funB from either file. I am struggling to call the correct function.
In process.py:
from directoryA.filename1 import funB
from directoryA.filename2 import funB
def funA(*args):
    # do stuff to determine which filename and save it in variable named 'd'
    d = 'filename2'
    # here I want to call funB with *args based on what 'd' is
So I have tried eval() like so:
call_right_funB = eval(d.funB(*args))
but it does not seem to work.
Any help is appreciated.
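The problem is that you can't use eval() with a combination of a string and a method call like that. What you have written is equivalent to:
call_right_funB = eval('filename2'.funB(*args))
What you can do is:
call_right_funB = eval(d + '.funB(*args)')
(this only works if a module is bound to the name stored in d in the current scope, which the "from ... import funB" lines do not provide).
But this is not a very Pythonic approach.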
I would recommend creating a dictionary switch, even though you have to import the entire modules:
import directoryA.filename1
import directoryA.filename2
dic_switch = {1: directoryA.filename1, 2: directoryA.filename2}
switch_variable = 1
call_right_funB = dic_switch[switch_variable].funB(*args)
Hope it helps.
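The same dictionary idea can be keyed directly by the string stored in d, which avoids a separate switch variable. This is only a sketch: the two SimpleNamespace objects are hypothetical stand-ins for directoryA.filename1 and directoryA.filename2, and their funB bodies are invented so the example runs on its own.

```python
import types

# Stand-ins for the two modules; in the real project these would be
# the imported directoryA.filename1 and directoryA.filename2.
filename1 = types.SimpleNamespace(funB=lambda *args: ("filename1", args))
filename2 = types.SimpleNamespace(funB=lambda *args: ("filename2", args))

# Map the string stored in d to the module that owns the right funB.
MODULES = {"filename1": filename1, "filename2": filename2}


def funA(*args):
    d = "filename2"  # decided by earlier logic in the real code
    return MODULES[d].funB(*args)


result = funA(1, 2)
```

If the set of modules is not known in advance, importlib.import_module("directoryA." + d) from the standard library could be used to load the module by name instead of listing every module in the dictionary.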