I have a function that uses newspaper3k to extract the summary for a given URL:
from newspaper import Article

def article_summary(row):
    url = row
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    text = article.summary
    return text
I have a pandas DataFrame with a column named url:
url
https://www.xyssss.com/dddd
https://www.sbkaksbk.com/shshshs
https://www.ascbackkkc.com/asbbs
............
............
There is another function, main_code(), which runs fine and uses article_summary internally. I want to combine article_summary and main_code() into a single function, final_code.
Here is my code. The first function:
def article_summary(row):
    url = row
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    text = article.summary
    return text
Here is the second function:
def main_code():
    article_data['article'] = article_data['url'].apply(article_summary)
    return article_data['article']
When I do this:
def final_code():
    article_summary()
    main_code()
final_code() does not give any output; it fails with TypeError: article_summary() missing 1 required positional argument: 'row'
Are those the actual URLs you're using? If so, they seem to be causing an ArticleException. I tested your code with some Wikipedia pages and it works.
On that note, are you working with just one df? If not, it's probably a good idea to pass it to the function as an argument.
-----------------------------------Edit after comments----------------------------------------------------------------------
I think a tutorial on Python functions would be beneficial. That said, regarding your specific question: calling the functions the way you described would run article_summary twice, which is not needed here, because main_code already calls it. As I said earlier, you should pass the df as an argument to the function; a tutorial on global vs. local variables explains how to use them.
The error you're getting occurs because article_summary requires an argument, 'row', and final_code calls it without one (please see a functions tutorial).
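To make the point about arguments concrete, here is a minimal sketch. It is not the original code: article_summary is replaced by a stand-in that needs no network access, and a plain list of URLs stands in for the url column, so the parameter name urls and the fake summary text are assumptions made for illustration only.

```python
def article_summary(url):
    # Stand-in for the newspaper3k version: the real one would download,
    # parse and summarise the article found at `url`.
    return "summary of " + url


def main_code(urls):
    # The data is passed in as an argument instead of read from a global.
    # With a DataFrame this step would be df['url'].apply(article_summary).
    return [article_summary(url) for url in urls]


def final_code(urls):
    # Call main_code once; it already calls article_summary for every URL.
    return main_code(urls)


result = final_code(["https://example.com/a", "https://example.com/b"])
print(result)
```

Calling article_summary() with no argument inside final_code is what raised the TypeError; here final_code only forwards its own argument.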
I have a dataset and I want to write a function that applies pd.get_dummies() so I can use it in a pipeline for specific columns.
When I run dataset = pd.get_dummies(dataset, columns=['Embarked','Sex'], drop_first=True)
alone it works, as in, when I run df.head() I can still see the dummified columns but when I have a function like this,
def dummies(df):
    df = pd.get_dummies(df, columns=['Embarked','Sex'], drop_first=True)
    return df
Once I run dummies(dataset), it shows me the dummified columns in that same cell, but when I then run dataset.head(), it isn't dummified anymore.
What am I doing wrong?
thanks.
You should assign the result of the function back to dataset. Call the function like this:
dataset=dummies(dataset)
Functions have their own independent namespace for the variables defined in them, whether in the signature or in the body.
For example:
a = 0
def fun(a):
    a = 23
    return a
fun(a)
print("a is", a)  # a is 0
Here you might think that a will have the value 23 at the end, but that is not the case, because the a inside fun is not the same name as the a outside. When you call fun(a), you pass the function a reference to the real object that lives somewhere in memory, so the a inside starts out with the same reference and thus the same value.
With a=23 you change what the inner a points to, which in this example is 23; the outer a still points to 0.
And although fun(a) returns a value, that result is lost unless it is saved somewhere.
To update the variable outside, you need to reassign it to the result of the function:
a = 0
def fun(a):
    a = 23
    return a
a = fun(a)
print("a is", a)  # a is 23
which in your case would be dataset=dummies(dataset)
If you want your function to make changes in place to the object it receives, you can't use =; you need to use something the object itself provides to allow in-place modification. For example,
this would not work:
a = []
def fun2(a):
    a = [23]
    return a
fun2(a)
print("a is", a)  # a is []
but this would
a = []
def fun2(a):
    a.append(23)
    return a
fun2(a)
print("a is", a)  # a is [23]
because we are using an in-place modification method that the object provides, in this example the append method of list.
But such in-place modification can produce unforeseen results, especially if the object being modified is shared between processes, so I would rather recommend the previous approach.
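One way to see the difference between rebinding a name and modifying an object in place is to compare object identities with `is`. The sketch below uses hypothetical helper names (rebind, mutate) that are not part of the answer above.

```python
def rebind(items):
    # Assignment makes the local name point at a brand-new list;
    # the caller's list is untouched.
    items = [23]
    return items


def mutate(items):
    # append() changes the very object the caller passed in.
    items.append(23)
    return items


original = []
rebound = rebind(original)
print(rebound is original, original)   # False []

mutated = mutate(original)
print(mutated is original, original)   # True [23]
```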
I tried calling .copy() on it and then passing it to the function. That didn't work.
When I tried copying inside the function itself, it still changed the original list.
The function is in another file.
main.py
win_condition.check_r_win(board)
win_condition.py
def check_r_win(_board):
    board = _board.copy()
    for col in board:
        while len(col) <= ROWS:
            col.append("-")
I don't really get what you are trying to do here. Python's list is a mutable type. If you pass a reference to a list into a function and change the list inside that function, it is also changed outside the function's scope.
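A likely reason .copy() did not help, assuming board is a list of column lists as the loop suggests: list.copy() is shallow, so the copied outer list still holds the same inner lists. The sketch below uses copy.deepcopy instead. ROWS = 6, the sample board, and the function name padded_board are made up for illustration.

```python
import copy

ROWS = 6  # assumed value; the original constant is not shown


def padded_board(_board):
    # deepcopy duplicates the inner column lists as well, so padding
    # the copy leaves the caller's board alone.
    board = copy.deepcopy(_board)
    for col in board:
        while len(col) <= ROWS:
            col.append("-")
    return board


board = [["r"], ["y", "r"], []]
padded = padded_board(board)
print(board)            # [['r'], ['y', 'r'], []]
print(len(padded[0]))   # 7
```

With board.copy() in place of copy.deepcopy(board), each col in the loop would be the caller's own column list, which matches the behaviour described in the question.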
I have the following function:
def test(crew):
    crew1 = crew_data['CrewEquipType1']
    crew2 = crew_data['CrewEquipType2']
    crew3 = crew_data['CrewEquipType3']
    return
test('crew1')
I would like to be able to use any one of the 3 variables as an argument and return the output accordingly to use as a reference later in my code. FYI, each of the variables above is a Pandas series from a DataFrame.
I can create functions without a parameter, but for some reason I can't quite grasp how to use parameters effectively, as in the example above. Instead, I find myself writing individual functions rather than writing a single one and adding a parameter.
If someone could provide a solution to the above that would be greatly appreciated.
Assumption: Your problem seems to be that you want to return the corresponding variable crew1, crew2 or crew3 based on the input to the function test.
Some test cases based on my understanding of your problem:
test('crew1') should return crew_data['CrewEquipType1']
test('crew2') should return crew_data['CrewEquipType2']
test('crew3') should return crew_data['CrewEquipType3']
To accomplish this, you can implement a function like this:
def test(crew):
    if crew == 'crew1':
        return crew_data['CrewEquipType1']
    elif crew == 'crew2':
        return crew_data['CrewEquipType2']
    elif crew == 'crew3':
        return crew_data['CrewEquipType3']
    # ... add as many elif cases as you would like
    else:
        # You could handle an incorrect value for the `crew` parameter here,
        # for example by raising an error.
        raise ValueError(f"unknown crew: {crew}")
Hope this helps!
Drop a comment if not
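As an alternative to the chain of if/elif branches, the mapping from argument to column name could be held in a dictionary. This is only a sketch: crew_data is replaced by a plain dict of lists so the example runs on its own, and CREW_COLUMNS and get_crew are names invented here. With the real DataFrame, crew_data[column] would return the Series.

```python
# Stand-in for the DataFrame; the values here are made up.
crew_data = {
    "CrewEquipType1": ["A", "B"],
    "CrewEquipType2": ["C", "D"],
    "CrewEquipType3": ["E", "F"],
}

# Maps the short argument to the column it refers to.
CREW_COLUMNS = {
    "crew1": "CrewEquipType1",
    "crew2": "CrewEquipType2",
    "crew3": "CrewEquipType3",
}


def get_crew(crew):
    try:
        column = CREW_COLUMNS[crew]
    except KeyError:
        # Unknown argument: fail loudly instead of returning None.
        raise ValueError(f"unknown crew: {crew!r}") from None
    return crew_data[column]


print(get_crew("crew2"))   # ['C', 'D']
```

Adding a fourth crew then means adding one dictionary entry rather than another branch.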
I am writing a framework in Python. When a user declares a function, they do:
def foo(row, fetch=stuff, query=otherStuff): ...
def bar(row, query=stuff): ...
def bar2(row): ...
When the backend sees query= value, it executes the function with the query argument depending on value. This way the function has access to the result of something done by the backend in its scope.
Currently I build my arguments each time by checking whether query, fetch and the other items are None, and launching it with a set of args that exactly matches what the user asked for. Otherwise I got the "got an unexpected keyword argument" error. This is the code in the backend:
# fetch and query are computed by the backend
if fetch is None and query is None:
    userfunction(row)
elif fetch is None:
    userfunction(row, query=query)
elif query is None:
    userfunction(row, fetch=fetch)
else:
    userfunction(row, fetch=fetch, query=query)
This is not good; for each additional "service" the backend offers, I need to write all the combinations with the previous ones.
Instead, I would like to take the function and add a named parameter to it myself before executing it, removing all the code that does these checks. Then the user would just use what they really wanted.
I don't want the user to have to modify the function by adding stuff they don't want (nor do I want them to specify a kwarg every time).
So, if this is doable, I would like an example: a function addNamedVar(name, function) that adds the variable name to the function function.
I want to do it that way because the users' functions are called many times, so the alternative, for example building a dict of each function's named variables (with inspect) and then using **dict, would add overhead on every call. I would really like to modify the function just once to avoid any kind of overhead.
This is indeed doable with the AST, and that is what I am going to do because it suits my use case better. However, you can do what I asked more simply with a function-cloning approach like the code snippet below. Note that this code returns the same function with different default values. You can use it as an example and adapt it to whatever you need.
This works for Python 3.8 and later (it relies on code.replace(), since the positional arguments of types.CodeType have changed between Python versions):
import inspect
import types

def copyTransform(f, name, **args):
    # Clone f under a new name, overriding the given default values.
    signature = inspect.signature(f)
    params = list(signature.parameters)
    positionalCount = f.__code__.co_argcount
    listTuple = list(f.__defaults__ or ())
    # __defaults__ belongs to the last len(listTuple) positional parameters.
    firstDefault = positionalCount - len(listTuple)
    for key, val in args.items():
        toChangeIndex = params.index(key)
        if firstDefault <= toChangeIndex < positionalCount:
            listTuple[toChangeIndex - firstDefault] = val
    newTuple = tuple(listTuple)
    # Copy the code object, changing only its name.
    newCode = f.__code__.replace(co_name=name)
    newFunction = types.FunctionType(newCode, f.__globals__, name, newTuple, f.__closure__)
    newFunction.__qualname__ = name  # also needed for serialization
    return newFunction
You need that odd handling of the names if you want to pickle your cloned function.
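If the goal is only to avoid the combinatorial if/elif dispatch, a lighter alternative to cloning functions is to inspect each user function once, when it is registered, and remember which backend services it asks for. This is a sketch under that assumption, not the approach described above; make_caller, services, and the sample functions are hypothetical names.

```python
import inspect


def make_caller(userfunction):
    # Inspect the signature a single time and remember which keyword
    # parameters (everything except the first, `row`) the function accepts.
    wanted = tuple(list(inspect.signature(userfunction).parameters)[1:])

    def call(row, services):
        # `services` maps a name such as "query" to the value the backend
        # computed; only the requested ones are passed on.
        kwargs = {name: services[name] for name in wanted if name in services}
        return userfunction(row, **kwargs)

    return call


def foo(row, fetch=None, query=None):
    return (row, fetch, query)


def bar2(row):
    return row


services = {"fetch": "F", "query": "Q"}
print(make_caller(foo)(1, services))    # (1, 'F', 'Q')
print(make_caller(bar2)(1, services))   # 1
```

The inspect call happens once per function; each later call still builds a small dict, so this trades a little per-call work for not touching code objects.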
I have created three functions. The first function is used in the other two functions but I am passing it a hardcoded filepath. I want to be able to pass this in as a parameter, but I seem to be getting an issue.
Essentially, given a file_path, my function will get the first item in the list and then the second item.
So far my code is as follows :
import os

def sort_files(file_path):
    """Sort files in descending order"""
    files = os.listdir(file_path)
    return sorted(files, reverse=True)

def current_day():
    """Get the current day file"""
    return sort_files(file_path)[0]

def previous_day():
    """Get the previous day file"""
    return sort_files(file_path)[1]
If you want a function to accept an argument, you need to define it as doing so by specifying the parameter name it will be known as in the function (as you did with sort_files).
How are you calling current_day and previous_day? You should define them as functions that take a parameter.
Also, please post the code you use to run the whole setup.
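A minimal sketch of what the advice above could look like, with file_path added as a parameter to both functions. The temporary directory and the file names in the usage part are invented for the demonstration.

```python
import os
import tempfile


def sort_files(file_path):
    """Return the file names in file_path in descending order."""
    return sorted(os.listdir(file_path), reverse=True)


def current_day(file_path):
    """Get the current day file."""
    return sort_files(file_path)[0]


def previous_day(file_path):
    """Get the previous day file."""
    return sort_files(file_path)[1]


# Usage with a throwaway directory holding two date-named files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("2021-01-01.csv", "2021-01-02.csv"):
        open(os.path.join(folder, name), "w").close()
    latest = current_day(folder)
    previous = previous_day(folder)

print(latest, previous)   # 2021-01-02.csv 2021-01-01.csv
```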