When running a speech-to-text API request against Google Cloud Speech (the audio is over 60s, so I need to use the long_running_recognize function and retrieve the audio from a Cloud Storage bucket), I do get a text response, but I cannot iterate through the LongRunningResponse object that is returned, which makes the info inside it nearly useless.
When using just the client.recognize() function, I get a similar response to the long-running one, except that in the short form I can iterate through the results just fine, contrary to the long-running response.
I run nearly identical parameters through each recognize function (a 1m40s audio file for long_running_recognize and a 30s one for the short recognize, both from my Cloud Storage bucket).
short_response = client.recognize(config=config, audio=audio_uri)
subs_list = []
for result in short_response.results:
    for alternative in result.alternatives:
        for word in alternative.words:
            if not word.start_time:
                start = 0
            else:
                start = word.start_time.total_seconds()
            end = word.end_time.total_seconds()
            t = word.word
            subs_list.append(((float(start), float(end)), t))
print(subs_list)
The above code works fine: the .results attribute correctly returns objects whose attributes I can access and iterate through. I use the for loops to create subtitles for a video.
I then try a similar thing with long_running_recognize and get this:
long_response = client.long_running_recognize(config=config, audio=audio_uri)
#1
print(long_response.results)
#2
print(long_response.result())
Output from #1 raises an error:
AttributeError: 'Operation' object has no attribute 'results'. Did you mean: 'result'?
Output from #2 returns the info I need, but when checking type(long_response.result()) I get:
<class 'google.cloud.speech_v1.types.cloud_speech.LongRunningRecognizeResponse'>
which I suppose is not an iterable object, and I cannot figure out how to apply the same process I use with the recognize function to build subtitles the way I need.
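For reference, this is the shape of the iteration I am trying to reproduce on the long-running result; I am assuming (but have not confirmed) that the LongRunningRecognizeResponse returned by .result() exposes a .results field like the synchronous response does, and the timeout is an arbitrary guess:

long_response = client.long_running_recognize(config=config, audio=audio_uri)
# result() blocks until the operation finishes and returns the response object
response = long_response.result(timeout=300)
subs_list = []
for result in response.results:
    for alternative in result.alternatives:
        for word in alternative.words:
            start = word.start_time.total_seconds() if word.start_time else 0
            end = word.end_time.total_seconds()
            subs_list.append(((float(start), float(end)), word.word))
print(subs_list)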
I have this issue: I have a list of YouTube channels I am polling from the API to get some stats daily.
Total comments, likes and dislikes (all time, across all videos).
I have implemented the code below. It works, but it loops through every single video one at a time, hitting the API each time.
Is there a way to make one API call with several video IDs?
Or is there a better way to do this and get these stats?
# find stats for all channel videos - how will this scale?
def video_stats(row):
    videoid = row['video_id']
    query = yt.get_video_metadata(videoid)
    vids = pd.DataFrame(query, index=[0])
    df['views'] = vids['video_view_count'].sum()
    df['comments'] = vids['video_comment_count'].sum()
    df['likes'] = vids['video_like_count'].sum()
    df['dislikes'] = vids['video_dislike_count'].sum()
    return 'no'

df['stats'] = df.apply(video_stats, axis=1)
channel['views'] = df['views'].sum()
channel['comments'] = df['comments'].sum()
channel['likes'] = df['likes'].sum()
channel['dislikes'] = df['dislikes'].sum()
According to the docs, you may combine the IDs of several different videos into one Videos.list API endpoint call:
id: string
The id parameter specifies a comma-separated list of the YouTube video ID(s) for the resource(s) that are being retrieved. In a video resource, the id property specifies the video's ID.
However, the code you have shown is too terse for me to figure out how to adapt it to that kind of (batch) endpoint call.
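As a rough sketch of what such a batched call could look like with the official google-api-python-client (the API key placeholder, the DataFrame column and the 50-IDs-per-call chunking are my assumptions, not taken from your code):

from googleapiclient.discovery import build

youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')  # placeholder key

video_ids = df['video_id'].tolist()  # assuming the same DataFrame as in your code

stats = []
# Videos.list accepts a comma-separated list of IDs (up to 50 per call),
# so chunk the list instead of calling the API once per video
for i in range(0, len(video_ids), 50):
    chunk = video_ids[i:i + 50]
    response = youtube.videos().list(
        part='statistics',
        id=','.join(chunk)
    ).execute()
    for item in response.get('items', []):
        stats.append(item['statistics'])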
I need to extract the ICY metadata from a live audio stream and was looking at doing this with mplayer, as it outputs the metadata while it plays the stream. I'm open to other ways of doing this; the goal is to have the updated metadata (song info) saved to a text file that is updated whenever the song (or data) changes.
One of the reasons I want to use mplayer is to ensure it works on the most diverse streams available (rather than just Shoutcast/Icecast).
I am able to extract the metadata now using this simple command: mplayer http://streamurl
The problem is that I do not want to keep calling it every x seconds as it fills up the destination server logs with x second calls (connect/disconnect).
I'd rather have it permanently connected to the stream and use the output of mplayer to capture the ICY metadata whenever the song updates.
The reason I do not want to just connect every x seconds is because I need quite a bit of granularity and would be checking every 10-15 seconds for an update.
I'd be happy to do this a different way, but would ultimately need the data outputted to a .txt file somehow.
Any pointers on how to achieve this would be greatly appreciated.
What I did was run it in a thread and capture its output. That way, you can do whatever you want with it: call a function to update a variable, for example.
For example:
import re
import subprocess
import threading

class Radio:
    radio = None
    stream_text = None
    t1 = None

    def __init__(self, radio):
        self.radio = radio

    def getText(self):
        if self.stream_text:
            return self.stream_text
        return ""

    def setURL(self, radio):
        self.radio = radio

    def run(self):
        self.t1 = threading.Thread(target=self.start)
        self.t1.start()

    def start(self):
        self.p = subprocess.Popen(['mplayer', '-slave', '-quiet', self.radio],
                                  stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                  universal_newlines=True, bufsize=1)
        # mplayer prints a line starting with "ICY Info:" whenever the metadata changes
        for line in self.p.stdout:
            if line.encode('utf-8').startswith(b'ICY Info:'):
                info = line.split(':', 1)[1].strip()
                attrs = dict(re.findall(r"(\w+)='([^']*)'", info))
                self.stream_text = attrs.get('StreamTitle', '(none)')
By calling getText() every second, I'd get up-to-date info, but of course instead of doing it this way, you can pass a callback function to be executed with every new update.
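As a minimal usage sketch that writes the title to a text file whenever it changes (the stream URL and file name are placeholders):

import time

radio = Radio('http://streamurl')  # placeholder stream URL
radio.run()

last_title = None
while True:
    title = radio.getText()
    if title and title != last_title:
        # rewrite the file only when the song info actually changes
        with open('now_playing.txt', 'w') as f:
            f.write(title)
        last_title = title
    time.sleep(1)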
I am trying to create a spaCy pipeline component that returns Spans of meaningful text (my corpus comprises PDF documents that contain a lot of garbage I am not interested in: tables, headers, etc.)
More specifically I am trying to create a function that:
takes a doc object as an argument
iterates over the doc tokens
when certain rules are met, yields a Span object
Note: I would also be happy with returning a list ([span_obj1, span_obj2]).
What is the best way to do something like this? I am a bit confused about the difference between a pipeline component and an extension attribute.
So far I have tried:
from spacy.lang.en import English
from spacy.tokens import Doc

nlp = English()
Doc.set_extension('chunks', method=iQ_chunker)
####
raw_text = get_test_doc()
doc = nlp(raw_text)
print(type(doc._.chunks))
>>> <class 'functools.partial'>
iQ_chunker is a method that does what I explain above, and it returns a list of Span objects.
This is not the result I expect, as the function I pass in as method returns a list.
I imagine you're getting a functools.partial back because you are accessing chunks as an attribute, despite having passed it in as an argument for method. If you want spaCy to intervene and call the method for you when you access something as an attribute, it needs to be
Doc.set_extension('chunks', getter=iQ_chunker)
Please see the Doc documentation for more details.
However, if you are planning to compute this attribute for every single document, I think you should make it part of your pipeline instead. Here is some simple sample code that does it both ways.
import spacy
from spacy.tokens import Doc

def chunk_getter(doc):
    # the getter is called when we access _.extension_1,
    # so the computation is done at access time
    # also, because this is a getter,
    # we need to return the actual result of the computation
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    return [first_half, second_half]

def write_chunks(doc):
    # this pipeline component is called as part of the spacy pipeline,
    # so the computation is done at parse time
    # because this is a pipeline component,
    # we need to set our attribute value on the doc (which must be registered)
    # and then return the doc itself
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    doc._.extension_2 = [first_half, second_half]
    return doc

nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser", "ner"])
Doc.set_extension("extension_1", getter=chunk_getter)
Doc.set_extension("extension_2", default=[])
nlp.add_pipe(write_chunks)

test_doc = nlp('I love spaCy')
print(test_doc._.extension_1)
print(test_doc._.extension_2)
This just prints [I, love spaCy] twice because it's two methods of doing the same thing, but I think making it part of your pipeline with nlp.add_pipe is the better way to do it if you expect to need this output on every document you parse.
I'm using the AMAZON.DURATION built-in slot type to get data from my custom Alexa skill. This works perfectly and converts the given duration to the ISO-8601 duration format.
For example, the slot type successfully converts "ten minutes" to PT10M and I can get this data from the request object passed to my Lambda function. However, I would also like to pass the unformatted "ten minutes" to my Lambda function, too.
Is this possible?
After much trial and error, I asked this same question in the Amazon Developer forums and the official response from Amazon is:
Hi, this isn't possible although it is an interesting request!
I ended up writing a manual conversion function in Python.
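For what it's worth, a minimal sketch of that conversion (this only handles the hours/minutes/seconds part of durations such as PT1H10M30S, which was enough for my skill):

import re

def iso8601_duration_to_seconds(duration):
    # matches durations such as PT10M, PT1H30M or PT45S
    match = re.fullmatch(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration)
    if not match:
        raise ValueError('Unsupported duration: %s' % duration)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

print(iso8601_duration_to_seconds('PT10M'))  # 600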
You should convert it manually. Below is a Node.js snippet that converts the duration to minutes. Let's assume Alexa sends PT5M to Lambda.
var time = "PT5M";
var res = time.substring(2, (time.length));
var mins;
var timelist = res.split("H");
if(timelist.length > 1){
mins = +parseInt((timelist[0]*60),10)+parseInt((timelist[1].substring(0,(timelist[1].length-1))),10);
}else{
mins = res.substring(0, (res.length-1));
}
console.log(mins);
I have a SoapUI Mock Service with multiple responses. I want to define a custom sequence for my responses and I do not necessarily use all of the responses in this particular test.
I have actually managed to get this to work in the past, but I think something changed in a newer version of the product and the feature stopped working. That was with a SOAP web service. Now I am mocking a RESTful web service and I have the same requirement to help me do my tests.
The SEQUENCE dispatch option is not what I want, because it will return all of the defined responses in the order in which they were created. The SCRIPT option is what I used previously but now all I can achieve with this is define one response to be generated. For this test I have no interest in examining some content of the request to decide which response to send back.
For example, if I have 8 responses defined, I just want to be able to specify that the following responses are returned:-
Response #2, then Response #3, then Response #4, then finally Response #7; so that Responses #1, #5, #6 and #8 are not used.
My question is posed in detail in the SmartBear forum here:-
simple scripting in a Mock Service - no longer works
I tried what you posted in the SOAPUI forum, using consecutive return statements with the response order, and it doesn't work.
Instead of your Groovy code as a DISPATCH script, I propose the following Groovy code as a workaround. It uses a list to keep your responses in the desired order, stores that list in the context, and updates it on each call:
// get the list from the context
def myRespList = context.myRespList
// if the list is null or empty, reinitialize it
if (!myRespList || !myRespList?.size) {
    // list the response names you created in your mock service,
    // in the desired output order
    myRespList = ["Response 2", "Response 3", "Response 4", "Response 7"]
}
// take the first element from the list
def resp = myRespList.take(1)
// update the context with the list without this element
context.myRespList = myRespList.drop(1)
// return the response
log.info "-->" + resp
return resp
This code works as you expect, since the context keeps the list: each time, the script returns the next response, and when the list is empty it repopulates it and restarts the loop in the same order.
As an illustration, when I use this mock service, the script log shows the responses being returned in exactly that order.
EDIT
If, like the OP, you have problems with your SOAPUI version because the returned string is wrapped in square brackets (i.e. [Response 1]), change the way the element is taken from the list using:
// take the first element from the list
def resp = myRespList.take(1)[0]
instead of:
// take the first element from the list
def resp = myRespList.take(1)
Note the [0].
With this change the return string will be Response 1 instead of [Response 1].
In this case the script will be:
// get the list from the context
def myRespList = context.myRespList
// if the list is null or empty, reinitialize it
if (!myRespList || !myRespList?.size) {
    // list the response names you created in your mock service,
    // in the desired output order
    myRespList = ["Response 2", "Response 3", "Response 4", "Response 7"]
}
// take the first element from the list
def resp = myRespList.take(1)[0]
// update the context with the list without this element
context.myRespList = myRespList.drop(1)
// return the response
log.info "-->" + resp
return resp
Hope this helps,