Using mplayer to extract ICY metadata from an audio stream (continuously)

I need to extract the ICY metadata from a live audio stream and was looking at doing this using mplayer as this outputs the metadata as it plays the audio stream. I'm open to other ways of doing this, the goal is to have the updated metadata (song info) saved to a text file that will update whenever the song (or data) changes.
One of the reasons I want to use mplayer is to ensure it works on the most diverse streams available (rather than just Shoutcast/Icecast).
I am able to extract the metadata now using this simple command: mplayer http://streamurl
The problem is that I do not want to keep calling it every x seconds as it fills up the destination server logs with x second calls (connect/disconnect).
I'd rather have it permanently connected to the stream and use the output of mplayer to output the icy metadata whenever the song updates.
The reason I do not want to just connect every x seconds is because I need quite a bit of granularity and would be checking every 10-15 seconds for an update.
I'd be happy to do this a different way, but would ultimately need the data outputted to a .txt file somehow.
Any pointers on how to achieve this would be greatly appreciated.

What I did was to run it in a thread and capture its output. That way, you can do whatever you want with it: call a function to update a variable, for example.
For example:
import re
import subprocess
import threading

class Radio:
    radio = None
    stream_text = None
    t1 = None

    def __init__(self, radio):
        self.radio = radio

    def getText(self):
        if self.stream_text:
            return self.stream_text
        return ""

    def setURL(self, radio):
        self.radio = radio

    def run(self):
        self.t1 = threading.Thread(target=self.start)
        self.t1.start()

    def start(self):
        # Run mplayer in slave mode and read its stdout line by line.
        self.p = subprocess.Popen(
            ['mplayer', '-slave', '-quiet', self.radio],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            universal_newlines=True, bufsize=1)
        for line in self.p.stdout:
            if line.startswith('ICY Info:'):
                info = line.split(':', 1)[1].strip()
                attrs = dict(re.findall(r"(\w+)='([^']*)'", info))
                self.stream_text = attrs.get('StreamTitle', '(none)')
By calling getText() every second, I'd get up-to-date info, but of course instead of doing it this way, you can pass a callback function to be executed with every new update.
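A minimal usage sketch of the class above (the stream URL is a placeholder, and polling once a second is only for illustration):
import time

radio = Radio('http://streamurl')
radio.run()
try:
    while True:
        time.sleep(1)
        print(radio.getText())
except KeyboardInterrupt:
    radio.p.terminate()  # stop mplayer when done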

Related

Google Cloud Speech-to-Text "long_running_recognize" response object not iterable

When running a speech-to-text API request against Google Cloud services (the audio is over 60s, so I need to use the long_running_recognize function and retrieve the audio from a Cloud Storage bucket), I do get a text response, but I cannot iterate through the LongRunningResponse object that is returned, which makes the info inside it hard to use.
When using just the "client.recognize()" function, I get a similar response to the long-running one, except that when I check the results of the short form, I can iterate through the object just fine, unlike the long response.
I run nearly identical parameters through each recognize function (a 1m40s audio file for the long-running call and a 30s one for the short recognize, both from my Cloud Storage bucket).
short_response = client.recognize(config=config, audio=audio_uri)

subs_list = []
for result in short_response.results:
    for alternative in result.alternatives:
        for word in alternative.words:
            if not word.start_time:
                start = 0
            else:
                start = word.start_time.total_seconds()
            end = word.end_time.total_seconds()
            t = word.word
            subs_list.append(((float(start), float(end)), t))
print(subs_list)
The above code works fine; the ".results" attribute correctly returns objects whose attributes I can read and iterate through. I use the for loops to create subtitles for a video.
I then try a similar thing with long_running_recognize and get this:
long_response = client.long_running_recognize(config=config, audio=audio_uri)

#1
print(long_response.results)

#2
print(long_response.result())
Output from #1 returns error:
AttributeError: 'Operation' object has no attribute 'results'. Did you mean: 'result'?
Output from #2 returns the info I need, but when checking "type(long_response.result())" I get:
<class 'google.cloud.speech_v1.types.cloud_speech.LongRunningRecognizeResponse'>
This, I suppose, is not an iterable object, and I cannot figure out how to apply a process similar to the one I use with the recognize function to build the subtitles the way I need.
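For reference, a minimal sketch of how the long-running case is typically iterated: calling .result() on the returned Operation yields the LongRunningRecognizeResponse, and its .results field can be looped over just like the synchronous response (same config and audio_uri as above; the timeout value is arbitrary):
operation = client.long_running_recognize(config=config, audio=audio_uri)
long_result = operation.result(timeout=120)  # LongRunningRecognizeResponse

subs_list = []
for result in long_result.results:
    for alternative in result.alternatives:
        for word in alternative.words:
            start = word.start_time.total_seconds() if word.start_time else 0
            end = word.end_time.total_seconds()
            subs_list.append(((float(start), float(end)), word.word))
print(subs_list)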

How to stop/close a Twitter API Filtered Stream after x seconds?

I just wanted to know how to stop a Twitter API Filtered Stream after x seconds.
I have got this so far:
t_end = time.time() + 10
while time.time() < t_end:
    bearer_token = config.BEARER_TOKEN
    headers = create_headers(bearer_token)
    rules = get_rules(headers, bearer_token)
    delete = delete_all_rules(headers, bearer_token, rules)
    set = set_rules(headers, delete, bearer_token)
    get_stream(headers, set, bearer_token)
print("ended")
However, the stream just continues. I cannot seem to find in the Twitter API documentation how to close a stream or ways to close a stream.
By its very nature, streaming is intended to transmit and receive data in a continuous flow. However, in Python you can use libraries to stop it.
One way is to use threading: you count the time while the stream is running and abort the Python process once it elapses:
# after building your arguments like bearer token and rules
import sys
import threading
from time import sleep

def manage_streaming():
    # Run the blocking stream in a daemon thread so the process can exit
    # even though get_stream() never returns on its own.
    thread1 = threading.Thread(target=get_stream,
                               args=(headers, set, bearer_token,),
                               daemon=True)
    thread1.start()
    sleep(time_2_end_in_seconds)
    print('ended')
    sys.exit()  # stop the executing file; the daemon thread dies with it
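A hypothetical invocation, assuming headers, set and bearer_token have already been built by the rule-setup calls shown in the question:
time_2_end_in_seconds = 10  # stop streaming after 10 seconds
manage_streaming()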

Generating a 2-channel wave file from two independent streams of audio data

I am streaming audio data from two clients on a network into a common server software, that needs to take said audio data and combine it into a two-channel wave file. Both client and server are software written by me.
I'm struggling with how to combine this on the server-side, and a key metric in the output wave file is being able to recreate the timestamps with which the users talked. What I want to do is output each client (there are only ever 2 per wave file) into a 2-channel stereo wave file.
How do I handle a situation like this properly? Do the clients need to change how they stream audio data? Also, what do you recommend as an approach for dealing with the pauses in the audio stream, i.e. capturing the delays between users pressing the push-to-talk button, when no messages are coming to the server?
Currently, the client software is using pyaudio to record from the default input device and is sending individual frames over the network using TCP/IP, one message per frame. The clients work in a push-to-talk fashion and only send audio data when the push-to-talk button is being held; otherwise no messages are sent.
I've done a decent bit of research into the WAVE file format and I understand that to do this I will need to interleave the samples from each channel for every frame written, which is where my main source of confusion comes from. Due to the dynamic nature of this environment as well as the synchronous approach of processing the audio data on the server side, most of the time I won't have data from both clients at once, but if I do I won't have a good logical mechanism to tell the server to write both frames together at once.
Here is what I have so far for processing audio from clients. One instance of this class is created for each client and thus a separate wave file is created for every client, which isn't what I want.
import datetime
import os
import wave

import pyaudio


class AudioRepository(object):
    def __init__(self, root_directory, test_id, player_id):
        self.test_id = test_id
        self.player_id = player_id
        self.audio_filepath = os.path.join(
            root_directory, "{0}_{1}_voice_chat.wav".format(test_id, player_id))
        self.audio_wave_writer = wave.open(self.audio_filepath, "wb")
        self.audio_wave_writer.setnchannels(1)
        self.audio_wave_writer.setframerate(44100)
        self.audio_wave_writer.setsampwidth(
            pyaudio.get_sample_size(pyaudio.paInt16))
        self.first_audio_record = True
        self.previous_audio_time = datetime.datetime.now()

    def write(self, record: Record):
        now = datetime.datetime.now()
        time_passed_since_last = now - self.previous_audio_time
        number_blank_frames = int(44100 * time_passed_since_last.total_seconds())
        blank_data = b"\0\0" * number_blank_frames
        if not self.first_audio_record and time_passed_since_last.total_seconds() >= 1:
            # Pad the gap since the previous record with silence.
            self.audio_wave_writer.writeframes(blank_data)
        else:
            self.first_audio_record = False
        self.audio_wave_writer.writeframes(
            record.additional_data["audio_data"])
        self.previous_audio_time = datetime.datetime.now()

    def close(self):
        self.audio_wave_writer.close()
I typed this up by hand because the code is on a machine without internet access, so sorry if the formatting is messed up or there are typos.
This also demonstrates what I'm currently doing to handle the time in between transmissions, which works moderately well. The rate-limiting part is a hack and does cause problems, but I think I have a real solution for that. The clients send messages when the user presses and releases the push-to-talk button, so I can use those as flags to pause the output of blank frames while the user is sending me real audio data (which was the real problem: when users were sending audio data I was inserting a bunch of tiny pauses that made the audio choppy).
The expected solution is to make the code above no longer tied to a single player id; instead, write will be called with records from both clients of the server (still one record from each player individually, not combined) and will interleave the audio data from each into a 2-channel wave file, with each player on a separate channel. I'm just looking for suggestions on how to handle the details of this. My initial thought is that a thread and two queues of audio frames, one per client, will need to be involved, but I'm still iffy on how to combine it all into the wave file and make it sound proper and be timed right.
I managed to solve this using pydub, posting my solution here in case someone else stumbles upon this. I overcame the problem of keeping accurate timestamps using silence as mentioned in the original post, by tracking the transmission start and end events that the client software was already sending.
import datetime
import os

from pydub import AudioSegment


class AudioRepository(Repository):
    def __init__(self, test_id, board_sequence):
        Repository.__init__(self, test_id, board_sequence)
        self.audio_filepath = os.path.join(
            self.repository_directory, "{0}_voice_chat.wav".format(test_id))
        self.player1_audio_segment = AudioSegment.empty()
        self.player2_audio_segment = AudioSegment.empty()
        self.player1_id = None
        self.player2_id = None
        self.player1_last_record_time = datetime.datetime.now()
        self.player2_last_record_time = datetime.datetime.now()

    def write_record(self, record: Record):
        player_id = record.additional_data["player_id"]
        if record.event_type == Record.VOICE_TRANSMISSION_START:
            # A new transmission begins: pad this player's track with silence
            # covering the gap since their previous transmission ended.
            if self.is_player1(player_id):
                time_elapsed = datetime.datetime.now() - self.player1_last_record_time
                segment = AudioSegment.silent(time_elapsed.total_seconds() * 1000)
                self.player1_audio_segment += segment
            elif self.is_player2(player_id):
                time_elapsed = datetime.datetime.now() - self.player2_last_record_time
                segment = AudioSegment.silent(time_elapsed.total_seconds() * 1000)
                self.player2_audio_segment += segment
        elif record.event_type == Record.VOICE_TRANSMISSION_END:
            if self.is_player1(player_id):
                self.player1_last_record_time = datetime.datetime.now()
            elif self.is_player2(player_id):
                self.player2_last_record_time = datetime.datetime.now()

        if not record.event_type == Record.VOICE_MESSAGE_SENT:
            return
        frame_data = record.additional_data["audio_data"]
        segment = AudioSegment(data=frame_data, sample_width=2,
                               frame_rate=44100, channels=1)
        if self.is_player1(player_id):
            self.player1_audio_segment += segment
        elif self.is_player2(player_id):
            self.player2_audio_segment += segment

    def close(self):
        Repository.close(self)
        # pydub's AudioSegment.from_mono_audiosegments expects all the segments
        # given to be of the same frame count. To ensure this, we check each
        # segment's length and pad the shorter one with silence as necessary.
        player1_frames = self.player1_audio_segment.frame_count()
        player2_frames = self.player2_audio_segment.frame_count()
        frames_needed = abs(player1_frames - player2_frames)
        duration = frames_needed / 44100
        padding = AudioSegment.silent(duration * 1000, frame_rate=44100)
        if player1_frames > player2_frames:
            self.player2_audio_segment += padding
        elif player1_frames < player2_frames:
            self.player1_audio_segment += padding
        stereo_segment = AudioSegment.from_mono_audiosegments(
            self.player1_audio_segment, self.player2_audio_segment)
        stereo_segment.export(self.audio_filepath, format="wav")
This way I keep the two audio segments as independent audio segments throughout the session, and combine them into one stereo segment that is then exported to the wav file of the repository. pydub also made keeping track of the silent segments easier, because I still don't think I really understand how audio "frames" work and how to generate the right amount of frames for a specific duration of silence. Nonetheless, pydub certainly does and takes care of it for me!
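For reference, the frame arithmetic pydub handles here is fairly small; a rough sketch, assuming 16-bit mono at 44100 Hz as in the code above:
# One frame holds one sample per channel; 16-bit mono means 2 bytes per frame.
FRAME_RATE = 44100   # frames per second
SAMPLE_WIDTH = 2     # bytes per sample (16-bit PCM)

def silence_bytes(duration_seconds):
    # Raw PCM silence of the requested duration for a 16-bit mono stream.
    n_frames = int(duration_seconds * FRAME_RATE)
    return b"\x00" * (n_frames * SAMPLE_WIDTH)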

Proper way to start a Trio server that manages multiple TCP Connections

I recently finished a project using a mix of Django and Twisted and realized it's overkill for what I need which is basically just a way for my servers to communicate via TCP sockets. I turned to Trio and so far I'm liking what I see as it's way more direct (for what I need). That said though, I just wanted to be sure I was doing this the right way.
I followed the tutorial, which taught the basics, but I need a server that can handle multiple clients at once. To this end, I came up with the following code:
import trio
from itertools import count

PORT = 12345
BUFSIZE = 16384
CONNECTION_COUNTER = count()


class ServerProtocol:
    def __init__(self, server_stream):
        self.ident = next(CONNECTION_COUNTER)
        self.stream = server_stream

    async def listen(self):
        while True:
            data = await self.stream.receive_some(BUFSIZE)
            if data:
                print('{} Received\t {}'.format(self.ident, data))
                # Process data here


class Server:
    def __init__(self):
        self.protocols = []

    async def receive_connection(self, server_stream):
        sp: ServerProtocol = ServerProtocol(server_stream)
        self.protocols.append(sp)
        await sp.listen()


async def main():
    await trio.serve_tcp(Server().receive_connection, PORT)

trio.run(main)
My issue here seems to be that each ServerProtocol runs listen on every cycle instead of waiting for data to be available to be received.
I get the feeling I'm using Trio wrong; if so, is there a Trio best practice that I'm missing?
Your overall structure looks fine to me. The issue that jumps out at me is:
while True:
    data = await self.stream.receive_some(BUFSIZE)
    if data:
        print('{} Received\t {}'.format(self.ident, data))
        # Process data here
The guarantee that receive_some makes is: if the other side has closed the connection already, then it immediately returns an empty byte-string. Otherwise, it waits until there is some data to return, and then returns it as a non-empty byte-string.
So your code should work fine... until the other end closes the connection. Then it starts doing an infinite loop, where it keeps checking for data, getting an empty byte-string back (data = b""), so the if data: ... block doesn't run, and it immediately loops around to do it again.
One way to fix this would be (last 3 lines are new):
while True:
    data = await self.stream.receive_some(BUFSIZE)
    if data:
        print('{} Received\t {}'.format(self.ident, data))
        # Process data here
    else:
        # Other side has gone away
        break

python3/logging: Optionally write to more than one stream

I'm successfully using the logging module in my python3 program to send log messages to a log file, for example, /var/log/myprogram.log. In certain cases, I want a subset of those messages to also go to stdout, with them formatted through my logging.Logger instance in the same way that they are formatted when they go to the log file.
Assuming that my logger instance is called loginstance, I'd like to put some sort of wrapper around loginstance.log(level, msg) to let me choose whether the message only goes to /var/log/myprogram.log, or whether it goes there and also to stdout, as follows:
# Assume `loginstance` has already been instantiated
# as a global, and that it knows to send logging info
# to `/var/log/myprogram.log` by default.

def mylogger(level, msg, with_stdout=False):
    if with_stdout:
        # Somehow send `msg` through `loginstance` so
        # that it goes BOTH to `/var/log/myprogram.log`
        # AND to `stdout`, with identical formatting.
        ...
    else:
        # Send only to `/var/log/myprogram.log` by default.
        loginstance.log(level, msg)
I'd like to manage this with one, single logging.Logger instance, so that if I want to change the format or other logging behavior, I only have to do this in one place.
I'm guessing that this involves subclassing logging.Logger and/or logging.Formatter, but I haven't figured out how to do this.
Thank you in advance for any suggestions.
I figured out how to do it. It simply requires that I use a FileHandler subclass and pass an extra argument to log() ...
import logging
import sys


class MyFileHandler(logging.FileHandler):
    def emit(self, record):
        # Always write to the log file first.
        super().emit(record)
        also_use_stdout = getattr(record, 'also_use_stdout', False)
        if also_use_stdout:
            # Temporarily swap the handler's stream to stdout and emit again.
            savestream = self.stream
            self.stream = sys.stdout
            try:
                super().emit(record)
            finally:
                self.stream = savestream
When instantiating my logger, I do this ...
logger = logging.getLogger('myprogram')
logger.addHandler(MyFileHandler('/var/log/myprogram.log'))
Then, the mylogger function that I described above will look like this:
def mylogger(level, msg, with_stdout=False):
    loginstance.log(level, msg, extra={'also_use_stdout': with_stdout})
This works because anything passed to the log function within the optional extra dictionary becomes an attribute of the record object that ultimately gets passed to emit.
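A minimal usage sketch of the pieces above, assuming `loginstance` is the logger configured with MyFileHandler (the messages are placeholders):
import logging

mylogger(logging.INFO, "recorded in /var/log/myprogram.log only")
mylogger(logging.WARNING, "recorded in the log file and echoed to stdout",
         with_stdout=True)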
