Pydub: mix two sounds with different energy levels

I use the Pydub library.
I want to mix two sounds that have different decibel levels (different energy levels). For example, I have recordings of two species:
sound1 = AudioSegment.from_file("species_a.wav")
sound2 = AudioSegment.from_file("species_b.wav")
combined = sound1.overlay(sound2)
I want, for example, species "a" to be louder than species "b" in my new sound "combined", and to do this at different energy levels (-18 dB, -12 dB, -6 dB, 0 dB).
Is this possible?
Thanks!

You can normalize them to a target level, like this (untested code, but it should work):
def set_to_target_level(sound, target_level):
    # gain in dB needed to bring the sound's average loudness to target_level
    difference = target_level - sound.dBFS
    return sound.apply_gain(difference)

sound1_adjusted = set_to_target_level(sound1, -12.0)
sound2_adjusted = set_to_target_level(sound2, -12.0)
combined = sound1_adjusted.overlay(sound2_adjusted)
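To make one species louder than the other, you could set the two sounds to different targets before overlaying, for example species "a" at -6 dBFS and species "b" at -18 dBFS (a minimal sketch reusing the helper above; the levels and the output file name are just placeholders):
# species "a" louder, species "b" quieter; pick any of your levels (-18, -12, -6, 0 dB)
loud_a = set_to_target_level(sound1, -6.0)
quiet_b = set_to_target_level(sound2, -18.0)
mix = loud_a.overlay(quiet_b)
# write the result to disk
mix.export("combined.wav", format="wav")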


Annual count index from GAM looking at long-term trends by site

I'm interested in estimating a shared, global trend over time for counts monitored at several different sites using generalized additive models (GAMs). I've read this great introduction to hierarchical GAMs (HGAMs) by Pedersen et al. (2019), and I believe I can set up the model as follows (the Pedersen et al. (2019) GS model),
fit_model = gam(count ~ s(year, m = 2) + s(year, site, bs = 'fs', m = 2),
                data = count_df,
                family = nb(link = 'log'),
                method = 'REML')
I can plot the partial effect smooths, look at the fit diagnostics, and everything looks reasonable. My question is how to extract a non-centered annual relative count index. My first thought would be to add the estimated intercept (the average count across sites at the beginning of the time series) to the s(year) smooth (the shared global smooth). But I'm not sure whether the uncertainty around that smooth already incorporates the uncertainty in the estimated intercept, or whether I need to add that in. All of this was possible thanks to the amazing R libraries mgcv, gratia, and dplyr.
Your way doesn't include the uncertainty in the constant term; it just shifts everything around.
If you want to do this it would be easier to use the constant argument to gratia:::draw.gam():
draw(fit_model, select = "s(year)", constant = coef(fit_model)[1L])
which does what your code does, without as much effort (on your part).
A better way, with {gratia}, seeing as you are using it already, would be to create a data frame containing a sequence of values over the range of year and then use gratia::fitted_values() to generate estimates from the model for those values of year. To get what you want (which seems to be to exclude the random smooth component of the fit, i.e. to set the random component to 0 on the link scale), you need to pass that smooth to the exclude argument:
## data to predict at
new_year <- with(count_df,
                 tibble(year = gratia::seq_min_max(year, n = 100),
                        site = factor(levels(site)[1], levels = levels(site))))
## predict
fv <- fitted_values(fit_model, data = new_year, exclude = "s(year,site)")
If you want to read about exclude, see ?predict.gam

Expand netcdf to the whole globe with xarray

I have a dataset that only covers latitudes between -55.75 and 83.25. I would like to expand it so that it covers the whole globe (-89.75 to 89.75 in my case), filling the new cells with an arbitrary NA value.
Ideally I would like to do this with xarray. I have looked at .pad(), .expand_dims() and .assign_coords(), but did not really get a handle on how any of them work.
If someone can provide an alternative solution with cdo, I would also be grateful for that.
You could do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/), which uses CDO as a backend.
The example below shows how you could do it. It starts by cropping a global temperature dataset to latitudes between -50 and 50, and then regrids it to a global dataset at whatever resolution you need. The regridding uses CDO, which will extrapolate at the edges, so you probably want to set everything outside the original dataset's extent to NA; my code does this by calling masklonlatbox from CDO (via mask_box).
import nctoolkit as nc

ds = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE2/sst.mon.ltm.1981-2010.nc")
# take the first time step and crop to latitudes between -50 and 50
ds.subset(time = 0)
ds.crop(lat = [-50, 50])
# regrid to a global 1-degree grid (CDO extrapolates at the edges)
ds.to_latlon(lon = [-179.5, 179.5], lat = [-89.5, 89.5], res = 1)
# mask everything outside the original latitude range, i.e. set it to NA
ds.mask_box(lon = [-179.5, 179.5], lat = [-50, 50])
ds.plot()
# convert to xarray dataset
ds_xr = ds.to_xarray()
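If you want to stay entirely in xarray, reindexing onto a full global latitude axis will also insert the missing rows and fill them with NaN. A minimal sketch, assuming the file is opened as ds, the latitude coordinate is called lat, and the grid spacing is 0.5 degrees (adjust the names and spacing to your data):
import numpy as np
import xarray as xr

ds = xr.open_dataset("mydata.nc")  # hypothetical input file

# full global latitude axis at the dataset's own resolution
full_lat = np.arange(-89.75, 89.75 + 0.5, 0.5)

# existing latitudes keep their data; new latitudes are filled with NaN
ds_global = ds.reindex(lat=full_lat, fill_value=np.nan)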

How to add varying prefixes to strings in pandas column

I have a dataframe with 2 columns containing audio filenames and corresponding texts, looking like this:
import pandas as pd

data = {'Audio_Filename': ['3e2bd3d1-b9fc-095728a4d05b',
                           '8248bf61-a66d-81f33aa7212d',
                           '81051730-8a18-6bf476d919a4'],
        'Text': ['On a trip to America, he saw people filling his noodles into paper cups.',
                 'When the young officers were told they were going to the front,',
                 'Yeah, unbelievable, I had not even thought of that.']}
df = pd.DataFrame(data, columns = ['Audio_Filename', 'Text'])
Now I want to add a string prefix (the speaker ID: sp1, sp2, sp3) with an underscore _ to all audio filename strings according to this pattern:
sp2_3e2bd3d1-b9fc-095728a4d05b.
My difficulty: The prefix/speaker ID is not fixed but varies depending on the audio filenames. Because of this, I have zipped the audio filenames and the speaker IDs and iterated over those and the audio filename rows via for-loops. This is my code:
zipped = list(zip(audio_filenames, speaker_ids))
for audio, speaker_id in zipped:
    for index, row in df.iterrows():
        audio_row = row['Audio_Filename']
        if audio == audio_row:
            df['Audio_Filename'] = f'{speaker_id}_' + audio_row
df.to_csv('/home/user/file.csv')
I also tried apply with lambda after the if statement:
df['Audio_Filename'] = df['Audio_Filename'].apply(lambda x: '{}_{}'.format(speaker_id, audio_row))
But nothing works so far.
Can anyone please give me a hint on how to do this?
The resulting dataframe should look like this:
Audio_Filename Text
sp2_3e2bd3d1-b9fc-095728a4d05b On a trip to America, he saw people filling hi...
sp1_8248bf61-a66d-81f33aa7212d When the young officers were told they were go...
sp3_81051730-8a18-6bf476d919a4 Yeah, unbelievable, I had not even thought of ...
(Of course, I have much more audio filenames and corresponding texts in the dataframe).
I appreciate any help, thank you!
If you have the audio_filenames and speaker_ids lists, you can use the Series.map function. For example:
audio_filenames = [
    "3e2bd3d1-b9fc-095728a4d05b",
    "8248bf61-a66d-81f33aa7212d",
    "81051730-8a18-6bf476d919a4",
]
speaker_ids = ["sp2", "sp1", "sp3"]
mapper = {k: "{}_{}".format(v, k) for k, v in zip(audio_filenames, speaker_ids)}
df["Audio_Filename"] = df["Audio_Filename"].map(mapper)
print(df)
Prints:
Audio_Filename Text
0 sp2_3e2bd3d1-b9fc-095728a4d05b On a trip to America, he saw people filling his noodles into paper cups.
1 sp1_8248bf61-a66d-81f33aa7212d When the young officers were told they were going to the front,
2 sp3_81051730-8a18-6bf476d919a4 Yeah, unbelievable, I had not even thought of that.
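Note that Series.map returns NaN for any filename that is missing from the mapper. If your mapping might be incomplete, a small optional safeguard (not part of the original answer) is to fall back to the unmodified filename:
# keep the original value wherever the mapper has no entry
df["Audio_Filename"] = df["Audio_Filename"].map(mapper).fillna(df["Audio_Filename"])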

Geospatial fixed radius cluster hunting in python

I want to take an input of millions of lat long points (with a numerical attribute) and then find all fixed radius geospatial clusters where the sum of the attribute within the circle is above a defined threshold.
I started by using sklearn BallTree to sum the attribute within any defined circle, with the intention of then expanding this out to run across a grid or lattice of circles. The run time for one circle is around 0.01s, so this is fine for small lattices, but won't scale if I want to run 200m radius circles across the whole of the UK.
import numpy as np
import pandas
from sklearn.neighbors import BallTree

#example data (use 2m rows from postcode centroid file)
df = pandas.read_csv('National_Statistics_Postcode_Lookup_Latest_Centroids.csv', usecols=[0,1], nrows=2000000)
#this will be our grid of points (or lattice); use points from the same file for the example
df2 = pandas.read_csv('National_Statistics_Postcode_Lookup_Latest_Centroids.csv', usecols=[0,1], nrows=2000)

#reorder lat long columns for balltree input
columnTitles = ["Y","X"]
df = df.reindex(columns=columnTitles)
df2 = df2.reindex(columns=columnTitles)

# assign new columns to existing dataframe. attribute will hold the data we want to sum over (set to 1 for now)
df['attribute'] = 1
df2['aggregation'] = 0

RADIANT_TO_KM_CONSTANT = 6367  # approximate Earth radius in km, converts radians to km

class BallTreeIndex:
    def __init__(self, lat_longs):
        self.lat_longs = np.radians(lat_longs)
        self.ball_tree_index = BallTree(self.lat_longs, metric='haversine')

    def query_radius(self, query, radius):
        # radius is given in metres; convert to radians for the haversine metric
        radius_km = radius / 1000
        radius_radiant = radius_km / RADIANT_TO_KM_CONSTANT
        query = np.radians(np.array([query]))
        indices = self.ball_tree_index.query_radius(query, r=radius_radiant)
        return indices[0]

#index the base data
a = BallTreeIndex(df.iloc[:, 0:2])

#begin to loop over the lattice to test performance
for i in range(0, 100):
    b = df2.iloc[i, 0:2]
    output = a.query_radius(b, 200)
    accumulation = sum(df.iloc[output, 2])
    df2.iloc[i, 2] = accumulation
It feels as if the above code is really inefficient, as I don't need to run the calculation across all circles on my lattice (most will be well below my threshold, or will contain no data points at all).
Instead of this for loop, is there a better way of scaling this algorithm to give me the most dense circles?
I'm new to python, so any help would be massively appreciated!!
First, don't try to do this on a sphere! GB is small and we have a well-defined geographic projection that will work. So use the oseast1m and osnorth1m columns as X and Y. They are in metres, so there is no need to convert (roughly) to degrees and use Haversine. That should help.
Next add a spatial index to speed up lookups.
If you need more speed there are various tricks, like loading a 2R strip across the country into memory and then running your circles across that strip, then moving down a grid step and updating that strip (checking Y values against a fixed value is quick, especially if you store the data sorted on Y then X value). If you need still more speed, look at any of the papers that Stan Openshaw (and sometimes I) wrote about parallelising the GAM (Geographical Analysis Machine). There are examples of implementing the GAM in Python (e.g. this paper, this paper) that may also point to better ways.
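To illustrate the projected-coordinates idea, here is a rough sketch using SciPy's cKDTree on eastings/northings in metres instead of a haversine BallTree. It assumes the CSV contains the oseast1m and osnorth1m columns mentioned above; the attribute column is a placeholder, as in the question:
import pandas
from scipy.spatial import cKDTree

# load eastings/northings in metres from the ONS postcode centroid file
df = pandas.read_csv('National_Statistics_Postcode_Lookup_Latest_Centroids.csv',
                     usecols=['oseast1m', 'osnorth1m']).dropna()
df['attribute'] = 1  # placeholder attribute to sum, as in the question

# KD-tree on planar coordinates: a 200 m radius is just r=200, no haversine needed
tree = cKDTree(df[['oseast1m', 'osnorth1m']].to_numpy())

# sum the attribute inside a 200 m circle around one lattice point
centre = df[['oseast1m', 'osnorth1m']].iloc[0].to_numpy()
idx = tree.query_ball_point(centre, r=200)
total = df['attribute'].iloc[idx].sum()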

Modelling a heat pump in oemof (solph)

How can I model a heat pump in oemof? I think it is necessary to create three buses (low-temperature reservoir, electricity, high temperature), but the LinearTransformer class does not allow more than one input. Is there another way to do it?
I would like to set an oemof tag but I am not allowed to do so.
It depends on which oemof version you use. If you use oemof < v0.1.2 you have to model it with just two buses. You can calculate the COP in advance using the temperature of the reservoir and the average temperature of the heat bus, and pass it as a list, numpy.array, pandas.Series, etc.:
from oemof import solph

cop = [2.5, 2.3, 2.5]  # length = number of time steps
solph.LinearTransformer(
    label="pp_gas",
    inputs={electricity_bus: solph.Flow()},
    outputs={heat_bus: solph.Flow(nominal_value=maximum_output)},
    conversion_factors={electricity_bus: cop})
With oemof >= v0.1.2 you can use two or three buses. But think hard about whether you actually gain anything by using a third bus.
from oemof import solph

b_el = solph.Bus(label='electricity')
b_th_low = solph.Bus(label='low_temp_heat')
b_th_high = solph.Bus(label='high_temp_heat')

cop = 3  # coefficient of performance of the heat pump

solph.LinearN1Transformer(
    label='heat_pump',
    inputs={b_el: solph.Flow(), b_th_low: solph.Flow()},
    outputs={b_th_high: solph.Flow()},
    conversion_factors={b_el: cop,
                        b_th_low: cop/(cop-1)})
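To see where the cop/(cop-1) factor comes from (my reading of the example, not stated in the original answer): with a COP of 3, producing 3 units of high-temperature heat takes 1 unit of electricity plus 2 units of low-temperature heat, so the output is cop = 3 times the electricity input and cop/(cop-1) = 3/2 times the low-temperature heat input.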
