Python - How to save spectrogram output in a text file?

My code calculates the spectrogram for x, y and z.
I first compute the magnitude of the three axes, then calculate the spectrogram.
I need to take the spectrogram output and save it as one column in an array, to use it as input for a deep learning model.
This is my code:
import numpy as np
import matplotlib.pyplot as plt

dataset = np.loadtxt("trainingdatasetMAG.txt", delimiter=",")
X = dataset[:, 0:6]
Y = dataset[:, 6]
fake_size = 1415684
time = np.arange(fake_size) / 1000  # 1 kHz sampling rate
base_freq = 2 * np.pi * 100
magnitude = dataset[:, 5]
plt.title('xyz_magnitude')
ls = plt.specgram(magnitude, Fs=1000)
This is a sample of my dataset; its columns are (patientno, time/msecond, x-axis, y-axis, z-axis, xyz_magnitude, label):
1,15,70,39,-970,947321,0
1,31,70,39,-970,947321,0
1,46,60,49,-960,927601,0
1,62,60,49,-960,927601,0
1,78,50,39,-960,925621,0
1,93,50,39,-960,925621,0
and this is the output of the spectrogram, which I need to store more efficiently:
(array([[ 1.52494154e+11, 1.52811638e+11, 1.52565040e+11, ...,
1.47778892e+11, 1.46781213e+11, 1.46678951e+11],
[ 7.69589176e+10, 7.73638333e+10, 7.76935891e+10, ...,
7.48498747e+10, 7.40088248e+10, 7.40343108e+10],
[ 6.32683585e+04, 1.58170271e+06, 6.11287648e+06, ...,
5.06690834e+05, 3.31360693e+05, 7.04757400e+05],
...,
[ 7.79589127e+05, 8.09843763e+04, 2.52907491e+05, ...,
2.48520301e+05, 2.11734697e+05, 2.50917758e+05],
[ 9.41199946e+05, 4.98371406e+05, 1.29328139e+06, ...,
2.56729806e+05, 3.45253951e+05, 3.51932417e+05],
[ 4.36846676e+05, 1.24123764e+06, 9.20694394e+05, ...,
8.35807658e+04, 8.36986905e+05, 3.57807267e+04]]),
 array([   0.     ,    3.90625,    7.8125 , ...,  492.1875 ,
        496.09375,  500.     ]),
array([1.28000000e-01, 2.56000000e-01, 3.84000000e-01, ...,
1.41529600e+03, 1.41542400e+03, 1.41555200e+03]),
<matplotlib.image.AxesImage object at 0x000002161A78F898>)

Matplotlib's specgram function has four outputs:
spectrum : 2-D array
    Columns are the periodograms of successive segments.
freqs : 1-D array
    The frequencies corresponding to the rows in spectrum.
t : 1-D array
    The times corresponding to midpoints of segments (i.e., the columns in spectrum).
im : instance of class AxesImage
    The image created by imshow containing the spectrogram.
From your code:
ls = plt.specgram(magnitude, Fs=1000)
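Equivalently, you can unpack the tuple into named parts (the names here are illustrative):
spectrum, freqs, t, im = plt.specgram(magnitude, Fs=1000)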
So ls[0] contains the spectrum that you want to export to txt; you can write it to a file with this piece of code:
with open('spectrogram.txt', 'w') as ffile:  # text mode: we write strings, not bytes
    for spectros in ls[0]:
        for spectro in spectros:
            lline = str(spectro) + ' \t'
            ffile.write(lline)
        # one row written
        ffile.write(' \n')
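Alternatively, a one-line sketch using numpy's savetxt writes the same 2-D array (assuming ls[0] is the power matrix shown above):
import numpy as np
np.savetxt('spectrogram.txt', ls[0], delimiter='\t')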
Note, however, that ls[0] contains the power spectral density of NFFT=256-sample segments with 128 samples of overlap (the defaults), so you'll have NFFT/2 + 1 = 129 rows. Each column contains the PSD at one time T, and each row contains the time series of one frequency bin. To get the spectrum at instant T, slice it:
T_idx = 10
ls[0][:, T_idx]
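A quick sanity check on the shape under those default settings:
print(ls[0].shape)  # (129, n_segments); 129 rows = NFFT/2 + 1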

Related

Convert a deeply nested list to csv in Python

I wanted to write a deeply nested list (lists within lists within lists) to csv, but it always collapses my list and prints it with ..., so I'm unable to retrieve the hidden values.
The list stores frames of videos and goes up to 5 layers (length of each layer in parentheses): videos (number of videos) > frames (8) > width (200) > height (200) > pixels with 3 channels (3).
I tried converting the list to a data frame before writing it to csv, but that still didn't solve the problem.
"[array([[[0.23137255, 0.26666668, 0.27058825],
[0.23921569, 0.27450982, 0.2784314 ],
[0.23529412, 0.27058825, 0.27450982],
...,
[0.25882354, 0.29411766, 0.2901961 ],
[0.25490198, 0.2901961 , 0.28627452],
[0.25490198, 0.2901961 , 0.28627452]],
[[0.20392157, 0.23921569, 0.24313726],
[0.21568628, 0.2509804 , 0.25490198],
[0.21568628, 0.2509804 , 0.25490198],
...,
[0.26666668, 0.3019608 , 0.29803923],
[0.26666668, 0.3019608 , 0.29803923],
[0.2627451 , 0.29803923, 0.29411766]],
[[0.1882353 , 0.22352941, 0.22745098],
[0.2 , 0.23529412, 0.23921569],
[0.20392157, 0.23921569, 0.24313726],
...,
[0.27450982, 0.30980393, 0.30588236],
[0.27058825, 0.30588236, 0.3019608 ],
[0.27058825, 0.30588236, 0.3019608 ]],
...,
I'd try one of the following.
Dump the whole object into json:
import json
with open('my_saved_file.json', 'w+') as out_file:
    # numpy arrays are not JSON serializable, so convert each frame to plain lists first
    json.dump([[frame.tolist() for frame in video] for video in list_of_lists_of_lists],
              out_file, indent=2)
Alternatively, what I'd try is storing all of your frames as image files and referencing them in an index (could be csv):
import numpy as np
from PIL import Image

with open('reference.csv', 'w+') as out_csv:
    out_csv.write("video, frame_set, frame1, frame2, frame3, frame4, frame5, frame6, frame7, frame8\n")
    for video_no, video in enumerate(list_of_lists_of_lists):
        row = [str(video_no)]  # join() below needs strings, not ints
        for frame_set_no, frames in enumerate(video):
            for frame_no, frame in enumerate(frames):
                # PIL expects 8-bit pixels, so rescale the [0, 1] floats
                im = Image.fromarray((np.asarray(frame) * 255).astype(np.uint8))
                frame_name = f"{video_no}-{frame_set_no}-{frame_no}.jpeg"
                row.append(frame_name)
                im.save(frame_name)
        out_csv.write(",".join(row) + "\n")

Using obspy.taup with variables obtained from a .txt file

I am new to Python and I am trying to get some code working with a module of the obspy package. From a .txt file containing a row of five comma-separated values (example: 40,47.698,146.9212, etc.) I need to use those values as variables in a function of the obspy module. I will show you the code so you understand better.
from obspy.taup import TauPyModel

model = TauPyModel(model="iasp91")
archivo = open('Dato.txt', 'r')
for linea in archivo.readlines():
    columna = str(linea).split(',')
    print(columna[0])
    print(columna[1])
    print(columna[2])
    print(columna[3])
    print(columna[4])
archivo.close()
a = columna[0]
b = columna[1]
c = columna[2]
d = columna[3]
e = columna[4]
arrivals = model.get_pierce_points_geo(a, b, c, d, e, phase_list=('SKS',), resample=False)
arrival = arrivals[0]
print(arrival.pierce)
If I define the variables with numeric values (example: a=408; b=47.6981; c=146.9212; etc.) the code works perfectly and shows me what I want:
408
47.6981
146.9212
36.882277
-3.068689
C:\Users\peopl\Desktop\BO\env\lib\site-packages\obspy\taup\tau_branch.py:496: UserWarning: Resizing a TauP array inplace failed due to the existence of other references to the array, creating a new array. See Obspy #2280.
warnings.warn(msg)
[ ( 323.37738085, 0.00000000e+00, 0.00000000e+00, 408. , 47.6981 , 146.9212 )
( 323.37738085, 4.25942791e-01, 9.18383444e-05, 410. , 47.70292225, 146.9180712 )
( 323.37738085, 4.95211705e+01, 1.33680904e-02, 660. , 48.39912219, 146.45957792)
( 323.37738085, 4.30994629e+02, 3.09568047e-01, 2889. , 63.17117462, 131.25054174)
( 323.37738085, 6.19102877e+02, 7.88455257e-01, 3482.54497821, 73.50766588, 55.65029149)
( 323.37738085, 8.07211124e+02, 1.26734247e+00, 2889. , 54.05973754, 7.50927585)
( 323.37738085, 1.18868458e+03, 1.56354242e+00, 660. , 38.47340944, -2.34102958)
( 323.37738085, 1.23777981e+03, 1.57681868e+00, 410. , 37.75869395, -2.67200616)
( 323.37738085, 1.23820575e+03, 1.57691051e+00, 408. , 37.75374671, -2.67427329)
( 323.37738085, 1.28179336e+03, 1.58536568e+00, 210. , 37.29809076, -2.88171143)
( 323.37738085, 1.32180477e+03, 1.59207012e+00, 35. , 36.93652754, -3.04441779)
( 323.37738085, 1.32587993e+03, 1.59253065e+00, 20. , 36.91168346, -3.05553737)
( 323.37738085, 1.33192110e+03, 1.59307573e+00, 0. , 36.882277 , -3.068689 )]
Nevertheless, when I use the variables read from the .txt file, the code shows this:
408
47.6981
146.9212
36.882277
-3.068689
Traceback (most recent call last):
  File "pierce.py", line 20, in <module>
    arrivals=model.get_pierce_points_geo(a, b, c, d, e, phase_list=('SKS',), resample=False)
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\obspy\taup\tau.py", line 784, in get_pierce_points_geo
    distance_in_deg = calc_dist(source_latitude_in_deg,
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\obspy\taup\taup_geo.py", line 53, in calc_dist
    return calc_dist_azi(source_latitude_in_deg, source_longitude_in_deg,
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\obspy\taup\taup_geo.py", line 86, in calc_dist_azi
    g = ellipsoid.Inverse(source_latitude_in_deg,
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\geographiclib\geodesic.py", line 1035, in Inverse
    a12, s12, salp1,calp1, salp2,calp2, m12, M12, M21, S12 = self._GenInverse(
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\geographiclib\geodesic.py", line 712, in _GenInverse
    lon12, lon12s = Math.AngDiff(lon1, lon2)
  File "C:\Users\peopl\Desktop\BO\env\lib\site-packages\geographiclib\geomath.py", line 156, in AngDiff
    d, t = Math.sum(Math.AngNormalize(-x), Math.AngNormalize(y))
TypeError: bad operand type for unary -: 'str'
The numeric values printed for the first five columns are the same as in the .txt file, but there seems to be a problem with 'str'. I would be very grateful if you could help me solve the problem. Sorry for my archaic English and my novice status in Python.
Thank you very much and greetings to all of you.
I solved the problem thanks to another member of the Spanish community. I had to convert the variables to numeric values, because the code was reading them in as strings. I fixed it with this change:
a = float(columna[0])
b = float(columna[1])
c = float(columna[2])
d = float(columna[3])
e = float(columna[4])
Thanks to everyone, and I hope to continue my learning (programming with Python and writing in English).
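As a side note, a more compact sketch of the same read (assuming Dato.txt holds a single comma-separated row) converts the values while unpacking them:
with open('Dato.txt', 'r') as archivo:
    a, b, c, d, e = (float(v) for v in archivo.readline().split(','))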

Shade text using seaborn and matplotlib?

I have a sentence, say:
25 August 2003 League of Extraordinary Gentlemen: Sean Connery is one of the all time greats I have been a fan of his since the 1950's. 25 August 2003 League of Extraordinary Gentlemen
I pass it through the openai sentiment code, which gives me neuron weights whose count can be equal to or a little greater than the number of words.
The neuron weights are:
[0.01258736, 0.03544582, 0.05184804, 0.05354257, 0.07339437,
0.07021661, 0.06993681, 0.06021424, 0.0601177 , 0.04100083,
0.03557627, 0.02574683, 0.02565657, 0.03435502, 0.04881989,
0.08868718, 0.06816255, 0.05957553, 0.06767794, 0.06561323,
0.06339648, 0.06271613, 0.06312297, 0.07370538, 0.08369936,
0.09008111, 0.09059132, 0.08732472, 0.08742133, 0.08792272,
0.08504769, 0.08541565, 0.09255819, 0.09240738, 0.09245031,
0.09080137, 0.08733468, 0.08705935, 0.09201239, 0.113047 ,
0.14285286, 0.15205048, 0.15249513, 0.14051639, 0.14070784,
0.14526351, 0.14548902, 0.12730363, 0.11916814, 0.11097522,
0.11390981, 0.12734678, 0.13625301, 0.13386811, 0.13413942,
0.13782364, 0.14033082, 0.14971626, 0.14988877, 0.14171578,
0.13999145, 0.1408006 , 0.1410009 , 0.13423227, 0.16819029,
0.18822579, 0.18462598, 0.18283379, 0.16304792, 0.1634682 ,
0.18733767, 0.22205424, 0.22615907, 0.22679318, 0.2353312 ,
0.24562076, 0.24771859, 0.24478345, 0.25780812, 0.25183237,
0.24660441, 0.2522405 , 0.26310056, 0.26156184, 0.26127928,
0.26154354, 0.2380443 , 0.2447366 , 0.24580643, 0.22959644,
0.23065038, 0.228564 , 0.23980206, 0.23410076, 0.40933537,
0.436683 , 0.5319608 , 0.5273239 , 0.54030097, 0.55781454,
0.5665511 , 0.58764166, 0.58651507, 0.5870301 , 0.5893866 ,
0.58905166, 0.58955604, 0.5872186 , 0.58744675, 0.58569545,
0.58279306, 0.58205146, 0.6251827 , 0.6278348 , 0.63121724,
0.7156403 , 0.715524 , 0.714875 , 0.71317464, 0.7630029 ,
0.75933087, 0.7571995 , 0.7563375 , 0.7583521 , 0.75923103,
0.8155783 , 0.8082132 , 0.8096348 , 0.8114364 , 0.82923543,
0.8229595 , 0.8196689 , 0.8070393 , 0.808637 , 0.82305557,
0.82719535, 0.8210828 , 0.8697561 , 0.8547278 , 0.85224617,
0.8521625 , 0.84694564, 0.8472206 , 0.8432255 , 0.8431826 ,
0.8394848 , 0.83804935, 0.83134645, 0.8234757 , 0.82382894,
0.82562804, 0.80014366, 0.7866942 , 0.78344023, 0.78955245,
0.7862923 , 0.7851586 , 0.7805863 , 0.780684 , 0.79073226,
0.79341674, 0.7970072 , 0.7966449 , 0.79455364, 0.7945448 ,
0.79476243, 0.7928985 , 0.79307675, 0.79677683, 0.79655904,
0.79619783, 0.7947823 , 0.7915144 , 0.7912799 , 0.795091 ,
0.8032384 , 0.810835 , 0.8084989 , 0.8094493 , 0.8045582 ,
0.80466574, 0.8074054 , 0.8075554 , 0.80178404, 0.7978776 ,
0.78742194, 0.8119776 , 0.8119776 , 0.8119776 , 0.8119776 ,
0.8119776 , 0.8119776 ]
The idea is that the text's background color should be shaded according to the neuron weights provided: green for positive weights, red for negative weights, and yellowish when the weight is near 0.
So for the above, the shading should be green shades for positive and red shades for negative, but that is not what it actually plots.
The function which shades the text according to the neuron weights is:
def plot_neuron_heatmap(text, values, n_limit=80, savename='fig1.png',
                        cell_height=0.325, cell_width=0.15, dpi=100):
    text = text.replace('\n', '\\n')
    text = np.array(list(text + ' ' * (-len(text) % n_limit)))
    if len(values) > text.size:
        values = np.array(values[:text.size])
    else:
        t = values
        values = np.zeros(text.shape, dtype=np.int)
        values[:len(t)] = t
    text = text.reshape(-1, n_limit)
    values = values.reshape(-1, n_limit)
    mask = np.zeros(values.shape, dtype=np.bool)
    mask.ravel()[values.size:] = True
    mask = mask.reshape(-1, n_limit)
    plt.figure(figsize=(cell_width * n_limit, cell_height * len(text)))
    hmap = sns.heatmap(values, annot=text, mask=mask, fmt='', vmin=-5, vmax=5,
                       cmap='RdYlGn', xticklabels=False, yticklabels=False, cbar=False)
    plt.subplots_adjust()
    plt.savefig(savename if savename else 'fig1.png', dpi=dpi)
Where am I going wrong?
(The definition above was refined via a link from @Mad Physicist.)
When you create your values array with np.zeros, you set dtype=np.int. So even though you then replace the zeros with the actual floating-point data, they are rounded to integers, because that's the dtype of the array. This essentially sets them all to 0, since they are all less than 1.
You really want to keep them as floats, so if you instead change this line:
values = np.zeros(text.shape, dtype=np.int)
to
values = np.zeros(text.shape, dtype=np.float)
everything seems to work fine.
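A minimal sketch of why the integer dtype zeroes everything out:
import numpy as np

vals = np.zeros(3, dtype=int)
vals[:2] = [0.73, 0.12]   # silently truncated to integers
print(vals)               # [0 0 0]

vals = np.zeros(3, dtype=float)
vals[:2] = [0.73, 0.12]
print(vals)               # [0.73 0.12 0.  ]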

Numpy Array value setting issues

I have a data set that spans a certain length of time, with data points at each of those time points. I would like to create a much more detailed timescale and fill the empty data points with zeros. I wrote a piece of code to do this, but it isn't doing what I want. I tried a sample case, though, and that seems to work. Below are the two pieces of code.
This piece of code does not do what I want it to:
import numpy as np
TD_t = np.array([36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500,
43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500,
50000, 50500, 51000, 51500, 52000, 52500, 53000, 53500, 54000, 54500, 55000, 55500, 56000, 56500,
57000, 57500, 58000, 58500, 59000, 59500, 60000, 60500, 61000, 61500, 62000, 62500, 63000, 63500,
64000, 64500, 65000, 65500, 66000])
TD_d = np.array([-0.05466527, -0.04238242, -0.04477601, -0.02453717, -0.01662798, -0.02548617, -0.02339215,
-0.01186576, -0.0029057 , -0.01094671, -0.0095005 , -0.0190277 , -0.01215644, -0.01997112,
-0.01384497, -0.01610656, -0.01927564, -0.02119056, -0.011634 , -0.00544096, -0.00046568,
-0.0017769 , -0.0007341, 0.00193066, 0.01359107, 0.02054919, 0.01420335, 0.01550565,
0.0132394 , 0.01371563, 0.01959774, 0.0165316 , 0.01881992, 0.01554435, 0.01409003,
0.01898334, 0.02300266, 0.03045158, 0.02869013, 0.0238423 , 0.02902356, 0.02568908,
0.02954539, 0.02537967, 0.02927247, 0.02138605, 0.02815635, 0.02733237, 0.03321588,
0.03063803, 0.03783137, 0.04110955, 0.0451221 , 0.04646263, 0.04472884, 0.04935833,
0.03372911, 0.04031406, 0.04165237, 0.03940343, 0.03805504])
time = np.arange(0, 100001, 1)
data = np.zeros_like(time)
for i in range(0, len(TD_t)):
    t = TD_t[i]
    data[t] = TD_d[i]
    print(i, t, TD_d[i], data[t])
But for some reason this code works.
import numpy
nums = numpy.array([0,1,2,3])
data = numpy.zeros_like(nums)
data[0] = nums[2]
data[0], nums[2]
Any help will be much appreciated!!
It's because the dtype of data is being set to int64 (inherited from time), and so when you try to reassign one of the data elements, the float value gets truncated to zero.
Try changing the line to:
data = np.zeros_like(time, dtype=float)
and it should work (or use whatever dtype the TD_d array has).
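As an aside, once the dtype is right, the loop can be replaced by one vectorized assignment (TD_t already holds valid integer indices into data):
data = np.zeros_like(time, dtype=TD_d.dtype)
data[TD_t] = TD_d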

Random Forest feature importance: how many are actually used?

I use RF twice in a row.
First, I fit it using max_features='auto' and the whole dataset (109 features) in order to perform feature selection.
The following is RandomForestClassifier.feature_importances_; it correctly gives me 109 scores, one per feature:
[0.00118087, 0.01268531, 0.0017589 , 0.01614814, 0.01105567,
0.0146838 , 0.0187875 , 0.0190427 , 0.01429976, 0.01311706,
0.01702717, 0.00901344, 0.01044047, 0.00932331, 0.01211333,
0.01271825, 0.0095337 , 0.00985686, 0.00952823, 0.01165877,
0.00193286, 0.0012602 , 0.00208145, 0.00203459, 0.00229907,
0.00242616, 0.00051358, 0.00071606, 0.00975515, 0.00171034,
0.01134927, 0.00687018, 0.00987706, 0.01507474, 0.01223525,
0.01170495, 0.00928417, 0.01083082, 0.01302036, 0.01002457,
0.00894818, 0.00833564, 0.00930602, 0.01100774, 0.00818604,
0.00675784, 0.00740617, 0.00185461, 0.00119627, 0.00159034,
0.00154336, 0.00478926, 0.00200773, 0.00063574, 0.00065675,
0.01104192, 0.00246746, 0.01663812, 0.01041134, 0.01401842,
0.02038318, 0.0202834 , 0.01290935, 0.01476593, 0.0108275 ,
0.0118773 , 0.01050919, 0.0111477 , 0.00684507, 0.01170021,
0.01291888, 0.00963295, 0.01161876, 0.00756015, 0.00178329,
0.00065709, 0. , 0.00246064, 0.00217982, 0.00305187,
0.00061284, 0.00063431, 0.01963523, 0.00265208, 0.01543552,
0.0176546 , 0.01443356, 0.01834896, 0.01385694, 0.01320648,
0.00966011, 0.0148321 , 0.01574166, 0.0167107 , 0.00791634,
0.01121442, 0.02171706, 0.01855552, 0.0257449 , 0.02925843,
0.01789742, 0. , 0. , 0.00379275, 0.0024365 ,
0.00333905, 0.00238971, 0.00068355, 0.00075399]
Then I transform the dataset using the previous fit, which should reduce its dimensionality, and re-fit the RF on the result.
Given max_features='auto' and the 109 features, I would expect to have ~10 features in total; instead, calling rf.feature_importances_ returns more (62):
[ 0.01261971, 0.02003921, 0.00961297, 0.02505467, 0.02038449,
0.02353745, 0.01893777, 0.01932577, 0.01681398, 0.01464485,
0.01672119, 0.00748981, 0.01109461, 0.01116948, 0.0087081 ,
0.01056344, 0.00971319, 0.01532258, 0.0167348 , 0.01601214,
0.01522208, 0.01625487, 0.01653784, 0.01483562, 0.01602748,
0.01522369, 0.01581573, 0.01406688, 0.01269036, 0.00884105,
0.02538574, 0.00637611, 0.01928382, 0.02061512, 0.02566056,
0.02180902, 0.01537295, 0.01796305, 0.01171095, 0.01179759,
0.01371328, 0.00811729, 0.01060708, 0.015717 , 0.01067911,
0.01773623, 0.0169396 , 0.0226369 , 0.01547827, 0.01499467,
0.01356075, 0.01040735, 0.01360752, 0.01754145, 0.01446933,
0.01845195, 0.0190799 , 0.02608652, 0.02095663, 0.02939744,
0.01870901, 0.02512201]
Why? Shouldn't it return just ~10 feature importances?
You misunderstood the meaning of max_features, which is:
The number of features to consider when looking for the best split
It is not the number of features kept when transforming the data.
It is the threshold parameter of the transform method that determines which features count as most important:
threshold : string, float or None, optional (default=None)
    The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if available, the object attribute threshold is used. Otherwise, “mean” is used by default.
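For illustration, a short sketch of threshold-based selection with scikit-learn's SelectFromModel, the modern replacement for the transform method (X and y stand in for your training data):
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rf = RandomForestClassifier(n_estimators=100).fit(X, y)
# keep only features whose importance is at least 1.25x the mean importance
selector = SelectFromModel(rf, threshold='1.25*mean', prefit=True)
X_reduced = selector.transform(X)
print(X.shape[1], '->', X_reduced.shape[1])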
