I'm working on a project that reads accelerometer data from a smartwatch. The app has a start button and a stop button, which start and stop the sensor readings, respectively.
I would like to start recording the data when the user presses start - I know how to access the sensor data, but I need a way to store the data until the stop button is pressed. Accelerometer data is X-, Y-, and Z-axis data, that could look something like this:
[2.034343, 8.342423, 0.012313]
So I need a way to store this type of vector within an additional array, in order to separate each vector recorded. Thus, the data structure I'm looking for should look something like this when having recorded three vectors of data:
[[2.034343, 8.342423, 0.012313], [6.031843, 2.349153, 0.012313], [1.734843, 5.342423, 1.012393]]
The data structure will be filled with these vectors in the time between the user pressing start to when stop is pressed.
Do I need an ArrayList of ArrayLists? Or do you have any other suggestions as to what the best data structural solution to this is? As of right now, the data only needs to be stored as long as the app is running.
I have no idea why I cannot figure out how to do this in Kotlin. So far I have only been able to make an ArrayList of ArrayLists; however, instead of keeping each vector separate in its own brackets (as shown above), it just appends all the vector values to one array that holds all values (like a standard array).
Hope I can get some help. Thanks in advance.
A data class could make sense:
data class Point(val x: Double, val y: Double, val z: Double)
val list = mutableListOf<Point>()
list.add(Point(2.034343, 8.342423, 0.012313))
list.add(Point(6.031843, 2.349153, 0.012313))
list.add(Point(1.734843, 5.342423, 1.012393))
for (point in list) {
    println("Point: ${point.x} | ${point.y} | ${point.z}")
}
If you want to do this with a 2D structure, you can instantiate an ArrayList or MutableList and add a list to it each time you have new data. If you use ArrayList specifically, you can specify the initial capacity. Setting a large initial capacity helps avoid the list having to resize its backing array while you're adding data, which can be an expensive operation once it has become large.
val data = ArrayList<List<Float>>(10000)
//adding data:
data.add(listOf(xData, yData, zData))
However, I would probably use a 1D collection because all the data points are Floats (or Doubles?) and it's easy to read them 3 at a time. Then you aren't having to create a new collection object to wrap each data point, which requires more memory allocation.
Normally, I wouldn't do something like this, which sacrifices code clarity and conciseness, but since you're recording a stream of data that could be very large, it makes sense to optimize it a bit up front.
val data = ArrayList<Float>(30000)
//adding data:
data.add(xData)
data.add(yData)
data.add(zData)
//iterating the data later:
for (i in 0 until data.size step 3) {
    val x = data[i]
    val y = data[i + 1]
    val z = data[i + 2]
}
// Or if you have a Point class, you could create a wrapper class
// for getting the points lazily:
class PointAccessor(val source: List<Float>) {
    operator fun get(index: Int): Point {
        val i = index * 3
        return Point(source[i], source[i + 1], source[i + 2])
    }
}
val pointData = PointAccessor(data)
val someX = pointData[4].x
Let Q be a distributed RowMatrix in Spark. I want to calculate the cross product of Q with its transpose Q'.
However, although a RowMatrix does have a multiply() method, it only accepts a local Matrix as an argument.
Code illustration ( Scala ):
val phi = new RowMatrix(phiRDD) // phiRDD is an instance of RDD[Vector]
val phiTranspose = transposeRowMatrix(phi) // transposeRowMatrix()
// returns the transpose of a RowMatrix
val crossMat = ? // phi * phiTranspose
Note that I want to multiply two distributed RowMatrix instances, not a distributed one with a local one.
One solution is to use an IndexedRowMatrix as following:
val phi = new IndexedRowMatrix(phiRDD) // phiRDD is an instance of RDD[IndexedRow]
val phiTranspose = transposeMatrix(phi) // transposeMatrix()
// returns the transpose of a Matrix
val crossMat = phi.toBlockMatrix().multiply(phiTranspose.toBlockMatrix())
However, I want to use RowMatrix methods such as tallSkinnyQR(), which means that I should transform crossMat into a RowMatrix using the .toRowMatrix() method:
val crossRowMat = crossMat.toRowMatrix()
and finally I can apply
but this process involves many conversions between the distributed matrix types, and according to what I understood from the MLlib Programming Guide, this is expensive:
It is very important to choose the right format to store large and distributed matrices. Converting a distributed matrix to a different format may require a global shuffle, which is quite expensive.
Would someone elaborate, please.
The only distributed matrices that support matrix-matrix multiplication are BlockMatrices. You have to convert your data accordingly; artificial indices are good enough:
new IndexedRowMatrix(
rowMatrix.rows.zipWithIndex.map(x => IndexedRow(x._2, x._1))
).toBlockMatrix match { case m => m.multiply(m.transpose) }
I used the algorithm listed on this page, which turns the multiplication from a dot product problem into a distributed scalar product problem by using the outer product of vectors:
The outer product between two vectors is the scalar product of the
second vector with all the elements in the first vector, resulting in
a matrix
My own multiplication function for RowMatrices (which could be optimized further) ended up like this:
def multiplyRowMatrices(m1: RowMatrix, m2: RowMatrix)(implicit ctx: SparkSession): RowMatrix = {
  // Zip m1 columns with m2 rows
  val m1Cm2R = transposeRowMatrix(m1).rows.zip(m2.rows)
  // Apply scalar product between each entry in m1 vector with m2 row
  val scalar = m1Cm2R.map {
    case (column: DenseVector, row: DenseVector) => column.toArray.map {
      columnValue => row.toArray.map {
        rowValue => columnValue * rowValue
      }
    }
  }
  // Add all the resulting matrices pointwise
  val sum = scalar.reduce {
    case (matrix1, matrix2) => matrix1.zip(matrix2).map {
      case (array1, array2) => array1.zip(array2).map {
        case (value1, value2) => value1 + value2
      }
    }
  }
  new RowMatrix(ctx.sparkContext.parallelize(sum.map(array => Vectors.dense(array))))
}
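As a local sanity check of the idea behind this function, here is a minimal sketch in plain Python (no Spark involved; the names outer_product_matmul and transpose are made up for illustration). It builds the product as a sum of outer products, pairing column k of the first matrix with row k of the second, and compares the result with the usual row-by-column definition.

```python
def outer_product_matmul(a, b):
    """Multiply two matrices (lists of rows) by summing outer products.

    Column k of `a` is paired with row k of `b`; each pair yields an
    outer-product matrix, and those matrices are added element-wise.
    """
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    n_rows, n_cols = len(a), len(b[0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for k in range(len(b)):
        column = [row[k] for row in a]
        for i, column_value in enumerate(column):
            for j, row_value in enumerate(b[k]):
                total[i][j] += column_value * row_value
    return total


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


phi = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
product = outer_product_matmul(phi, transpose(phi))

# Reference result: the usual row-by-column definition of phi * phi'
expected = [[sum(x * y for x, y in zip(r1, r2)) for r2 in phi] for r1 in phi]
```

Both routes give the same 3x3 matrix here. This only illustrates the arithmetic; it says nothing about how either approach behaves on a cluster.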
After that I tested both approaches (my own function and the block matrix conversion) using a 300*10 matrix on a single machine.
Using my own function:
val PhiMat = new RowMatrix(phi)
val TphiMat = transposeRowMatrix(PhiMat)
val product = multiplyRowMatrices(PhiMat,TphiMat)
Using matrix transformation:
val MatRow = new RowMatrix(phi)
val MatBlock = new IndexedRowMatrix(MatRow.rows.zipWithIndex.map(x => IndexedRow(x._2, x._1))).toBlockMatrix()
val TMatBlock = MatBlock.transpose
val productMatBlock = MatBlock.multiply(TMatBlock)
val productMatRow = productMatBlock.toIndexedRowMatrix().toRowMatrix()
The first approach spawned 1 job with 5 stages and took 2s to finish in total, while the second approach spawned 4 jobs (three with one stage and one with two stages) and took 0.323s in total. The second approach also outperformed the first with respect to the Shuffle Read/Write size.
Yet I am still confused by the MLlib Programming Guide statement:
It is very important to choose the right format to store large and
distributed matrices. Converting a distributed matrix to a different
format may require a global shuffle, which is quite expensive.
Please help my poor knowledge of signal processing.
I want to smooth some data. Here is my code:
import numpy as np
from scipy.signal import butter, filtfilt

def testButterworth(nyf, x, y):
    b, a = butter(4, 1.5/nyf)
    fl = filtfilt(b, a, y)
    return fl

if __name__ == '__main__':
    positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
    number_of_points = len(positions_recorded)
    end = 10
    dt = end/float(number_of_points)
    nyf = 0.5/dt
    x = np.linspace(0, end, number_of_points)
    y = positions_recorded
    fl = testButterworth(nyf, x, y)
I am pretty satisfied with the results except for one point:
it is absolutely crucial to me that the start and end points of the returned values equal the start and end points of the input. How can I introduce this restriction?
UPD 15-Dec-14 12:04:
my original data looks like this
Applying the filter and zooming into last part of the graph gives following result:
So, at the moment I just care about the last point, which must be equal to the original point. I tried appending a copy of the data to the end of the original list this way:
The result is, as expected, even worse.
Then I tried appending the data this way:
And the slice where one period ends and the next one begins looks like this:
To do this, you're always going to cheat somehow, since the true filter applied to the true data doesn't behave the way you require.
One of the best ways to cheat with your data is to assume it's periodic. This has the advantages that: 1) it's consistent with the data you actually have, and all you're changing is to append data in the region you don't know about (so assuming it's periodic is as reasonable as anything else, although it may violate some unstated or implicit assumptions); 2) the result will be consistent with your filter.
You can usually get by with this by appending copies of your data to the beginning and end of your real data, or just small pieces, depending on your filter.
Since the FFT assumes that the data is periodic anyway, that's often a quick and easy approach, and is fully accurate (whereas concatenating the data is an estimation of an infinitely periodic waveform). Here's an example of the FFT approach for a step filter.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 128)
y = (np.sin(.22*(x+10))>0).astype(float)
# filter
y2 = np.fft.fft(y)
f0 = np.fft.fftfreq(len(x))
y2[(f0<-.25) | (f0>.25)] = 0
y3 = abs(np.fft.ifft(y2))
plt.plot(x, y)
plt.plot(x, y3)
plt.xlim(-10, 140)
plt.ylim(-.1, 1.1)
Note how the endpoints bend towards each other at either end, even though this is not consistent with the periodicity of the waveform (since the segments at either end are heavily truncated). This can also be seen by adjusting the waveform so that the ends are the same (here I used x+30 instead of x+10); in that case the ends don't need to bend to match up, so they stay level with the end of the data.
Note, also, that to have the endpoints actually be exactly equal you would have to extend this plot by one point (at either end), since it is periodic with exactly the length of the original waveform. Doing this is not ad hoc, though: the result will be entirely consistent with your analysis, just representing one extra point of what was assumed to be infinite repeats all along.
Finally, this FFT trick works best with waveforms of length 2^n. Other lengths may be zero padded in the FFT. In this case, just doing concatenations to either end, as I mentioned at first, might be the best way to go.
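To make the concatenation idea concrete, here is a minimal sketch in plain Python. It uses a centered moving average as a stand-in for the real filter (that substitution is my assumption, purely for illustration; the function names are made up too). The data is padded with wrapped-around copies of its own ends, filtered, and only the part that corresponds to the original samples is kept.

```python
def moving_average(values, width):
    """Plain moving average; returns len(values) - width + 1 samples."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def smooth_periodic(values, width=5):
    """Smooth `values` as if they repeated forever.

    The last `width // 2` samples are copied in front of the data and the
    first `width // 2` samples are copied behind it, so every output
    sample is an average over a full window.
    """
    values = list(values)
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    if width > len(values):
        raise ValueError("width must not exceed the number of samples")
    half = width // 2
    padded = values[-half:] + values + values[:half] if half else values
    return moving_average(padded, width)


pulse = [3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
smoothed = smooth_periodic(pulse, width=3)
```

The output has the same length as the input, and the pulse at the first sample leaks into the last sample, which is exactly the wrap-around behaviour that assuming periodicity implies.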
The question is how to filter data and require that the left endpoint of the filtered result matches the left endpoint of the data, and same for the right endpoint. (That is, in general, the filtered result should be close to most of the data points, but not necessarily exactly match any of them, but what if you need a match at both endpoints?)
To make the filtered result exactly match the endpoints of a curve, one could add a padding of points at either end of the curve and adjust the y-position of this padding so that the endpoints of the valid part of the filter exactly matched the end points of the original data (without the padding).
In general, this can be done by either iterating towards a solution, adjusting the padding y-position until the ends line up, or by calculating a few values and then interpolating to determine the y-positions that would be required for the matched endpoints. I'll do the second approach.
Here's the code I used, where I simulated the data as a sine wave with two flat pieces on either side (note that these flat pieces are not the padding; I'm just trying to make data that looks a bit like the OP's).
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt

#### op's code
def testButterworth(nyf, x, y):
    #b, a = butter(4, 1.5/nyf)
    b, a = butter(4, 1.5/nyf)
    fl = filtfilt(b, a, y)
    return fl

def do_fit(data):
    positions_recorded = data
    #positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
    number_of_points = len(positions_recorded)
    end = 10
    dt = end/float(number_of_points)
    nyf = 0.5/dt
    x = np.linspace(0, end, number_of_points)
    y = positions_recorded
    fx = testButterworth(nyf, x, y)
    return fx

### simulate some data (op should have done this too!)
def sim_data():
    t = np.linspace(.1*np.pi, (2.-.1)*np.pi, 100)
    y = np.sin(t)
    c = np.ones(10, dtype=float)
    z = np.concatenate((c*y[0], y, c*y[-1]))
    return z

### code to find the required offset padding
def fit_with_pads(v, data, n=1):
    c = np.ones(n, dtype=float)
    z = np.concatenate((c*v[0], data, c*v[1]))
    fx = do_fit(z)
    return fx

def get_errors(data, fx):
    n = (len(fx)-len(data))//2
    return np.array((fx[n]-data[0], fx[-n]-data[-1]))

def vary_padding(data, span=.005, n=100):
    errors = np.zeros((4, n))   # Lpad, Rpad, Lerror, Rerror
    offsets = np.linspace(-span, span, n)
    for i in range(n):
        vL, vR = data[0]+offsets[i], data[-1]+offsets[i]
        fx = fit_with_pads((vL, vR), data, n=1)
        errs = get_errors(data, fx)
        errors[:,i] = np.array((vL, vR, errs[0], errs[1]))
    return errors

if __name__ == '__main__':
    data = sim_data()
    fx = do_fit(data)
    errors = vary_padding(data)
    # first figure: padding value vs. endpoint error
    plt.plot(errors[0], errors[2], 'x-')
    plt.plot(errors[1], errors[3], 'o-')
    oR = -0.30958
    oL = 0.30887
    fp = fit_with_pads((oL, oR), data, n=1)[1:-1]
    # second figure: data, plain fit, and adjusted fit
    plt.figure()
    plt.plot(data, 'b')
    plt.plot(fx, 'g')
    plt.plot(fp, 'r')
    plt.show()
Here, for the padding I only used a single point on either side (n=1). Then I calculate the error for a range of values shifting the padding up and down from the first and last data points.
For the plots:
First I plot the offset vs error (between the fit and the desired data value). To find the offset to use, I just zoomed in on the two lines to find the x-value of the y zero crossing, but to do this more accurately, one could calculate the zero crossing from this data:
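For the zero-crossing calculation, a minimal sketch could look like the following. It linearly interpolates between the two neighbouring samples where the error changes sign. The function name zero_crossing is made up, and it assumes the error actually changes sign somewhere in the scanned range.

```python
def zero_crossing(xs, ys):
    """Return the x where the piecewise-linear curve through (xs, ys)
    first crosses y = 0, using linear interpolation between samples."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 0:
            return x0
        if y0 * y1 < 0:
            # straight line through (x0, y0) and (x1, y1), solved for y = 0
            return x0 - y0 * (x1 - x0) / (y1 - y0)
    if points and points[-1][1] == 0:
        return points[-1][0]
    raise ValueError("no sign change found in ys")


# Toy check: the line through (0, -1) and (1, 1) crosses zero at x = 0.5
crossing = zero_crossing([0.0, 1.0, 2.0], [-1.0, 1.0, 3.0])
```

With the errors array returned by vary_padding, something like zero_crossing(errors[0], errors[2]) would give the left padding value and zero_crossing(errors[1], errors[3]) the right one, instead of reading them off the zoomed plot.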
Here's the plot of the original "data", the fit (green) and the adjusted fit (red):
and zoomed in the RHS:
The important point here is that the red (adjusted fit) and blue (original data) endpoints match, even though the pure fit doesn't.
Is this a valid approach? Of the various options, this seems the most reasonable, since one isn't usually making any claims about data that isn't being shown, and the region that is shown has an accurately applied filter. For example, FFTs usually assume the data is zero or periodic beyond the boundaries. Certainly, though, to be precise one should explain what was done.
I tried searching but came up short on my particular problem. I should mention that I am fairly new to MATLAB, so this may be something obvious that has gone over my head.
I have an Excel file with accelerometer recordings of 5 events with some space in between. These events take place at times (i.e. rows) I have to specify, such as 120:250, 280:390, 430:943, and so on.
What I would like to do is loop through and extract the required data, and have it stored in variables such that each event has its own 'section', if you will, and each 'section' contains its own set of 'sub-sections' with the X, Y, Z accelerometer data.
My current set up is a manual one, and it looks like this:
X1 = xlsread('location.xlsx','sheet1','d110:d367');
X2 = xlsread('location.xlsx','sheet1','d367:d631');
X3 = xlsread('location.xlsx','sheet1','d631:d891');
X4 = xlsread('location.xlsx','sheet1','d891:d1134');
X5 = xlsread('location.xlsx','sheet1','d1134:d1361');
Y1 = xlsread('location.xlsx','sheet1','e110:e367');
Y2 = xlsread('location.xlsx','sheet1','e367:e631');
Y3 = xlsread('location.xlsx','sheet1','e631:e891');
Y4 = xlsread('location.xlsx','sheet1','E891:e1134');
Y5 = xlsread('location.xlsx','sheet1','e1134:e1361');
Z1 = xlsread('location.xlsx','sheet1','f110:f367');
Z2 = xlsread('location.xlsx','sheet1','f367:f631');
Z3 = xlsread('location.xlsx','sheet1','f631:f891');
Z4 = xlsread('location.xlsx','sheet1','f891:f1134');
Z5 = xlsread('location.xlsx','sheet1','f1134:f1361');
So you can see how it is not favorable. The other thing I'd like to do is to eventually use loops for cross correlation against other data sets, but again I'm not sure of the nature of the loops when dealing with 'dynamic' variables or what have you.
Right now the way I am thinking of doing it is that I specify the blocks of rows in a vector or something like that, and loop through for each activity, and then each axis.
Running XLSREAD for every variable won't be optimal for performance. This function uses a COM interface (at least under Windows) and is slow. If the data is not very big and fits into memory, it's better to read the whole sheet at once into a temporary variable and then sort the values into variables.
Another piece of advice is not to use X1, X2, etc. You will have problems if you want to use those variables in a loop. If they have different lengths, create a cell array, so they become X{1}, X{2}, etc.
So, first read the whole file:
data = xlsread('location.xlsx','sheet1','D:F');
If your data are all numeric, you will get them in the data matrix.
The index ranges can be entered manually or derived from the data.
index = {120:250, 280:390, 430:943};
for ii = 1:numel(index)
    X{ii} = data(index{ii},1);
    Y{ii} = data(index{ii},2);
    Z{ii} = data(index{ii},3);
end
When you take centiles of a variable in Stata, for example:
*set directory
cd"C:\Etc\Etc Etc\"
*open data file
use "dataset.dta",clear
*get centiles
centile var1, centile(1,5(5)95,99)
Is there some way to export the resulting centile table to Excel? The centile values are stored in r(c_#), where # indicates the centile at which you want the data. But I need a vector of the values at all the centiles, more or less as it appears in the output window.
I have attempted to use foreach loop to get the centiles into a vector, as follows:
*Create column of centiles
foreach i in r(centiles) {
without success.
I've since found this to work:
matrix X = 0,0
forvalues i=1/21 {
    matrix X = `i',round(r(c_`i'),.001)\ X
}
The only inconveniences are: 1) I have to include a first row of 0,0 in the output, which I then have to drop. 2) In this case I have 21 centiles, but it would be nice to automate the number of centiles in case I want to change it, for example something like this:
forvalues i=1/r(n_cent) {
    matrix X = `i',round(r(c_`i'),.001)\ X
}
But the "i=1/r(n_cent)" is invalid syntax. Any advice as to how I might overcome these two inconveniences would be much appreciated.
You can use the following syntax.
Load some data and compute the percentiles.
sysuse auto, clear
centile price, centile(1,5(5)95,99)
The matrix that is supposed to contain the results has to be initialized. This matrix is called X. It has as many rows as there are centiles requested via the centile command. It has two columns. At this stage, the matrix is populated with zeroes.
matrix X = J(`=wordcount("`r(centiles)'")', 2, 0)
The following loop steps through the results of the centile command and replaces the zeroes in matrix X with the appropriate results. The first column of the matrix contains the number of the centile (1, 5, 10, ...) and the second column contains the result.
forvalues i = 1 / `=wordcount("`r(centiles)'")' {
    local cent: word `i' of `r(centiles)'
    matrix X[`i', 1] = `cent'
    matrix X[`i', 2] = r(c_`i')
}
Print the results:
matrix list X
If you are using round(), you are likely doing something wrong. There are few reasons to deliberately lose precision in the data; you can always display as many digits as you like using format this way or another (either applied to the data, or as an option of list or matrix list).
I wrote the epctile command, which returns percentiles as an estimation command, i.e., in the e(b) vector. This can be used immediately; type findit epctile to download it.
You can modify your proposal as follows:
local thenumlist 1, 5(5)95, 99
centile variable, centile(`thenumlist')
forvalues i=1/`=r(n_cent)' {
    matrix X = nullmat(X) \ r(c_`i')
}
numlist "`thenumlist'"
matrix rownames X = `r(numlist)'
matrix list X, format(%9.3f)
I have created a simple Chebyshev low pass filter based on coefficients generated by this site: http://www-users.cs.york.ac.uk/~fisher/mkfilter/, which I am using to filter out frequencies above 4kHz in a 16kHz sample rate audio signal before downsampling to 8kHz. Here's my code (which is C#, but this question is not C# specific, so feel free to answer in other languages).
/// <summary>
/// Chebyshev, lowpass, -0.5dB ripple, order 4, 16kHz sample rate, 4kHz cutoff
/// </summary>
class ChebyshevLpf4Pole
{
    const int NZEROS = 4;
    const int NPOLES = 4;
    const float GAIN = 1.403178626e+01f;
    private float[] xv = new float[NZEROS + 1];
    private float[] yv = new float[NPOLES + 1];

    public float Filter(float inValue)
    {
        xv[0] = xv[1]; xv[1] = xv[2]; xv[2] = xv[3]; xv[3] = xv[4];
        xv[4] = inValue / GAIN;
        yv[0] = yv[1]; yv[1] = yv[2]; yv[2] = yv[3]; yv[3] = yv[4];
        yv[4] = (xv[0] + xv[4]) + 4 * (xv[1] + xv[3]) + 6 * xv[2]
            + (-0.1641503452f * yv[0]) + (0.4023376691f * yv[1])
            + (-0.9100943707f * yv[2]) + (0.5316388226f * yv[3]);
        return yv[4];
    }
}
To test it I created a sine wave "chirp" from 20Hz to 8kHz using Audacity. The test signal looks like this:
After filtering it I get:
The waveform shows that the filter is indeed reducing the amplitude of frequencies above 4kHz, but I have a load of noise added to my signal. This seems to be the case whichever of the filter types I try to implement (e.g. Butterworth, Raised Cosine etc).
Am I doing something wrong, or do these filters simply introduce artefacts at other frequencies? If I downsample using the naive approach of averaging every pair of samples, I don't get this noise at all (but obviously the aliasing is much worse).
OK, it was me being really stupid. My LPF was being created inside a processing loop instead of outside it, so every 512 samples I was creating a new one and losing the saved state. With just one instance of my filter running over the whole file, the noise goes away, and as expected I get aliased frequencies, as the filter cannot completely remove everything above the cutoff.
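For anyone who wants to reproduce the effect, here is a rough Python port of the filter from the question, run block by block in two ways: once with a single filter instance, and once with a new instance per block. The coefficients are the ones from the question. The 440 Hz test tone, the run_blocks helper, and the use of Python floats (double precision, so the numbers will not match the C# float version exactly) are my own assumptions for the sketch.

```python
import math


class ChebyshevLpf4Pole:
    """Port of the 4-pole low pass filter from the question (16 kHz rate, 4 kHz cutoff)."""

    GAIN = 1.403178626e+01

    def __init__(self):
        self.xv = [0.0] * 5  # input history
        self.yv = [0.0] * 5  # output history: the state that must persist

    def filter(self, value):
        xv, yv = self.xv, self.yv
        xv[:] = xv[1:] + [value / self.GAIN]  # shift in the new input
        yv[:] = yv[1:] + [0.0]                # shift, leaving a slot for the new output
        yv[4] = ((xv[0] + xv[4]) + 4 * (xv[1] + xv[3]) + 6 * xv[2]
                 + (-0.1641503452 * yv[0]) + (0.4023376691 * yv[1])
                 + (-0.9100943707 * yv[2]) + (0.5316388226 * yv[3]))
        return yv[4]


def run_blocks(samples, block_size, reuse_filter):
    """Filter `samples` block by block, optionally re-creating the filter per block."""
    lpf = ChebyshevLpf4Pole()
    out = []
    for start in range(0, len(samples), block_size):
        if not reuse_filter:
            lpf = ChebyshevLpf4Pole()  # the bug: history is thrown away here
        out.extend(lpf.filter(s) for s in samples[start:start + block_size])
    return out


# Hypothetical test tone: 440 Hz sine sampled at 16 kHz
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(2048)]

whole = run_blocks(signal, len(signal), reuse_filter=True)
persistent = run_blocks(signal, 512, reuse_filter=True)
recreated = run_blocks(signal, 512, reuse_filter=False)
largest_gap = max(abs(a - b) for a, b in zip(whole, recreated))
```

With one instance kept across blocks, the block-wise output is identical to filtering the whole signal in one go. With a new instance per block, the output restarts from zero state at every block boundary, which shows up as a discontinuity every 512 samples.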
I checked your filter-code in Mathematica and it works fine here without introducing noise, so probably the noise comes from some other part of your code.
It's possible that you have numerical stability problems, particularly if any of the poles are close to the unit circle. Try making all your intermediate terms double precision and then cast back to single precision at the end. I'm not too familiar with C# but in C this would be:
yv[4] = (float)(((double)xv[0] + (double)xv[4]) + 4.0 * ((double)xv[1] + (double)xv[3]) + 6.0 * xv[2]
+ (-0.1641503452 * (double)yv[0]) + (0.4023376691 * (double)yv[1])
+ (-0.9100943707 * (double)yv[2]) + (0.5316388226 * (double)yv[3]));
You haven't properly initialized your xv and yv arrays before using them for the first time. In most languages this means their values are undefined which may lead to unexpected results like yours. Initializing them to a proper value (like 0) may solve your issue.