How to set the dynamic time warping adjustment window? - audio

I am using a DTW (dynamic time warping) code.
Does anybody happen to know how I should set the size of the adjustment window (the global path constraint)?

Cross validation:
get some labeled data, set the warping window to zero, and measure the leave-one-out accuracy.
Then keep increasing the warping window size until the accuracy gets worse.
See Figures 5 and 6 of the paper below.
Eamonn
Ratanamahatana, C. A. and Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong. Third Workshop on Mining Temporal and Sequential Data, in conjunction with the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), August 22-25, 2004, Seattle, WA.
http://www.cs.ucr.edu/~eamonn/DTW_myths.pdf
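
A minimal sketch of that procedure (not the poster's code; the dataset X, labels y, and the band implementation below are my own assumptions): a 1-nearest-neighbour classifier under a Sakoe-Chiba band, with the leave-one-out accuracy measured for increasing window widths.

import numpy as np

def dtw_distance(a, b, window):
    # DTW distance between 1-D arrays a and b under a Sakoe-Chiba band of half-width `window`.
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must at least cover the length difference
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

def loo_accuracy(X, y, window):
    # Leave-one-out accuracy of a 1-nearest-neighbour classifier using DTW.
    correct = 0
    for i in range(len(X)):
        dists = [dtw_distance(X[i], X[j], window) if j != i else np.inf for j in range(len(X))]
        if y[int(np.argmin(dists))] == y[i]:
            correct += 1
    return correct / len(X)

# X: list of 1-D numpy arrays (time series), y: list of labels (placeholders).
# Sweep the window from 0 upwards and stop once accuracy no longer improves:
# for w in range(0, 20):
#     print(w, loo_accuracy(X, y, w))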

Related

How do I test for solar irradiation variation at 1000 W/m², 800 W/m², and 400 W/m²?

I am new to the field of power electronics. I am modeling a maximum power point tracking (MPPT) perturb-and-observe algorithm with a buck-boost converter.
I want to test it at various solar irradiance levels so that I can obtain the voltage, current, and power. How do I go about this?
I also don't know whether my C-Script for the MPPT is right or wrong.
I have obtained the I-V and P-V curves for the PV module used.
What remains is testing at various solar irradiation levels: 1000 W/m², 800 W/m², 600 W/m², and 400 W/m².
This is the link to my model, which you can paste into a browser and download. The model can only be opened with the PLECS software.
https://forum.plexim.com/?qa=blob&qa_blobid=7603544605438125951
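
Since the question centers on a perturb-and-observe MPPT C-Script, here is a minimal Python sketch of the standard P&O decision step (my own generic version, independent of PLECS and of the model linked above); it only illustrates the logic of perturbing the voltage reference and keeping the direction while power increases.

def perturb_and_observe(v, p, v_prev, p_prev, step=0.5):
    # Return the next voltage reference from the present and previous (voltage, power) samples.
    if p > p_prev:
        # Power went up: keep perturbing in the same direction.
        return v + step if v > v_prev else v - step
    else:
        # Power went down: reverse the perturbation direction.
        return v - step if v > v_prev else v + step

# Example: track the maximum of a toy P(V) curve with its peak at V = 30.
v_prev, p_prev = 29.0, 0.0
v = 29.5
for _ in range(20):
    p = -(v - 30.0) ** 2 + 100.0  # placeholder P(V) curve
    v, v_prev, p_prev = perturb_and_observe(v, p, v_prev, p_prev), v, p
print(v_prev, v)  # oscillates around the 30 V maximum, as P&O is expected to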

Which solar geometry should be assumed by pvlib?

I am trying to calibrate my PV panel's efficiency using reference GHI and dHI values measured by our national weather service, which is 12 km away from my local PV site. At the reference site, I calibrate the Ineichen model by trimming its Linke turbidity parameter (LT) so that the measured and computed irradiances match. I assume that the LT factor is valid within a 12 km radius on a clear-sky day with a low aerosol optical depth, so I can port the calibrated LT to my location and height. At the same time, I measure my local PV's DC power (under MPPT conditions) as the difference between exposure to global normal irradiance and exposure to global diffuse irradiance only. My panel is small, so shading it from the direct beam is pretty easy. Having computed DNI = (GHI - dHI)/cos(pi - zenith) with the Ineichen model at my site, I get the PV efficiency from the measured DC power at the given surface temperature (measured as well). So far, everything looks fine. But I am getting different optimum LT parameters for the two reference matches, GHI and dHI!

Is the Ineichen model not accurate enough for calibration purposes? No, the reason lies elsewhere: when using a single compromise LT value, both the GHI and dHI values come out roughly equally higher (+3%) than their measured counterparts. This naturally raises the question of which extraterrestrial irradiance value (GXI) the numeric model uses. The error stems from the eccentricity of the Earth's orbit, 0.017, which causes about a 0.034 relative variation in GXI, correlating well with the compromise error I observe. A comment by the authors in pvlib confirms that pvlib applies circular solar geometry. In my own experience that is precise enough for calculating the solar azimuth and zenith from the timestamp and position; a typical error of the computed solar angle is about 0.5%. On the other hand, the 3% error in extraterrestrial irradiance could easily be fixed if an accurate Sun-to-Earth distance were calculated from the elliptical orbit model, which is even easier than calculating the solar angle with the circular model!

Currently, I use the following workaround: trim LT so that both model outputs, GHI and dHI, show the same relative error, then port LT to the local site and correct the computed DNI there by its (known) relative error.
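
For illustration, a minimal pvlib sketch of the clear-sky step described above (the site coordinates, timestamps, and LT value are placeholders, and this is not necessarily how the actual calibration is implemented):

import pandas as pd
from pvlib.location import Location

# Placeholder site and clear-sky day.
site = Location(latitude=48.2, longitude=16.4, altitude=200, tz='UTC')
times = pd.date_range('2022-06-21 04:00', '2022-06-21 18:00', freq='10min', tz='UTC')

# Ineichen clear-sky irradiance for a trial Linke turbidity value.
lt = 3.0
cs = site.get_clearsky(times, model='ineichen', linke_turbidity=lt)
print(cs[['ghi', 'dhi', 'dni']].head())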
"Which extraterrestrial irradiance value (GXI) does the numeric model use? ... A comment by the authors in pvlib confirms that pvlib applies circular solar geometry."
I'm not sure what you mean by this. Can you provide a link to that comment? In any case, several models are available for extraterrestrial irradiance via pvlib.irradiance.get_extra_radiation (docs, v0.9.1 source code). Here's what the default model looks like across a year:
import pandas as pd
import pvlib

# Daily timestamps across one year.
times = pd.date_range('2022-01-01', '2022-12-31', freq='d')
# Extraterrestrial DNI in W/m^2; the default method is 'spencer'.
dni_et = pvlib.irradiance.get_extra_radiation(times)
dni_et.plot()
There are also functions for calculating earth-sun distance, e.g. pvlib.solarposition.nrel_earthsun_distance.
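
For example, a short sketch (the dates are arbitrary) combining the two: compute the earth-sun distance in AU and scale the solar constant by 1/r² to get the corresponding extraterrestrial irradiance, which reflects the few-percent annual variation you describe.

import pandas as pd
import pvlib

times = pd.date_range('2022-01-01', '2022-12-31', freq='d', tz='UTC')

# Earth-sun distance in astronomical units (NREL SPA algorithm).
r = pvlib.solarposition.nrel_earthsun_distance(times)

# Scale the solar constant by 1/r^2 to get extraterrestrial irradiance in W/m^2.
solar_constant = 1366.1
dni_et_from_distance = solar_constant / r**2
print(dni_et_from_distance.min(), dni_et_from_distance.max())  # roughly +/- 3% around the solar constant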

Impulse response analysis

I ran an impulse response analysis on a value-weighted stock index and a few variables in Python and got the following results:
I am not sure how to interpret these results.
Can anyone please help me out?
You might want to check the book "New Introduction to Multiple Time Series Analysis" by Helmut Lütkepohl (2005) for a rather dense treatment of the theory behind the method.
In the meantime, here is a simple way to interpret your plots. Say your variables are VW, SP500, oil, uts, prod, cpi, n3 and usd. They are all part of the same system; what the impulse response analysis does is assess how much one variable impacts another, independently of the other variables. It is therefore a pairwise shock from one variable to another. Your first plot is VW -> VW, which is pretty much an autocorrelation plot. Now look at the other plots: apparently SP500 exerts the greatest impact on VW (you can see a peak in the blue line reaching 0.25). The y-axis is given in standard deviations and the x-axis in lag periods. So in your example, SP500 causes a 0.25 change in VW at the lag of whatever is on your x-axis (I can't tell from your figure). Similarly, you can see n3 negatively impacting VW at a given period.
There is an interesting link, which you probably already know, that shows an example of applying Python statsmodels' VAR to impulse response analysis.
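
A minimal sketch of that statsmodels workflow (the file name and columns are placeholders, not the poster's data):

import pandas as pd
from statsmodels.tsa.api import VAR

# Placeholder: a DataFrame with one column per variable, e.g. VW, SP500, oil, ...
df = pd.read_csv('returns.csv', index_col=0, parse_dates=True)

model = VAR(df)
results = model.fit(maxlags=12, ic='aic')  # lag order chosen by AIC
irf = results.irf(10)                      # impulse responses over 10 periods
irf.plot(orth=True)                        # orthogonalized IRFs with confidence bands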
I used this method to assess how one variable impacts another in a plant-water-atmosphere system; there are some explanations there and also interpretations of similar plots. Take a look:
Use of remote sensing indicators to assess effects of drought and human-induced land degradation on ecosystem health in Northeastern Brazil
Good luck!

Scale before PCA

I'm using PCA from scikit-learn and I'm getting some results which I'm trying to interpret, so I ran into a question: should I subtract the mean (or perform standardization) before using PCA, or is this somehow embedded into the sklearn implementation?
Moreover, if so, which of the two should I perform, and why is this step needed?
I will try to explain it with an example. Suppose you have a dataset with many features about housing, and your goal is to classify whether a purchase is good or bad (a binary classification). The dataset includes some categorical variables (e.g. location of the house, condition, access to public transportation, etc.) and some numeric ones (e.g. market price, number of bedrooms, etc.). The first thing you might do is encode the categorical variables. For instance, if you have 100 locations in your dataset, a common way is to encode them from 0 to 99. You may even end up one-hot encoding these variables (i.e. a column of 1s and 0s for each location), depending on the classifier you plan to use.

Now, if you use the price in million dollars, the price feature will have a much higher variance and thus a higher standard deviation. Remember that we use the squared difference from the mean to calculate the variance; a larger scale creates larger values, and the square of a large value grows faster. But that does not mean the price carries significantly more information than, for instance, location. In this example, however, PCA would give a very high weight to the price feature, and the weights of the categorical features would drop almost to 0. If you normalize your features, you get a fair comparison of the explained variance in the dataset. So it is good practice to center and scale the features before using PCA.
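
A small sketch of that effect with made-up numbers (the two features and their scales are placeholders): with a raw price column in dollars, the first principal component loads almost entirely on price; after standardization both features contribute comparably.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500
price = rng.normal(500_000, 150_000, n)         # placeholder price in dollars
bedrooms = rng.integers(1, 6, n).astype(float)  # placeholder small-scale feature
X = np.column_stack([price, bedrooms])

# Without scaling, PC1 loads almost entirely on the price feature.
print(PCA(n_components=1).fit(X).components_)

# After standardization, both features contribute comparably to PC1.
X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).components_)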
Before PCA, you should:
1. Mean-normalize (ALWAYS)
2. Scale the features (if required)
Note: please remember that steps 1 and 2 are technically not the same thing.
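
A minimal scikit-learn sketch of that recipe (the feature matrix X is a placeholder); as far as I know, sklearn's PCA centers the data internally but does not scale it, so the scaling step has to be done explicitly:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)  # placeholder feature matrix

# Steps 1 and 2: center to zero mean and scale to unit variance.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)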
This is a really non-technical answer, but my method is to try both and then see which one accounts for more variation on PC1 and PC2. However, if the attributes are on different scales (e.g. cm vs. feet vs. inches), then you should definitely scale to unit variance. In every case, you should center the data.
Here's the iris dataset with centering only and with centering + scaling. In this case, centering alone led to higher explained variance, so I would go with that one. The data comes from sklearn.datasets.load_iris. Then again, with centering only, PC1 carries most of the weight, so I wouldn't consider patterns found in PC2 significant. On the other hand, with centering + scaling the weight is split between PC1 and PC2, so both axes should be considered.
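
A sketch of that comparison (assuming only that scikit-learn is installed; the printed numbers are the explained variance ratios of the first two components):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data

# Centering only (PCA centers the data internally).
print(PCA(n_components=2).fit(X).explained_variance_ratio_)

# Centering + scaling to unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_scaled).explained_variance_ratio_)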

Calculate expected color temperature of daylight

I have a location (latitude/longitude) and a timestamp (year/month/day/hour/minute).
Assuming clear skies, is there an algorithm to loosely estimate the color temperature of sunlight at that time and place?
If I know what the weather was at that time, is there a suggested way to modify the color temperature for the amount of cloud cover at that time?
I suggest taking a look at this paper, which has a nice practical implementation for CG applications:
A Practical Analytic Model for Daylight, by A. J. Preetham, Peter Shirley, and Brian Smits.
Abstract
Sunlight and skylight are rarely rendered correctly in computer graphics. A major reason for this is high computational expense. Another is that precise atmospheric data is rarely available. We present an inexpensive analytic model that approximates full spectrum daylight for various atmospheric conditions. These conditions are parameterized using terms that users can either measure or estimate. We also present an inexpensive analytic model that approximates the effects of atmosphere (aerial perspective). These models are fielded in a number of conditions and intermediate results verified against standard literature from atmospheric science. Our goal is to achieve as much accuracy as possible without sacrificing usability.
Both compressed postscript and pdf files of the paper are available.
Example code is available.
Link-only answers are discouraged, but I cannot post either a sufficient portion of the article or a complete C++ code snippet here, as both are way too big. Following the link you can find both right now.
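
The Preetham model is driven by the sun position and an atmospheric turbidity parameter, so whichever implementation you use, the first ingredient is the solar geometry for your latitude/longitude and timestamp. A minimal sketch of that step with pvlib (the coordinates and timestamp are placeholders; the sky and colour model itself is left to the linked C++ code):

import pandas as pd
import pvlib

# Placeholder location and timestamp.
lat, lon = 40.0, -105.0
time = pd.DatetimeIndex(['2022-06-21 18:00'], tz='UTC')

# Solar zenith and azimuth for the given place and time.
solpos = pvlib.solarposition.get_solarposition(time, lat, lon)
print(solpos[['apparent_zenith', 'azimuth']])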
