Tools for parsing natural language questions in real time

"photos in washington" vs. "show me photos in washington" vs. "I wanna see all my photos in washington taken day before yesterday"
All three should reduce to the same conditions:
what: photos
entities: washington (don't want to assume too much)
when: 2013-03-14
I want to parse preset queries into conditions (like the above). I want these qualities:
I can extract the relevant terms even in the presence of fluff ("I wanna see") and lowercase nouns.
A warm program can accept requests over HTTP, or at least lets me add some network communication.
A warm program responds within 50 ms and needs at most 500 MB of memory for reasonable sentences.
I am more experienced in Python, less so in Java.
The parser's output data structure is easy to handle.
I use NLTK, but it's slow. I see Stanford NLP and OpenNLP as viable alternatives, but I find their program-start latency too high. I don't mind integrating them over servlets if I am left with no alternative.
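To make the target concrete, here is a purely illustrative rule-based sketch in Python of the kind of extraction I mean (the fluff pattern, the relative-date table, and the `parse_query` helper are all hypothetical, not an existing library):

```python
import re
from datetime import date, timedelta

# Illustrative only: a tiny rule-based extractor for preset queries.
# The fluff pattern and relative-date table would have to grow with the
# set of preset queries actually supported.
FLUFF = re.compile(r"^(?:i wanna see|i want to see|show me)\s+(?:all\s+)?(?:my\s+)?")

RELATIVE_DATES = {  # longest phrase first, so "yesterday" cannot shadow it
    "day before yesterday": 2,
    "yesterday": 1,
    "today": 0,
}

def parse_query(text, today=None):
    today = today or date.today()
    text = FLUFF.sub("", text.strip().lower())
    conditions = {}
    for phrase, days_ago in RELATIVE_DATES.items():
        if phrase in text:
            conditions["when"] = (today - timedelta(days=days_ago)).isoformat()
            text = text.replace(phrase, "").strip(" ,.")
            break
    match = re.match(r"(\w+) in (\w+)", text)  # "<what> in <entity>"
    if match:
        conditions["what"] = match.group(1)
        conditions["entities"] = [match.group(2)]
    return conditions

print(parse_query("I wanna see all my photos in washington taken day before yesterday",
                  today=date(2013, 3, 16)))
# {'when': '2013-03-14', 'what': 'photos', 'entities': ['washington']}
```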

The Stanford Parser is a solid choice, and pretty well-supported (as research code goes). But it sounds like low latency is an important requirement for you, so I'd also suggest you look at the BUBS Parser (full disclosure - I'm one of the primary researchers working on BUBS).
I haven't compared directly to NLTK, but I think you may find that the Stanford Parser doesn't meet your performance needs. This paper found a total throughput of ~60 words/second (~2-3 sentences/second). Those timings are pretty old, so newer hardware will certainly improve that, but probably still won't come close to 50 ms latency.
As you note, startup time will be an issue with any parser - a high-accuracy model is necessarily quite large. And 500 MB is probably pretty tight too (I usually run BUBS with 1-1.2 GB). But once loaded, BUBS latency is generally in the neighborhood of 10 ms per sentence (for ~20-25-word sentences), and we can push the total throughput up around 2500 words/second before accuracy starts to drop off. I think those numbers might meet your performance needs, and I don't know of any other high-accuracy (F1 >= 88-89) parser that comes close in speed.
Note: the fastest results are with recent pruning models that aren't yet posted to the website, but I can get you a model if you need. Hope that helps, and if you have more questions, feel free to ask.
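Whichever parser you end up with, the startup cost argues for exactly the "warm program" setup you describe: load the model once and keep it resident behind HTTP. A minimal sketch of that architecture in Python, where load_model() and parse() are placeholders for whatever parser you settle on (BUBS via a subprocess or socket, NLTK, etc.):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_model():
    # Placeholder for the expensive one-time startup (grammar, pruning
    # model, JVM warm-up if the parser is Java-based).
    return object()

def parse(model, sentence):
    # Placeholder: call into the real parser here.
    return {"sentence": sentence, "tree": "(S ...)"}

MODEL = load_model()  # paid once, at server startup

class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        sentence = self.rfile.read(length).decode("utf-8")
        body = json.dumps(parse(MODEL, sentence)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ParseHandler).serve_forever()
```

After that, each request pays only the per-sentence parse time, not the model load.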

Related

Dactyl-ManuForm vs Kinesis Advantage 360

I am in the market for a split ortholinear ergonomic keyboard. I have an issue with my shoulder that gets worse when I type.
I bought an Ergodox EZ, and while I thought it was a definite improvement, I ended up returning it. After doing more research I learned that I should have gotten the molded keycaps, but I had bought the backlit Ergodox EZ, which doesn't support them. Even then, the molded keycaps on the Ergodox are a worse version of what the Dactyl-ManuForm and Kinesis Advantage 360 deliver with their truly concave form factors.
I am trying to determine which is the "best" ergonomic keyboard for me. I am leaning towards the Dactyl-ManuForm, but I don't have the time, means, or interest to build my own. I am looking at this option: https://taikohub.com/products/dactyl-manuform-keyboard-v2, but I am curious whether the 5-key or the 6-key thumb cluster is the better option. Also, I am currently in Los Angeles; does anyone know another option (around here or otherwise) for configuring and buying a prebuilt Dactyl-ManuForm?
It looks like the Kinesis Advantage 360 is another good option that delivers a similar form factor. Has anyone tried both and can they recommend one over the other?
Lastly, I really liked that the Ergodox EZ had the ORYX software, which let you not only configure the keyboard but also run a series of training exercises. I am not a great touch typist, but it is something I will need to get skilled at in order to properly use these keyboards (especially if I get a Dactyl without markings on the keycaps, like the prebuilt one I linked). Is there an analogous software training tool for the Dactyl-ManuForm and/or Advantage 360?
Any insights or recommendations would be greatly appreciated.

Is SRM in Google Optimize (Bayesian model) a thing?

So checking for Sample Ratio Mismatch is good for data quality, but in Google Optimize I can't influence the sample size or do anything about a mismatch.
My problem is that out of 15 A/B tests, only 2 experiments showed no SRM.
(I used this tool: https://www.lukasvermeer.nl/srm/microsite/)
On the other hand, the Bayesian model deals with things like different sample sizes, so I shouldn't need to worry about it, but opinions on this topic differ.
Is SRM really a problem in Google Optimize, or can I ignore it?
SRM affects Bayesian experiments just as much as it affects Frequentist. SRM happens when you expect a certain traffic split, but end up with a different one. Google Optimize is a black box, so it's impossible to tell if the uneven sample sizes you are experiencing are intentional or not.
Lots of things can cause an SRM. For example, if your variation's JavaScript code has a bug in some browsers, those users may not be tracked properly. Another common cause: if your variation increases page load times, more people will abandon the page and you'll see a smaller sample size than expected.
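The check itself is easy to run on raw assignment counts, whatever stats engine sits on top: a chi-squared goodness-of-fit test against the split you configured. A minimal sketch in Python (the counts below are made up, and the 0.001 threshold is just the deliberately strict cutoff commonly used for SRM alerts):

```python
from scipy.stats import chisquare

observed = [4862, 5213]      # users actually assigned to A and B (made up)
expected_ratio = [0.5, 0.5]  # the split you configured
total = sum(observed)
expected = [r * total for r in expected_ratio]

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.001:
    print(f"Likely SRM (p = {p_value:.5f}): investigate before trusting results")
else:
    print(f"No evidence of SRM (p = {p_value:.5f})")
```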
That lack of statistical rigor and transparency is one of the reasons I built Growth Book, which is an open source A/B testing platform with a Bayesian stats engine and automatic SRM checks for every experiment.

Google PageSpeed Insights: Is it really worth chasing top score?

Is it really worth chasing a 100% score on Google PageSpeed Insights at the sacrifice of the best User Experience?
Some of the opportunities suggested create a poorer experience, so is it really worth it?
Is it really worth chasing a 100% score on Google PageSpeed Insights at the sacrifice of the best User Experience?
NEVER sacrifice user experience in the pursuit of speed and performance.
Speed is important but how people interact with your site is far more important, especially if you are spending money driving people to your site.
With that being said, your second point is completely false...
Some of the opportunities suggested create a poorer experience, so is it really worth it?
There is not one suggestion in PSI that will negatively affect user experience.
This suggests that either you misunderstood a point or implemented a fix incorrectly (to be fair, they are not that well explained).
If there are particular points you are struggling with then please post a separate question, I will happily guide you.
Speed has been proven again and again to improve conversions, with every second being worth around 10% in conversion rate improvement (or every 0.1 second being worth a 1% increase in conversions).
When not to chase a 100% score on PSI
The actual reason not to chase perfection is time and resources, coupled with diminishing returns.
To get a site to 80 / 100 is normally an easy process.
To get from 80 to 90 takes a reasonable amount of work.
To get from 90 to 100 normally means designing the site from the ground up to be lightweight and involves all sorts of tricks such as inlining critical CSS.
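(To give a sense of what "inlining critical CSS" means in practice: it is usually a build step along these lines. This is a rough Python sketch with illustrative file names, not a recommendation of a specific tool:)

```python
from pathlib import Path

# Hypothetical build step: inject a hand-curated critical.css into the
# <head> of every built page, so above-the-fold content can render
# before the main stylesheet has loaded.
critical = Path("critical.css").read_text()
for page in Path("dist").glob("*.html"):
    html = page.read_text()
    html = html.replace("</head>", f"<style>{critical}</style></head>")
    page.write_text(html)
```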
Let time and effort guide you: there is no real benefit in pushing for 100% unless you are turning over a significant amount of money (£1 million+), as the conversion rate increase is not proportional to the time spent and cost.
On my site (https://klu.io) I had already designed it to be lightweight from the start and it still took me nearly a day to optimise all the SVGs, set up automatic CSS inlining for critical content etc. etc.
I will be the first to say that the extra day of effort was not worth it, other than to 'show off' to clients that I can get sites to 100. I never take a client site to that extreme as it provides no financial benefit, and I would recommend the same to you: aim for 85 and above and you will have a high-performance website fit for 99% of needs.

Why do these sounds make a dog unhappy?

This is an odd one. I'm not sure it's the right place to ask, but maybe there are some sound experts who can chime in? (no pun intended)
We use two sounds on our website to indicate success and failure on a quiz. Those are very simple and short sounds.
Somehow, one of our customers reported that her dog was whimpering and really upset by both of those sounds. He's normally fine with lots of other sounds that dogs are typically unhappy with, including loud sounds, hoovers, etc. She even said it happens when she uses headphones!
Other than muting or replacing those sounds (and upsetting other dogs?), is there anything we can do to clean up the sound, or to detect what specifically about it upsets this or other dogs?
Downvoters: I think this question crosses over between biology/physiology/physics and signal and audio processing; the answers I'm getting now actually demonstrate this. It requires cross-domain knowledge. In any case, I'm happy to delete it if it doesn't sit well with this community. I think my intentions were positive, and I added a bounty to try to solve this real problem. It saddens me to see downvotes even on the answers, although they made an effort to help.
EDIT: I'm unable to delete this question it seems. I get an error message.
EDIT2: In case it's more helpful, here's a spectrum analysis of both sounds using Audacity. There are lots of different options, but this uses the defaults for Analyze → Plot Spectrum.
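(If it helps to reproduce that plot programmatically instead of in Audacity, a rough Python equivalent would be something like the following; "success.wav" is a placeholder for one of our two sounds:)

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, data = wavfile.read("success.wav")  # placeholder file name
if data.ndim > 1:
    data = data.mean(axis=1)  # mix stereo down to mono
spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), d=1 / rate)
plt.semilogx(freqs[1:], 20 * np.log10(spectrum[1:] + 1e-12))  # skip the 0 Hz bin
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (dB)")
plt.show()
```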
This question is not in the right place (hence the downvotes), but for your information you may take a look at Frequency Range of Dog Hearing, where you can read that:
Humans can hear sounds approximately within the frequencies of 20 Hz and 20,000 Hz
[...]
The frequency range of dog hearing is approximately 40 Hz to 60,000 Hz
Note also that:
The shape of a dog's ear also helps it hear more proficiently.
A similar example is:
A vacuum cleaner, which merely sounds loud to us, can produce a high frequency sound which may scare dogs away
I expect that very low frequencies can also scare dogs (like the sound of a thunderstorm).
You may use spectrum analyser software (open-source audio software like Audacity can do the job) to double-check whether very low or high frequencies are present in the sound.
In my opinion you could use a band-pass filter to cut all frequencies lower than 50 Hz and higher than 15 kHz, to avoid the "vacuum cleaner" and "thunderstorm sound" effects (which may scare dogs).
You may finally take a look at the Audacity low-pass filter manual to learn how to apply this filter to your sound.
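If you prefer to script the filtering rather than do it in Audacity, here is a sketch with SciPy (the file name and the exact 50 Hz / 15 kHz cutoffs are illustrative, and this assumes the file's sample rate is comfortably above 30 kHz so the upper cutoff stays below Nyquist):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, data = wavfile.read("notification.wav")  # placeholder file name
# 4th-order Butterworth band-pass keeping roughly 50 Hz - 15 kHz
sos = butter(4, [50, 15000], btype="bandpass", fs=rate, output="sos")
filtered = sosfiltfilt(sos, data.astype(np.float64), axis=0)
wavfile.write("notification_filtered.wav", rate, filtered.astype(data.dtype))
```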
Despite the fact that the question is off-topic here, I'll move my initial answer from the comments, as I guess this will allow the question's author to close the bounty properly.
The answer:
1. I guess your question was downvoted because this is a place for questions about the functioning of computer software and hardware, and your question belongs rather to the physics or biology domains. So ask it on the appropriate Stack Exchange sites: physics.stackexchange.com and/or biology.stackexchange.com.
2. Regarding your question, I would recommend you check your sounds' frequency range: dogs do not like sounds with loud high frequencies. Perhaps your sounds contain high frequencies; you can check this yourself with a sound spectrum diagram in an audio editor, for example Audacity.
And yes, as @astefani pointed out, those frequencies can be removed with a band-pass filter, for example the low-pass filter in Audacity.

How large is the average delay from key presses?

I am currently helping someone with a reaction time experiment in which reaction times on the keyboard are measured. It might be important to know how much error could be introduced by the delay between a key press and its processing in the software.
Here are some factors I have already found using Google:
The USB bus is polled at 125 Hz at minimum and 1000 Hz at maximum (depending on settings; see this link).
There might be additional keyboard buffers in Windows that delay key presses further, but I do not know the logic behind them.
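To put rough numbers on the polling factor alone (a back-of-envelope calculation that ignores debouncing and any OS buffering):

```python
# A key press lands at a random point within the polling interval, so on
# average it waits half an interval, and at worst a full interval.
for rate_hz in (125, 1000):
    interval_ms = 1000 / rate_hz
    print(f"{rate_hz} Hz polling: avg {interval_ms / 2:.1f} ms, "
          f"worst {interval_ms:.1f} ms")
# 125 Hz polling: avg 4.0 ms, worst 8.0 ms
# 1000 Hz polling: avg 0.5 ms, worst 1.0 ms
```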
Unfortunately it is not possible to control the low-level logic of the experiment. The experiment is written in E-Prime, a software package often used for this kind of experiment. The company behind E-Prime also offers additional hardware that it advertises for precise reaction timing, so they seem to be aware of this effect (but do not say how large it is).
Unfortunately it is necessary to use a standard keyboard, so I need to find ways to reduce the latency.
Any latency from key presses can be attributed to the debounce routine (I usually use 30 ms to be safe) and not to the processing algorithms themselves (unless you are only evaluating the first press).
If you are running an experiment where millisecond timing is important, you may want to use http://www.blackboxtoolkit.com/ to find sources of error.
Your needs also depend on the nature of your study. I've run RT experiments in E-Prime with a keyboard. Since any error should be consistent on average across participants, for some designs it is not a big problem. If you need to sync the data with something else (like eye tracking or EEG), or want to draw conclusions about RT where the specific magnitude matters, then E-Prime's serial response box (or another brand, though I have had compatibility issues in the past between other brands' boxes and E-Prime) is a must.
