vDSP and Discrete Cosine Transform

I am new to FFT, DCT and the like. Recently I was looking into the documentation for the vDSP library from Apple and was unable to find a DCT implementation. I was wondering if anyone knows of a way to calculate the DCT using one of the FFT functions provided in vDSP?
A bit of context: I am implementing an image-processing technique from a graphics paper I've been reading, which applies a DCT after some image sampling.
I'd really like to take advantage of the vDSP speed if at all possible.

OK, it turns out my results were off because I was using floats instead of doubles...
So I managed to compute a 1D DCT with vDSP on iOS. I'm not sure if it's the right way, but it gives me the same results as jtransform, which is good enough for me :).
And I know this question was old, but it might just help someone else searching for this.
https://gist.github.com/2140473
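For reference, the general trick of computing a DCT-II with an FFT can be sketched in NumPy like this (Makhoul's even/odd reordering); it only illustrates the math and is not the vDSP code from the gist:

    import numpy as np

    def dct2_via_fft(x):
        # Unnormalised DCT-II computed with a single FFT (Makhoul's reordering).
        # Matches scipy.fftpack.dct(x, type=2) up to floating-point error.
        x = np.asarray(x, dtype=np.float64)   # doubles matter, as noted above
        N = len(x)
        v = np.empty(N)
        v[:(N + 1) // 2] = x[::2]             # even-indexed samples, in order
        v[(N + 1) // 2:] = x[1::2][::-1]      # odd-indexed samples, reversed
        V = np.fft.fft(v)
        k = np.arange(N)
        return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)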

Related

Create a 3D model of a piece of equipment from 2D images

GOAL: I have to create a 3D model of a machine part. I have about 25 images of the same part taken from different angles.
Progress: I am able to extract the coordinates of a label that is on the machine in most of the images.
Problem: I have no idea how to proceed. I have read a bit about aero-triangulation, but I couldn't figure out how to implement it. I would really appreciate it if you could guide me in the right direction.
It would be really helpful if you could provide your solutions using Python and OpenCV.
Edit: sorry, but I cannot upload the code for this one as it is confidential. Please don't blame me, I am just an intern. I can say that I cropped a template of the label from one image and then used SIFT to match that template across all the images to get the coordinates of the label.
If you want to implement things yourself with OpenCV, I would recommend looking at SIFT (or SURF) features, RANSAC and the epipolar constraint. I believe the OpenCV cookbook describes those. Warning: math involved. And I don't know how to do dense mapping in OpenCV.
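As a rough sketch of that pipeline (not from the answer above), SIFT matching plus a RANSAC-estimated fundamental matrix with OpenCV's Python bindings might look like this; the file names are placeholders:

    import cv2
    import numpy as np

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older OpenCV builds
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors and keep the better ones (Lowe's ratio test).
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Fit the fundamental matrix with RANSAC; inliers satisfy the epipolar constraint.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    print("epipolar inliers:", int(mask.sum()))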
I know the GUI program "VisualSFM", which can automatically recreate a 3D model from images. It uses SfM and other command-line utilities behind the scenes. Since everything is open source, you could create a Python wrapper around the actual libraries (I found https://github.com/mapillary/OpenSfM by asking Google). VisualSFM prints the commands it calls, so a hacky way could be to call the same commands from Python.
If it is a simple shape and you don't want to automate it, it could be faster to model it yourself (and the result could look better). In a week and a half I managed to learn the basics of Blender and to model a guitar necklace: https://youtu.be/BCGKsh51TNA . I could now do it in under an hour. How much time are you prepared to invest in finding a solution with OpenCV?

Is the Canny algorithm enough to create a feature descriptor image to feed to an SVM?

I retrieve contours from images using the Canny algorithm. Is that enough to build a descriptor image, feed it to an SVM and find similarities? Or do I necessarily need other features such as elongation, perimeter and area?
I ask because I was inspired by this example: http://scikit-learn.org/dev/auto_examples/plot_digits_classification.html . I fed in my images in greyscale first and as Canny edge images second, and in both cases my confusion matrix was full of zeros, as were the precision, recall, f1-score and support measures.
My advice is:
Unless you have a low number of images in your database and/or the recognition is going to be really specific (not a random thing, for example), I would highly recommend applying one or more feature extractors such as SIFT, Fourier descriptors, Haralick features or the Hough transform to extract more details, which can be summarised in a short vector.
Then you can apply the SVM on top of those features in order to get more accuracy.
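A minimal sketch of that idea, using Hu moments of the Canny edge map as a stand-in for the descriptors listed above and scikit-learn's SVM (the file names and labels below are made up):

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def describe(path):
        # Compact per-image feature vector: Hu moments of the Canny edge map.
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        edges = cv2.Canny(img, 100, 200)
        return cv2.HuMoments(cv2.moments(edges)).ravel()

    train_paths = ["cat1.png", "cat2.png", "dog1.png", "dog2.png"]
    train_labels = [0, 0, 1, 1]
    X = np.array([describe(p) for p in train_paths])

    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X, train_labels)
    print(clf.predict([describe("query.png")]))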

Support Vector Machine Illustration

Can anyone give an example of an SVM? Especially, how do you get w and b from a training set?
I tried searching the internet, but it only gives me a large amount of abstract mathematics.
As I am not good at the math, could anyone give me an illustration of an SVM with a detailed example?
Thank you so much.
This diagram on Wikipedia provides a good example of what the goal is, but in truth a support vector machine is a lot of complicated math. You find the values for w and b by solving a quadratic programming problem, and when hidden behind vector mathematics, it's not entirely clear what's going on unless you're well-tuned to the math.
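If you just want to see concrete values for w and b, a linear-kernel SVM on a toy, hand-made dataset (using scikit-learn, so the quadratic programming is done for you) shows them directly:

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2D training set: two linearly separable classes.
    X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class -1
                  [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class +1
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard-margin SVM
    clf.fit(X, y)

    w = clf.coef_[0]       # normal vector of the separating hyperplane
    b = clf.intercept_[0]  # bias term
    print("w =", w, "b =", b)
    print(np.sign(X @ w + b))  # decision rule sign(w . x + b) recovers the labels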

Obstacle avoidance using 2 fixed cameras on a robot

I will start working on a robotics project which involves a mobile robot with 2 cameras (1.3 MP) mounted at a fixed distance of 0.5 m from each other. I also have a few ultrasonic sensors, but they have only a 10 m range and my environment is rather large (as an example, take a large warehouse with many pillars, boxes, walls, etc.). My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" environment (the ground floor is not smooth at all). All the image processing is done not on the robot but on a computer with an NVIDIA GT425 and 2 GB of RAM.
My questions are:
Should I mount the cameras on a rotating support, so that they take pictures over a wider angle?
Is it possible to create a reasonable 3D reconstruction based on only 2 views with such a small distance in between? If so, to what degree can I use this for obstacle avoidance and constructing a best route?
If a roughly accurate 3D representation of the environment can be made, how can it be used to create a map of the environment? (Consider the following example: the robot must sweep a fairly large area, and it would be energy efficient if it did not go through the same place (or course) twice; however, when a 3D reconstruction is made from one direction, how can it tell it has already been there if it comes from the opposite direction?)
I have found this response on a similar question, but I am still concerned about the accuracy of the 3D reconstruction (for example, a couple of boxes situated at 100 m, considering the small resolution and distance between the cameras).
I am just starting to gather information for this project, so if you have worked on something similar please give me some guidelines (and some links :D) on how I should approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. don't drive into things). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that), and finally you want to do path planning (i.e. sweeping the floor without covering the same area twice).
I hope that helps?
1. I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence, and it will make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have some good encoders on the joints.
2. If you use ROS, there are some tools which help you turn the two stereo images into a 3D point cloud: http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges: a large baseline gives greater resolution at large distances, but it also has a large minimum distance. I don't think I would expect more than a few centimetres of accuracy from a static stereo rig, and this accuracy only gets worse when you compound the robot's location uncertainty.
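As a rough sketch of that step with OpenCV's Python bindings (the file names are placeholders, and the reprojection matrix Q should come from your stereo calibration rather than the identity used here):

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block-matching disparity on rectified images; tune for your baseline and scene.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    Q = np.eye(4, dtype=np.float32)  # placeholder: use the Q matrix from cv2.stereoRectify
    points_3d = cv2.reprojectImageTo3D(disparity, Q)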
2.5. For mapping and obstacle avoidance, the first thing I would try to do is segment out the ground plane. The ground plane goes to mapping, and everything above it is an obstacle. Check out PCL for some point cloud operating functions: http://pointclouds.org/
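A minimal NumPy sketch of that idea (not PCL itself): fit the ground plane with RANSAC and treat points well above it as obstacles; the iteration count and threshold are made up:

    import numpy as np

    def fit_ground_plane(points, iters=200, thresh=0.05):
        # points: (N, 3) array; returns (n, d) of the plane n.x + d = 0 with most inliers.
        rng = np.random.default_rng(0)
        best, best_count = None, 0
        for _ in range(iters):
            p = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            if np.linalg.norm(n) < 1e-9:
                continue
            n = n / np.linalg.norm(n)
            d = -n.dot(p[0])
            count = np.sum(np.abs(points @ n + d) < thresh)
            if count > best_count:
                best, best_count = (n, d), count
        return best

    # Points within `thresh` of the plane go to the ground map; the rest are obstacles.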
If you can't simply put a planar laser on the robot like a SICK or Hokuyo, then I might try to convert the 3D point cloud into a pseudo-laser-scan and then use some off-the-shelf SLAM instead of trying to do visual SLAM. I think you'll have better results.
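A rough sketch of such a conversion: collapse the obstacle points into one minimum range per angular bin, assuming x forward and y left in the robot frame:

    import numpy as np

    def cloud_to_pseudo_scan(obstacle_points, n_bins=360):
        # obstacle_points: (N, 3) array of points already classified as obstacles.
        angles = np.arctan2(obstacle_points[:, 1], obstacle_points[:, 0])
        ranges = np.hypot(obstacle_points[:, 0], obstacle_points[:, 1])
        bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
        scan = np.full(n_bins, np.inf)        # "no return" in empty directions
        np.minimum.at(scan, bins, ranges)     # keep the nearest obstacle per bin
        return scan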
Other thoughts:
Now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3D point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR (Learning Applied to Ground Robots) program. That program is over, but you may be able to track down papers published from it.

Can I use NLTK to determine if a comment is a positive one or a negative one?

Can you show me a simple example using http://www.nltk.org/code to determine whether a string is about a happy or an upset mood?
NLTK cannot do this out of the box, but if you are looking for some related research in that area, take a look at this paper on Offensive Language Detection. The same methods could be adapted to classify comments as happy/unhappy instead of offensive/inoffensive. The primary software package used in that project for text classification is called WEKA; it uses multiple classifiers, trained on previous examples, to determine whether language is offensive or not (and in this method uses a tunable threshold).
Pattern is also worth a test drive: you can see two opinion-mining experiments right on the project homepage.
http://www.clips.ua.ac.be/pages/pattern-examples-100days
http://www.clips.ua.ac.be/pages/pattern-examples-elections
Nopey.
This is a task far beyond the capabilities of NLTK or any grammatical parser that is known or can be realistically imagined. Look at the NLTK Book to see what sorts of tasks it can accomplish which are far, far from your stated purpose.
As a cheap example:
I really enjoyed using your paper to train my dog.
Parse that up with NLTK and you can get

    >>> import nltk
    >>> nltk.pos_tag("I really enjoyed using your paper to train my dog".split())
    [('I', 'PRP'), ('really', 'RB'), ('enjoyed', 'VBD'),
     ('using', 'VBG'), ('your', 'PRP$'), ('paper', 'NN'),
     ('to', 'TO'), ('train', 'VB'), ('my', 'PRP$'), ('dog', 'NN')]
Where the parse tree would tell me that 'enjoyed' is the central (past-tense) verb of the simple sentence. To enjoy something is good. To train something is generally a good thing. Gerunds, nouns, comparatives, and such are relatively neutral. So give this a Good score of 0.90.
Except I really mean that I either hit my dog with your paper or let it excrete on the paper, which you'd probably consider a not-Good thing.
Hire a person for this recognition task.
Added for those who imagine that even trained classifiers are of much use:
Classify this real entry from a real customer review corpus using any classifier you like trained on any dataset you like:
This camera keeps on autofocussing in auto mode with a buzzing sound which can't be stopped. It would be really good if they have given an option to stop this autofocussing. If you want to have the date and time on the image, it's only through their software which reads the image's date and time from the image's meta-data. So if you use your card reader and copy images - you got to once again open them through their software to put the date and time. In that too, there isn't a direct way to add date and time - you got to say 'print images' to a different directory in which there is an option to specify the date and time. Even the slightest of the shakes totally distorts your image. Indoor images weren't so clear. You got to have flash 'on' to get it even though your room is well lit. The lens cap is a really annoying. the movie clips taken will always have some 'noise' in it - you can't avoid that.
The worst mood classification I obtained was "totally equivocal" yet humans can easily determine that this is anything but complimentary. This wasn't a randomly picked datum, rather one that was selected for negative bias without "hate" or "suxz" or similar.
You're looking for a technique that uses a machine learning classifier to determine whether a piece of text is positive or negative. There have been various attempts at this by a number of research teams (e.g. http://research.yahoo.com/pub/2387 and http://lingcog.iit.edu/doc/appraisal_sentiment_cikm.pdf); we can get about 80% to 90% accuracy at determining whether a product review is positive or negative.
Due to the brevity of your question, it's not obvious to me whether determining whether a product review is positive or negative is the same task you're trying to accomplish, or merely a related task, but I'd suggest starting simple with bag-of-words classification with a Bayesian classifier (which NLTK should be able to handle), and then improve your techniques from there depending on how the accuracy turns out.
Unfortunately, I've never used NLTK (nor Python for that matter) so I can't give you a code example of how to use NLTK for this.
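For what it's worth, a minimal sketch of the bag-of-words Naive Bayes approach suggested above might look like this in NLTK (the training sentences and labels are made up):

    import nltk

    train = [("I love this, it works great", "pos"),
             ("what a wonderful experience", "pos"),
             ("this is terrible and I hate it", "neg"),
             ("worst purchase ever, very upset", "neg")]

    def bag_of_words(text):
        # Feature dict in the form NLTK classifiers expect.
        return {word: True for word in text.lower().split()}

    classifier = nltk.NaiveBayesClassifier.train(
        [(bag_of_words(text), label) for text, label in train])

    print(classifier.classify(bag_of_words("I really enjoyed it")))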

Resources