For a university project I have to recognize characters from a license plate. I have to do this using Python 3. I am not allowed to use OCR functions or functions that rely on deep learning or neural networks. I have reached the point where I am able to segment the characters from a license plate and transform them to a uniform format. A few examples of segmented characters are here.
The format of the segmented characters depends heavily on the input. However, I can easily convert them to uniform dimensions using OpenCV. Additionally, I have a set of template characters and numbers that I can use to predict which character or number it is.
I therefore need a metric to express the similarity between the segmented character and the reference image. In this way, I can say that the reference image with the highest similarity score matches the segmented character. I have tried the following ways to compute the similarity.
For these operations I have made sure that the reference characters and the segmented characters have the same dimensions.
A bitwise XOR-operator
Inverting the reference characters and comparing them pixel by pixel: if a pixel matches, increment the similarity score; if it does not match, decrement the similarity score.
Hashing both the segmented character and the reference character using 'imagehash', then comparing the hashes to see which ones are most similar.
None of these methods gives an accurate prediction for all characters. Most characters are predicted correctly. However, the program consistently confuses pairs such as 8-B, D-0, 7-Z and P-R.
Does anybody have an idea how to predict the segmented characters, i.e. how to define a better similarity score?
Edit: Unfortunately, cv2.matchTemplate and cv2.matchShapes are not allowed for this assignment...
The general procedure for comparing two images consists of extracting features from the two images and then comparing them. What you are actually doing in the first two methods is treating the value of every pixel as a feature. The similarity measure is therefore a distance computation in a space of very high dimension. These methods are, however, sensitive to noise, and they require very large datasets in order to obtain acceptable results.
For this reason, usually one attempts to reduce the space dimensionality. I'm not familiar with the third method, but it seems to go in this direction.
A way to reduce the space dimensionality consists in defining some custom features meaningful for the problem you are facing.
A possibility for the character classification problem could be to define features that measure the response of the input image on strategic subshapes of the characters (an upper horizontal line, a lower one, a circle in the upper part of the image, a diagonal line, etc.).
You could define a minimal set of shapes that, combined together, can generate every character. Then you should compute one feature for each shape by measuring the response of the original image on that particular shape (i.e., integrating the signal of the input image inside the shape). Finally, you should determine the class to which the image belongs by taking the nearest reference point in this smaller feature space.
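The idea above can be sketched with plain NumPy. This is only a minimal illustration, not a tuned solution: it assumes binary images with character pixels set to 1, and it uses the cells of a regular grid as a crude stand-in for the "strategic subshapes" (a hand-designed set of bars, circles and diagonals would likely separate pairs such as 8-B better). The names `zone_features` and `classify` are hypothetical.

```python
import numpy as np

def zone_features(img, rows=4, cols=3):
    """Mean response of the image inside each cell of a rows x cols grid.

    Each cell plays the role of one "shape"; the mean is the integrated
    signal inside that shape, normalised by its area.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = [
        img[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]].mean()
        for i in range(rows)
        for j in range(cols)
    ]
    return np.array(feats)

def classify(img, templates, rows=4, cols=3):
    """Return the label of the template nearest to img in feature space."""
    f = zone_features(img, rows, cols)
    return min(
        templates,
        key=lambda label: np.linalg.norm(
            f - zone_features(templates[label], rows, cols)
        ),
    )

# Toy usage with two made-up "characters": a top bar and a bottom bar.
top = np.zeros((8, 6))
top[:2, :] = 1
bottom = np.zeros((8, 6))
bottom[-2:, :] = 1
templates = {"top": top, "bottom": bottom}

noisy = top.copy()
noisy[7, 0] = 1   # a stray pixel
noisy[0, 3] = 0   # a missing pixel
label = classify(noisy, templates)
```

Because each feature averages over a region, a few flipped pixels move the feature vector only slightly, which is the robustness to noise that pixel-by-pixel comparison lacks.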
For a university project I have to segment characters from a license plate using Python. This sounds reasonably simple. However, we are not allowed to use any sophisticated library functions such as cv2.findContours(). The basics such as cv2.imread(), cv2.resize() and cv2.rectangle() are allowed.
I have written a function that localizes a license plate in an image and outputs a result as can be seen in the images Output 1 and Output 2 . These are binary images.
As one can see, the output of this function is sometimes relatively clean (Output 2). However, it is often noisy (Output 1).
For a clean image (Output 2) I have tried finding the columns that contain fewer than x black pixels in order to segment the characters. However, this only works when the image is clean, which is often not the case. Changing the x parameter does not make a significant improvement.
Does anybody have suggestions on how I can approach this problem?
For an elementary solution, you can form a profile by counting the black pixels on all vertical lines. Then look for maxima and minima of the average count in a sliding interval on this profile. The interval length should be a fraction of the expected width of a character. Only the extrema with sufficient contrast should be considered.
To avoid the effect of surrounding features in rotated plates, you can restrict the counting to just a slice of the image.
Once you have approximate vertical limits between the characters, you can repeat a similar processing to get the bottom and top limits of the characters (the sliding interval is no longer necessary).
Finally, you can refine the boxing by finding the horizontal limits in the rectangles so formed.
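The first step (the column profile with a sliding average) might look roughly like the sketch below. It assumes a binary image in which character pixels are 1 and background is 0. As a simplification of "extrema with sufficient contrast", it treats every run of columns whose smoothed count falls below a fraction of the maximum as one valley. The function name `column_cuts` and the parameters `win` and `min_contrast` are made up for this illustration.

```python
import numpy as np

def column_cuts(binary, win=3, min_contrast=0.5):
    """Return one cut column per interior gap between characters."""
    # Profile: number of character pixels in every column.
    profile = np.asarray(binary, dtype=float).sum(axis=0)
    # Sliding average over `win` columns.
    smooth = np.convolve(profile, np.ones(win) / win, mode="same")
    # Columns well below the strongest response are treated as valleys.
    low = smooth < min_contrast * smooth.max()

    cuts, start = [], None
    for x, is_low in enumerate(low):
        if is_low and start is None:
            start = x                      # a valley begins
        elif not is_low and start is not None:
            if start > 0:                  # skip the left margin
                cuts.append((start + x - 1) // 2)  # middle of the valley
            start = None
    # A valley still open at the end is the right margin; it is ignored.
    return cuts

# Toy plate: two 4-column "characters" separated by a 3-column gap,
# with a single noise pixel inside the gap.
plate = np.zeros((10, 15), dtype=int)
plate[:, 1:5] = 1
plate[:, 8:12] = 1
plate[0, 6] = 1
cuts = column_cuts(plate)
```

The same function applied to the transposed sub-image of each character (with `win=1`) gives a starting point for the top and bottom limits.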
I am doing research on travel reviews and used word2vec to analyze the reviews. However, when I showed my output to my adviser, he said that I have a lot of words with negative vector values and that only words with positive values are considered logical.
What could these negative values mean? Is there a way to ensure that all vector values I will get in my analysis would be positive?
While some other word-modeling algorithms do in fact model words into spaces where dimensions are 0 or positive, and the individual positive dimensions might be clearly meaningful to humans, that is not the case with the original, canonical 'word2vec' algorithm.
The positive/negativeness of any word2vec word-vector – in a particular dimension, or in net magnitude – has no strong meaning. Meaningful words will be spread out in every direction from the origin point. Directions or neighborhoods in this space that loosely correlate to recognizable categories may appear anywhere, and skew with respect to any of the dimensional axes.
(Here's a related algorithm that does use non-negative constraints – https://www.cs.cmu.edu/~bmurphy/NNSE/. But most references to 'word2vec' mean the classic approach where dimensions usefully range over all reals.)
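One way to see why the sign of an individual coordinate carries no meaning by itself: word vectors are typically compared with cosine similarity, and cosine similarity is unchanged when the whole space is rotated or reflected, even though such a transformation changes which coordinates are negative. The sketch below uses random made-up vectors rather than a trained model, so it only illustrates the geometry.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
vectors = rng.normal(size=(3, 5))              # three made-up "word vectors"

# A random orthogonal matrix (rotation and/or reflection) via QR.
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
rotated = vectors @ q

before = cosine(vectors[0], vectors[1])
after = cosine(rotated[0], rotated[1])         # same up to rounding error
flipped = cosine(-vectors[0], -vectors[1])     # negating every sign changes nothing
```

The individual coordinates of `rotated` differ from those of `vectors`, including their signs, yet every pairwise similarity is preserved.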
I am running into problems when computing the relative risk estimation (relrisk.ppp) of two point patterns: One with four marks in a rectangular region and the other with two marks in a circular region.
For the first pattern with four marks, I am able to get the relative risk and the resulting object in a large imlist with 4 elements corresponding to each mark.
However, for the second pattern, it gives a list of 10 elements, of which the first, the matrix v, is empty with NA entries. I cannot work out what could be wrong when the created point pattern objects seem to be identical. Any help will be appreciated. Thanks.
For your first dataset, the result is a list of image objects (a list of four objects of class im). For your second dataset, the result of relrisk.ppp is a single image (object of class im). This is the default behaviour when there are only two possible types of points (two possible mark values). See help(relrisk.ppp).
In all cases, you should just be able to plot and print the resulting object. You don't need to examine the internal data of the image.
More explanation: when there are only two possible types of points, the default behaviour of relrisk.ppp is to treat them as case-control data, where the points belonging to the first type are treated as controls (e.g. non-infected people), and the points of the second type are treated as cases (e.g. infected people). The ratio of intensities (cases divided by controls) is estimated as an image.
If you don't want this to happen, set the argument casecontrol=FALSE and then relrisk.ppp will always return a list of images, with one image for each possible mark. Each image gives the spatially-varying probability of that type of point.
It's all explained in help(relrisk.ppp) or in the book.
Since genomic sequences vary greatly in length, I have been trying to work on using denoising autoencoders to get a compact representation for any given sequence. My expected input is a sequence of nucleotides (letters - A, G, T, C), for example, "AAAAGGAATTTCTCTGGGG....".
For images, adding a noise is easy since it's a continuous space. But in a discrete scenario such as this, what would be a good strategy to add noise to my input?
My first thought is to randomly replace some of the nucleotides with "N", which means that the nucleotide at that position couldn't be identified accurately during sequencing. But changing even one nucleotide leads to a completely different sequence altogether, unlike images where adding a small noise doesn't change how the image looks visually. Please let me know if this is right or there's a better way that I am not aware of.
I'm not sure if this will help you or further complicate your issue, but in biology people normally use FASTQ files to store biological sequences and their corresponding Phred quality scores. A Phred quality score is a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing.
For example, if Phred assigns a quality score of 30 to a base, the chances that this base is called incorrectly are 1 in 1000.
Public domain image from Wikipedia
So you can add noise to the Phred quality scores (i.e. the probabilities that the base calling is correct) without changing the sequence.
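As a rough sketch of that idea: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 30 gives 1 in 1000 as in the example above. The noise model below (Gaussian noise on the scores, rounded and clipped at 0) is purely an assumption for illustration, as are the function names and the `sigma` parameter. The sequence itself is left untouched.

```python
import numpy as np

def phred_to_error_prob(q):
    """Probability that a base with Phred score q was called incorrectly."""
    return 10.0 ** (-np.asarray(q, dtype=float) / 10.0)

def add_quality_noise(q, sigma=2.0, seed=None):
    """Perturb Phred scores with Gaussian noise (assumed noise model).

    Scores are rounded back to integers and kept non-negative.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    noisy = q + rng.normal(0.0, sigma, size=q.shape)
    return np.clip(np.rint(noisy), 0, None).astype(int)

sequence = "ACGT"                      # unchanged by the noise
scores = [30, 20, 40, 10]              # made-up per-base quality scores
noisy_scores = add_quality_noise(scores, seed=1)
p_err = phred_to_error_prob(30)
```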
Also see this paragraph about current work done on compressing FASTQ files.
In a note I found this phrase:
Using isolated symbol probabilities of English language, you can find out the entropy of the language.
What is actually meant by "isolated symbol probabilities"? This is related to the entropy of an information source.
It would be helpful to know where the note came from and what the context is, but even without that I am quite sure this simply means that they use the frequency of individual symbols (e.g. characters) as the basis for entropy, rather than for example the joint probability (of character sequences), or the conditional probability (of one particular character to follow another).
So if you have an alphabet X={a,b,c,...,z} and a probability P(a), P(b),... for each character to appear in text (e.g. based on the frequency found in a data example), you'd compute the entropy by computing -P(x) * log(P(x)) for each character x individually and then taking the sum of all. Then, obviously, you'd have used the probability of each character in isolation, rather than the probability of each character in context.
Note, however, that the term symbol in the note you found does not necessarily refer to characters. It might refer to words or other units of text. Nevertheless, the point they are making is that they apply the classical formula for entropy to probabilities of individual events (characters, words, whatever), not probabilities of complex or conditional events.
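The computation described above can be sketched in a few lines. The function name `isolated_symbol_entropy` is made up for this example. It estimates each P(x) from the relative frequency in a sample and uses the base-2 logarithm, so the result is in bits per symbol.

```python
from collections import Counter
from math import log2

def isolated_symbol_entropy(symbols):
    """Entropy from the probability of each symbol taken in isolation.

    `symbols` can be a string (characters) or any sequence of hashable
    units such as words; context between symbols is ignored.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

h_chars = isolated_symbol_entropy("abab")                   # two equally likely characters
h_words = isolated_symbol_entropy("the cat the dog".split())  # words as symbols
```

Note that "abab" and "aabb" give the same value here: the order of the symbols, and therefore any conditional or joint structure, plays no role.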