I tried to import pictures from the camera, and I could get 3-dimensional data from the image.
img = WebcamModule.getImg(True, size=[240,120])
img = image.img_to_array(img)
Then, with the code below, I tried to expand one more dimension, the batch size, but it doesn't work.
How do I add the batch dimension to the data?
There is a picture of the result below.
img = np.expand_dims(img,axis=0)
Finally, the code below for predicting doesn't work.
val = float(model.predict(img))
I usually do this:
input = input_raw[np.newaxis, ...]
prediction = model.predict(input)
pred = np.squeeze(prediction)
draw(pred)
Check the shape of the image against the shape the network expects, and check whether the model is channels-first or channels-last, but this has always worked for me.
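A minimal version of that check, assuming a Keras model named model and the img array from the question:

import numpy as np

print(img.shape)                     # e.g. (120, 240, 3) for a channels-last RGB image
print(model.input_shape)             # e.g. (None, 120, 240, 3); None is the batch dimension
batch = np.expand_dims(img, axis=0)  # add the batch dimension -> (1, 120, 240, 3)
prediction = model.predict(batch)
val = float(np.squeeze(prediction))  # only valid if the model outputs one value per sample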
I just started with the wordcloud module in Python 3.7, and I'm using the following code to generate wordclouds from a dictionary. I'm trying to use different masks, but it only works for some images: in two cases it works, with images of 831x816 and 1000x808. Does this have to do with the size of the image? Or is it because the images are kind of blurry? Or what is it?
I paste my code:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud

# frequencies is the word -> count dictionary mentioned above
our_mask = np.array(Image.open('twitter.png'))
twitter_cloud = WordCloud(background_color='white', mask=our_mask)
twitter_cloud.generate_from_frequencies(frequencies)
twitter_cloud.to_file("twitter_cloud.jpg")
plt.imshow(twitter_cloud)
plt.axis('off')
plt.show()
How can I fix this?
I had a similar problem with a black-and-white image. What fixed it for me was cropping the image more closely around the black drawing, so there was no unnecessary white area at the edges.
Some images need to be adjusted before they can be used. Note that only pure-white values (255) in the mask are treated as masked-out (all other values are masked-in). The problem is that some images are not suitable for masking as-is, because the values in the image's np.array don't match this convention. To solve this, you can do the following:
1. Create the mask object (please try with your own image, as I couldn't upload mine):
import numpy as np
from PIL import Image
from wordcloud import WordCloud

mask = np.array(Image.open("filepath/picture.png"))
print(mask)
If the white areas print as 255 in the array, then it is okay. But if they are 0 (or some other value), we have to change them to 255.
2. If the values are different, change them with the code below:
2-1. Create a function for the transformation (here the value to replace is 0):
def transform_zeros(val):
    if val == 0:
        return 255
    else:
        return val
2-2. Create an np.array of the same shape:
maskable_image = np.ndarray((mask.shape[0],mask.shape[1]), np.int32)
2-3. Transformation:
for i in range(len(mask)):
    maskable_image[i] = list(map(transform_zeros, mask[i]))
3. Checking:
print(maskable_image)
Then you can use this array for your mask.
mask = maskable_image
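For example, plugging the fixed mask back into the code from the question (just a sketch; frequencies is the dictionary mentioned there):

twitter_cloud = WordCloud(background_color='white', mask=maskable_image)
twitter_cloud.generate_from_frequencies(frequencies)
twitter_cloud.to_file("twitter_cloud.jpg")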
All of this is copied and interpreted from this link, so check it if you find my explanation unclear; I just provided the solution but don't understand that much about an image's color arrays and their transformation.
[Input sample image]
I am trying to pre-process my images in order to improve the OCR quality. However, I am stuck with a problem.
The images I am dealing with contain different text orientations within the same image (two pages, the first vertical and the second horizontal, scanned into the same image).
The text direction is automatically detected for the first part; nevertheless, the rest of the text from the other page is completely messed up.
I was thinking of creating a zonal template to detect the regions of interest, but I don't know how.
Or automatically detect the border, split the image adaptively, and then flip the split part to achieve the required result.
I could base the split on a fixed pixel height, but that is not constant either.
from tesserocr import PyTessBaseAPI, RIL
import cv2
from PIL import Image

with PyTessBaseAPI() as api:
    filePath = r'sample.jpg'
    img = Image.open(filePath)
    api.SetImage(img)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()

for box in boxes:
    box = boxes[0][1]
    x = box.get('x')
    y = box.get('y')
    h = box.get('h')
    w = box.get('w')
    cimg = cv2.imread(filePath)
    crop_img = cimg[y:y+h, x:x+w]
    cv2.imshow("cropped", crop_img)
    cv2.waitKey(0)
[Output image]
As you can see, I can apply orientation detection, but I won't get any meaningful text out of such an image.
Try Tesseract API method GetComponentImages and then DetectOrientationScript on each component image.
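A rough sketch of that idea on top of the question's code; the PSM.AUTO_OSD mode, the installed osd traineddata, and the exact dictionary returned by DetectOrientationScript are assumptions to check against your tesserocr version:

from tesserocr import PyTessBaseAPI, RIL, PSM
from PIL import Image

with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:  # orientation/script detection needs osd.traineddata
    api.SetImage(Image.open('sample.jpg'))
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    for im, box, _, _ in boxes:
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        osd = api.DetectOrientationScript()   # e.g. orientation in degrees, script name, confidences
        print(box, osd)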
I'm Marius, a first-year maths student.
We have received a team assignment where we have to implement a Fourier transformation, and we chose to try to encode the transformation of an image to a JPEG image.
To simplify the problem for ourselves, we chose to do it only for greyscale pictures.
This is my code so far:
from PIL import Image
import numpy as np
import sympy as sp
#
# All of this is just information, no calculations yet
img = Image.open('mario.png')
img = img.convert('L')   # convert to monochrome picture
img.show()               # opens the picture
pixels = list(img.getdata())
print(pixels)            # to see if we got the pixel numeric values correct
grootte = list(img.size)
print(len(pixels))       # to check if the amount of pixels is correct
kolommen, rijen = img.size
print("the number of columns is", kolommen, "the number of rows is", rijen)
# up to here, everything is information
pixelMatrix = []
while pixels != []:
    pixelMatrix.append(pixels[:kolommen])
    pixels = pixels[kolommen:]
print(pixelMatrix)
pixelMatrix = np.array(pixelMatrix)
print(pixelMatrix.shape)
Now the problem shows up in the last three lines. I want to convert the matrix of values back into an image, with the matrix 'pixelMatrix' as its data.
I've tried many things, but this seems to be the most obvious way:
im2 = Image.new('L',(kolommen,rijen))
im2.putdata(pixels)
im2.show()
When I use this, it just gives me a black image of the correct dimensions.
Any ideas on how to get back the original picture, starting from the values in my matrix pixelMatrix?
Post scriptum: We still have to implement the transformation itself, but that would be useless unless we are sure we can convert a matrix back into a greyscale image.
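For reference, a minimal sketch of the matrix-to-image direction using Image.fromarray, assuming pixelMatrix holds 8-bit grey values (note that the while loop above empties pixels, so putdata(pixels) receives an empty list, which would explain the black image):

import numpy as np
from PIL import Image

# pixelMatrix is the (rijen x kolommen) array built above
im2 = Image.fromarray(pixelMatrix.astype(np.uint8), mode='L')
im2.show()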
I have trained the following Sagemaker model: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco
I've tried both the JSON and RecordIO version. In both, the algorithm is tested on ONE sample image. However, I have a dataset of 2000 pictures, which I would like to test. I have saved the 2000 jpg pictures in a folder within an S3 bucket and I also have two .mat files (pics + ground truth). How can I apply this model to all 2000 pictures at once and then save the results, rather than doing it one picture at a time?
I am using the code below to load a single picture from my S3 bucket:
import json
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# bucket is the boto3 S3 Bucket and object_detector the deployed
# SageMaker predictor, both created earlier
object = bucket.Object('pictures/pic1.jpg')
object.download_file('pic1.jpg')
img = mpimg.imread('pic1.jpg')
img_name = 'pic1.jpg'
imgplot = plt.imshow(img)
plt.show()

with open(img_name, 'rb') as image:
    f = image.read()
    b = bytearray(f)
    ne = open('n.txt', 'wb')
    ne.write(b)

object_detector.content_type = 'image/jpeg'
results = object_detector.predict(b)
detections = json.loads(results)
print(detections['prediction'])
I'm not sure if I understood your question correctly. However, if you want to feed multiple images to the model at once, you can create a multi-dimensional array of images (byte arrays) to feed the model.
The code would look something like this.
import numpy as np
...
# predict_images_list is a Python list of byte arrays
predict_images = np.stack(predict_images_list)
with graph.as_default():
    # results is a list of the typical results you'd get
    results = object_detector.predict(predict_images)
But I'm not sure it's a good idea to feed 2000 images at once. It's better to batch them, say 20-30 images at a time, and predict.
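If it helps, a sketch of that loop over the pictures in the bucket, assuming the bucket and object_detector objects from the question and an illustrative 'pictures/' prefix; it sends one image per request, which you can group into chunks of 20-30 as suggested above if your endpoint accepts batches:

import json

all_detections = []
keys = [obj.key for obj in bucket.objects.filter(Prefix='pictures/') if obj.key.endswith('.jpg')]

for key in keys:
    body = bucket.Object(key).get()['Body'].read()      # raw JPEG bytes
    object_detector.content_type = 'image/jpeg'
    results = object_detector.predict(bytearray(body))
    all_detections.append({'key': key, 'prediction': json.loads(results)['prediction']})

# all_detections now holds one entry per picture and can be saved, e.g. as JSON
with open('detections.json', 'w') as f:
    json.dump(all_detections, f)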
png = tf.read_file(filename)
image = tf.image.decode_png(png, channels=3)
image = tf.cast(image, tf.float32)
The images are read and cast to float32. How do I perform normalization on this? I have performed normalization on grayscale images, but I need some help with RGB images.
I thought of doing this:
def normalized(rgb):
    # divide each channel by the per-pixel channel sum and rescale to 0-255
    norm = np.zeros((600, 800, 3), np.float32)
    b = rgb[:, :, 0]
    g = rgb[:, :, 1]
    r = rgb[:, :, 2]
    total = b + g + r
    norm[:, :, 0] = b / total * 255.0
    norm[:, :, 1] = g / total * 255.0
    norm[:, :, 2] = r / total * 255.0
    return norm
But for the above function to work, I would need to start a session to get the image out of TensorFlow and then perform the NumPy operations.
Could someone help me do this in tensorflow itself?
You can use tf.image.per_image_standardization. It linearly scales the image to have zero mean and unit variance.
image = tf.image.per_image_standardization(image)
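For example, dropped straight into the decoding pipeline from the question (tf.read_file is the TF 1.x name used above; simple scaling to [0, 1] is shown as a commented alternative):

png = tf.read_file(filename)
image = tf.image.decode_png(png, channels=3)
image = tf.cast(image, tf.float32)
image = tf.image.per_image_standardization(image)  # per-image zero mean, unit variance

# If you only need values in [0, 1] instead:
# image = image / 255.0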