As i'm doing card scanning to extract the text from images i want to get exact text from image but i'm getting only accurate image values from card and im trying to applpy adaptive threshold on my card but i'm geeting error like
File "C:/Users/shruthipriyanka/PycharmProjects/Scanning/sample.py", line 11, in <module>
newimage = cv2.resize(oriimage,(583,327))
Type Error: src is not a numpy array, neither a scalar
can anyone explain what i did wrong and suggest some resolution for this.
What most likely is happening is you are passing the resize function a PIL image. You can do something like thispix = numpy.array(pic) to convert the image to a numpy array.
Related
I have a list called training_data that I'd like to store in an .npy file.
Each element of the list contains a 480x270 image matrix screen and a 1x4 output list; So an element would look like so:
[screen,output]
Essentially, I'm storing an image and the action taken(the key pressed-out of 4 available options) at the instant that the image was captured from the screen to train a CNN.
While in the list format, training_data stores all my records without any issues, so this works:
training_data.append([screen,output])
But, when I try to save the list as a numpy array, into a .npy file, like so:
np.save(file_name,training_data)
I get the following error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (1000, 2) + inhomogeneous part.
I'm following a tutorial to create this CNN project. Admittedly, the tutorial was made a few years back (2017). Back then, the save operation worked flawlessly:
Tutorial Timestamp: 17:49
Any ideas as to why this error occurs will be greatly appreciated.
Thank you.
I was trying to use pytesseract to find the box positions of each letter in an image. I tried to use an image, and cropping it with Pillow and it worked, but when I tried with a lower character size image (example), the program may recognize the characters, but cropping the image with the box coordinates give me images like this. I also tried to double up the size of the original image, but it changed nothing.
img = Image.open('imgtest.png')
data=pytesseract.image_to_boxes(img)
dati= data.splitlines()
corde=[]
for i in dati[0].split()[1:5]: #just trying with the first character
corde.append(int(i))
im=img.crop(tuple(corde))
im.save('cimg.png')
If we stick to the source code of image_to_boxes, we see, that the returned coordinates are in the following order:
left bottom right top
From the documentation on Image.crop, we see, that the expected order of coordinates is:
left upper right lower
Now, it also seems, that pytesseract iterates images from bottom to top. Therefore, we also need to further convert the top/upper and bottom/lower coordinates.
That'd be the reworked code:
from PIL import Image
import pytesseract
img = Image.open('MJwQi9f.png')
data = pytesseract.image_to_boxes(img)
dati = data.splitlines()
corde = []
for i in dati[0].split()[1:5]:
corde.append(int(i))
corde = tuple([corde[0], img.size[1]-corde[3], corde[2], img.size[1]-corde[1]])
im = img.crop(tuple(corde))
im.save('cimg.png')
You see, left and right are in the same place, but top/upper and bottom/lower switched places, and where also altered w.r.t. the image height.
And, that's the updated output:
The result isn't optimal, but I assume, that's due to the font.
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
Pillow: 8.1.0
pytesseract: 4.00.00alpha
----------------------------------------
The purpose is to take data from a virtual camera (from a camera in Gazebo simulation, updating every second) and use Detectron2 (requires data come from cv2.VideoCapture) to recognize other objects in the simulation. The virtual camera of course does not appear in lspci so I can't simply use cv2.VideoCapture(0).
So my code is
bridge = CvBridge()
cv_image = bridge.imgmsg_to_cv2(data, desired_encoding='bgr8') #cv_image is numpy.ndarray, size (100,100,3)
cap = cv2.VideoCapture()
ret, frame = cap.read(image=cv_image)
print(ret, frame)
but it just prints False None, I assume because there's nothing being captured in cap. I
f I replace line 2 with cap = cv2.VideoCapture(cv_image) I get the error,
TypeError: only size-1 arrays can be converted to Python scalars
since I believe it requires either and integer (representing webcam number) or string (representing video file).
And for reference,
cv_image = bridge.imgmsg_to_cv2(data, desired_encoding='bgr8') # cv_image is numpy.ndarray
cv2.imshow('image', cv_image)
cv2.waitKey(1)
displays the image perfectly fine. Could there be a way to use imshow() or something similar as input for VideoCapture()?
However, cap = cv2.VideoCapture(cv2.imshow('image', cv_image))opens a blank window and gives me,
[ERROR:0] global /io/opencv/modules/videoio/src/cap.cpp (116) open VIDEOIO(CV_IMAGES): raised OpenCV exception:
OpenCV(4.2.0) /io/opencv/modules/videoio/src/cap_images.cpp:293: error: (-215:Assertion failed) !_filename.empty() in function 'open'
How can I create a cv2.VideoCapture() object that can use the image data that I have? Or what's something that might point me in the right direction?
Ubuntu 18.04 and Python 3.6 with opencv-python 4.2.0.34
From what I found on Gazebo tutorials page:
In Rviz, add a ''Camera'' display and under ''Image Topic'' set it to /rrbot/camera1/image_raw.
In your case it probably won't be /rrbot/camera1/ name, but the one you are setting in .gazebo file
<cameraName>rrbot/camera1</cameraName>
<imageTopicName>image_raw</imageTopicName>
<cameraInfoTopicName>camera_info</cameraInfoTopicName>
So you can create subscriber and use cv2.VideoCapture() for every single image from that topic.
My solution was to rewrite Detectron2's --input flag in the demo to constantly run a ROS2 callback with demo.run_on_image(cv_data). So instead of making it process video, it just quickly processes each new image one at a time. This is a workaround so that cv2.VideoCapture() is not needed.
I'm currently trying to perform a Polar to Cartesian Coordinate Image transformation, to display a raw sonar image into a 'fan-display'.
Initially I have a Numpy Array image of type np.float64, that can be seen below:
After doing some searching, I came across this StackOverflow post Inverse transform an image from Polar to Cartesian in OpenCV with a very similar problem, in which the poster seemed to have solved his/her issue by using the Python Wand library (http://docs.wand-py.org/en/0.5.9/index.html), specifically using their set of Distortion functions.
However, when I tried to use Wand and read the image in, I instead ended up with Wand getting the image below, which seems to be smaller than the original one. However, the weird thing is that img.size still gives the same size number as the original image's shape.
The code for this transformation can be seen below:
print(raw_img.shape)
wand_img = Image.from_array(raw_img.astype(np.uint8), channel_map="I") #=> (369, 256)
display(wand_img)
print("Current image size", wand_img.size) #=> "Current image size (369, 256)"
This is definitely quite problematic as Wand will automatically give the wrong 'fan image'. Is anybody familiar with this kind of problem with the Wand library previously, and if yes, may I ask what is the recommended solution to fix this issue?
If this issue isn't resolved soon I have an alternative backup of using OpenCV's cv::remap function (https://docs.opencv.org/4.1.2/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121). However the problem with this is that I'm not sure what mapping arrays (i.e. map_x and map_y) to use to perform the Polar->Cartesian transformation, as using a mapping matrix that implements the transformation equations below:
r = polar_distances(raw_img)
x = r * cos(theta)
y = r * sin(theta)
didn't seem to work and instead threw out errors from OpenCV as well.
Any kind of help and insight into this issue is greatly appreciated. Thank you!
- NickS
EDIT I've tried on another image example as well, and it still shows a similar problem. So first, I imported the image into Python using OpenCV, using these lines of code:
import matplotlib.pyplot as plt
from wand.image import Image
from wand.display import display
import cv2
img = cv2.imread("Test_Img.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(img_rgb)
plt.show()
which showed the following display as a result:
However, as I continued and tried to open the img_rgb object with Wand, using the code below:
wand_img = Image.from_array(img_rgb)
display(img_rgb)
I'm getting the following result instead.
I tried to open the image using wand.image.Image() on the file directly, which is able to display the image correctly when using display() function, so I believe that there isn't anything wrong with the wand library installation on the system.
Is there a missing step that I required to convert the numpy into Wand Image that I'm missing? If so, what would it be and what is the suggested method to do so?
Please do keep in mind that I'm stressing the conversion of Numpy to Wand Image quite crucial, the raw sonar images are stored as binary data, thus the required use of Numpy to convert them to proper images.
Is there a missing step that I required to convert the numpy into Wand Image that I'm missing?
No, but there is a bug in Wand's Numpy implementation in Wand 0.5.x. The shape of OpenCV's ndarray is (ROWS, COLUMNS, CHANNELS), but Wand's ndarray is (WIDTH, HEIGHT, CHANNELS). I believe this has been fixed for the future 0.6.x releases.
If so, what would it be and what is the suggested method to do so?
Swap the values in img_rgb.shape before passing to Wand.
img_rgb.shape = (img_rgb.shape[1], img_rgb.shape[0], img_rgb.shape[2],)
with Image.from_array(img_rgb) as img:
display(img)
I've a basic problem with Python's library PIL. I have some .txt files containing only 0 and 1 values arranged in matrices. I have transformed the "binary" data in an image with the function Image.fromarray() included in PIL. The format of my data produces black&white images if I multiply it by 255, and that's fine for me. Now I want to add some text to the image, using the appropriate text function included in PIL, but I want that text to be coloured. Clearly, I can't do it because the image obtained from fromarray has a grayscale colormap. How can I change it?
You can get a RGB image from a monochromatic one like this:
from PIL import Image
from numpy import eye
arr = (eye(200)*255).astype('uint8') # sample array
im = Image.fromarray(arr) # monochromatic image
imrgb = im.convert('RGB') # color image
imrgb.show()