How to calculate and sort RGB data on OpenCV? - python-3.x

RGB data: how to calculate and sort it with Python and OpenCV
I want to perform the following steps with Python and OpenCV:
1. Get the RGB data from pictures
2. Calculate the R*G*B on each pixel of the pictures
3. Sort the data in descending order and plot it on a graph or save it as CSV
4. Get the max, min, and median of R*G*B
I managed step 1 with the code below.
However, I don't know how to write the program from step 2 onward.
Ideally the data would be saved as CSV or as a numpy array.
Does anybody have an idea? It would be very helpful if you could show me the code.
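import cv2
import numpy as np
from PIL import Image
# Load the image as a float array (PIL returns channels in RGB order)
im_f = np.array(Image.open('data/image.jpg'), 'f')
print(im_f[:, :])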

It is better to keep the data in memory as a numpy array. Also, read the image with cv2.imread rather than Image.open if it will be converted to np.array anyway.
For plotting, matplotlib can be used.
Here is how the above mentioned process can be achieved using OpenCV, numpy and matplotlib.
import numpy as np
import cv2, sys
import matplotlib.pyplot as plt
#Read image
im_f = cv2.imread('data/image.jpg')
#Validate image
if im_f is None:
    print('Image Not Found')
    sys.exit()
#Cast to float type to hold the results
im_f = im_f.astype(np.float32)
#Compute the product of channels and flatten the result to get 1D array
product = (im_f[:,:,0] * im_f[:,:,1] * im_f[:,:,2]).flatten()
#Sort the flattened array and flip it to get elements in descending order
product = np.sort(product)[::-1]
#Compute the min, max and median of product
pmin, pmax , pmed = np.amin(product), np.amax(product), np.median(product)
print('Min = ' + str(pmin))
print('Max = ' + str(pmax))
print('Med = ' + str(pmed))
#Show the sorted array
plt.plot(product)
plt.show()
Tested with Python 3.5.2, OpenCV 4.0.1, numpy 1.15.4, and matplotlib 3.0.2 on Ubuntu 16.04.
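Since the question also asks for the data to be saved as CSV or numpy, here is a minimal sketch of steps 2 to 4 with the results written to disk. The tiny 2x2 array is a hypothetical stand-in for the array returned by cv2.imread, and the output file names and temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2x2 three-channel image standing in for cv2.imread('data/image.jpg')
image = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[0, 10, 20], [255, 255, 255]]], dtype=np.uint8)

# Cast to float before multiplying; uint8 arithmetic would overflow
product = image.astype(np.float64).prod(axis=2).ravel()

# Sort ascending, then reverse to get descending order
sorted_desc = np.sort(product)[::-1]

# Min, max and median of the per-pixel products
pmin, pmax, pmed = sorted_desc.min(), sorted_desc.max(), np.median(sorted_desc)

# Save the sorted values as CSV and as a binary .npy file
# (output location and file names are illustrative only)
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'rgb_product_sorted.csv')
npy_path = os.path.join(out_dir, 'rgb_product_sorted.npy')
np.savetxt(csv_path, sorted_desc, delimiter=',', fmt='%.0f')
np.save(npy_path, sorted_desc)
```

The saved array can be read back with np.load(npy_path) or np.loadtxt(csv_path, delimiter=',').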

Related

Pandas - comparing average of hour periods against each other for a given date range

I'm trying to get used to using datetime data in Pandas and plotting different comparisons for a given dataset. I'm using the London Air Quality dataset for Ozone to practice and am trying to replicate the chart below (that I've created using a pivot table in Excel) with Pandas and matplotlib.
The chart plots the average of each hour's Ozone reading for each location across the entire dataset, to see whether one location is consistently higher than the others or whether different locations have the highest Ozone levels at different times of the day.
Essentially, I'm looking to plot the hourly average of Ozone for each location.
I've attempted to reshape the data into a multiindex format and then plot, similar to what I'd do in excel before plotting but am unsure if this is the correct way to approach the problem. Code for reshaping is below. I am still getting used to reshaping so not sure if this is the correct use/I am approaching the problem in the correct way and open to other methods to accomplish this task. Any assistance to accomplish this task would be much appreciated!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
data['Date'] = pd.to_datetime(data['ReadingDateTime']).dt.date
data['Time'] = pd.to_datetime(data['ReadingDateTime']).dt.time
data.set_index(['Date', 'Time'], inplace = True)
hourly_dataframe = data.pivot_table(columns = 'Site', values = 'Value', index = ['Date', 'Time'])
hourly_dataframe.fillna(method = 'ffill', inplace = True)
hourly_dataframe[hourly_dataframe < 0] = 0
I went to the site and downloaded a 24-hour reading for the following sites:
data.Site.unique()
array(['BX1', 'TH4', 'BT4', 'HI0', 'BL0', 'RD0'], dtype=object)
I adapted your code up to this point:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
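data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
# Use the timestamps as the index so that data.index.hour is available
data.set_index('ReadingDateTime', inplace=True)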
I then use datetime index to call each hour in the groupby function.
data.groupby([data.index.hour, data['Site']])['Value'].mean().reset_index()  # Convert to dataframe
To plot, I chain unstack to the groupby function and plot directly.
data.groupby([data.index.hour, data['Site']])['Value'].mean().unstack().plot()
plt.xlabel('Hour of the day')
plt.ylabel('Ozone')
plt.title('Average Hourly comparison')
plt.legend()  # If you want the legend to appear in the default location
If you care about the legend location, this post explains it very well. In your case:
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
           fancybox=True, shadow=True, ncol=6)
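To make the groupby and unstack step concrete, here is a small sketch on synthetic data. The two sites and their values are made up and only stand in for LaqnData.csv; the column names follow the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for LaqnData.csv: 48 hourly readings for each of two sites
hours = pd.to_datetime('2019-01-01') + pd.to_timedelta(np.arange(48), unit='h')
data = pd.DataFrame({
    'ReadingDateTime': hours.append(hours),
    'Site': ['BX1'] * 48 + ['TH4'] * 48,
    'Value': np.concatenate([np.arange(48.0), np.arange(48.0) + 10]),
})

# Use the timestamps as the index so the hour of day can be read from it
data = data.set_index('ReadingDateTime')

# Mean reading per (hour of day, site), then one column per site
hourly = data.groupby([data.index.hour, 'Site'])['Value'].mean().unstack()
print(hourly.head())
```

Calling hourly.plot() on the result draws one line per site against the hour of the day.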

Import PDF Image From MatPlotLib to ReportLab

I am trying to insert a saved PDF image into a ReportLab flowable.
I have seen several answers to similar questions, and many involve using PyPDF2 like this:
import PyPDF2
import PIL
input1 = PyPDF2.PdfFileReader(open(path+"image.pdf", "rb"))
page0 = input1.getPage(0)
xObject = page0['/Resources']['/XObject'].getObject()
for obj in xObject:
    pass  # Do something here
The trouble I'm having is with a sample image I've saved from MatPlotLib as a PDF. When I try to access that saved image with the code above, it returns nothing under page0['/Resources']['/XObject'].
In fact, here's what I see when I look at page0 and /XObject:
'/XObject': {}
Here's the code I used to generate the PDF:
import matplotlib.pyplot as plt
import numpy as np
# Fixing random state for reproducibility
np.random.seed(19680801)
plt.rcdefaults()
fig, ax = plt.subplots()
# Example data
people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
y_pos = np.arange(len(people))
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))
ax.barh(y_pos, performance, xerr=error, align='center',
        color='green', ecolor='black')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Performance')
ax.set_title('How fast do you want to go today?')
plt.savefig(path+'image.pdf',bbox_inches='tight')
Thanks in advance!

Python OpenCV, return an binary array when edge is detected

The output of the 'Canny' function in OpenCV is a numpy array like [0,0,0,0,255] etc. Can I get a binary array instead, like true/false or 1/0, where a detected edge returns 1? MATLAB does that by default. Please take a look at the output section.
Find edges in intensity image, Matlab
In Python the code looks like this:
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
edges = cv2.Canny(img,100,200) #numpy array. must be binary array (1/0)
You can convert the output array immediately:
edges_bool = cv2.Canny(img,100,200).astype(bool)
Alternatively, you can convert the array later with:
edges_bool = np.asarray(edges, dtype=bool)
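As a quick check of what these conversions produce, here is a minimal sketch. The small array is a hypothetical stand-in for the uint8 output of cv2.Canny, which holds 0 for non-edges and 255 for edges.

```python
import numpy as np

# Hypothetical stand-in for the uint8 array returned by cv2.Canny (values 0 or 255)
edges = np.array([[0, 0, 255],
                  [255, 0, 0]], dtype=np.uint8)

# True/False mask: any non-zero value becomes True
edges_bool = edges.astype(bool)

# 1/0 array, if integers are preferred over booleans
edges_01 = (edges > 0).astype(np.uint8)

print(edges_bool)
print(edges_01)
```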

Timeserie datetick problems when using pandas.DataFrame.plot method

I just discovered something really strange when using plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviours, so I inspected the objects to understand, and I found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they represent);
xt[-1] - xt[0] = 168: they look more like indices, since the span matches len(x) = 169.
This explains why I cannot manage to format my axis using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying there are too many ticks to generate.
The second shows my first tick as Fri 00:00 when it should be Mon 00:00 (in fact matplotlib assumes the first tick to be 0481-01-03 00:00, which is where my bug is).
It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected but I miss all cool features from pandas.DataFrame.plot method such as curve labeling, etc. And here xt = [726468. 726475.].
How can I properly format my ticks using pandas.DataFrame.plot method instead of axe.plot and avoiding this issue?
Update
The problem seems to be about origin and scale (units) of underlying numbers for date representation. Anyway I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between matplotlib and pandas representation. And I could not find any documentation of this problem.
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
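The tick values reported in the question can be reproduced with plain date arithmetic, which may help explain the mismatch. The interpretation is an inference from the numbers, not something confirmed in the thread: the pandas ticks look like hours since 1970-01-01, while the matplotlib version used in the question counts days from 0001-01-01.

```python
from datetime import date, datetime

first = datetime(1990, 1, 1)
epoch = datetime(1970, 1, 1)

# Hours between the Unix epoch and the first sample
hours_since_epoch = int((first - epoch).total_seconds() // 3600)

# Days since 0001-01-01 (proleptic Gregorian ordinal) for the first sample
day_ordinal = date(1990, 1, 1).toordinal()

# Reading the hour count as a day ordinal lands in the year 481
misread = date.fromordinal(hours_since_epoch)

print(hours_since_epoch)  # 175320, the first pandas tick in the question
print(day_ordinal)        # 726468, the first matplotlib tick in the question
print(misread)            # 0481-01-03, the date the formatter displayed
```

Under that reading, a matplotlib date formatter applied to the pandas axis treats hour counts as day counts, which would account for both the wrong weekday and the "too many ticks" error.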

Why is the plot in librosa different?

I am currently trying to use librosa to perform an STFT whose parameters match the STFT process of a different framework (Kaldi).
The audio file is fash-b-an251
Kaldi does it using a sample frequency of 16 kHz, window_size = 400 (25ms), hop_length=160 (10ms).
The spectrogram extracted from this looks like this:
I then tried to do the same using librosa:
import numpy as np
import sys
import librosa
import os
import scipy
import matplotlib.pyplot as plt
from matplotlib import cm
# Input parameter
# relative_path_to_file
if len(sys.argv) < 2:
    print "Missing Arguments!"
    print "python spectogram_librosa.py path_to_audio_file"
    sys.exit()
path = sys.argv[1]
abs_path = os.path.abspath(path)
spectogram_dnn = "/home/user/dnn/spectogram"
if not os.path.exists(spectogram_dnn):
    print "spectogram_dnn folder didn't exist!"
    os.makedirs(spectogram_dnn)
    print "Created!"
y,sr = librosa.load(abs_path,sr=16000)
D = librosa.logamplitude(np.abs(librosa.core.stft(y, win_length=400, hop_length=160, window=scipy.signal.hanning,center=False)), ref_power=np.max)
librosa.display.specshow(D,sr=16000,hop_length=160, x_axis='time', y_axis='log', cmap=cm.jet)
plt.colorbar(format='%+2.0f dB')
plt.title('Log power spectrogram')
plt.show()
raw_input()
sys.exit()
Which is basically taken from here:
In which I've modified the STFT call so that it uses my parameters.
The problem is that it creates an entirely different plot.
So what am I doing wrong in librosa? Why is this plot so different from the one created in Kaldi?
Am I missing something?
It has to do with the frequency (Hz) scale. The axis in the first image is linear, while the one in the second image is logarithmic. You can fix it by changing the scale of either plot to match the other, for example by passing y_axis='linear' instead of y_axis='log' to librosa.display.specshow.
