Creating image histograms from all pictures in a folder

I'm trying to create a histogram for each image in a folder and save the histogram data to CSV files. The user would enter the folder where the images are saved, and the files would then be created and named accordingly:
files = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]  # loop to get all files from folder
for x in files:
    image = "x + 1"
    img2 = cv2.imread('similarImages/' + directory + '/' + image + '.png', cv2.IMREAD_COLOR)
    histSim = cv2.calcHist([img2], [1], None, [256], [0, 256])  # create histo of each image
    np.savetxt('Test/similarImage' + x + '.csv', histSim, delimiter=',')  # save plots to csv
From my previous knowledge of Python, I've theory-crafted the above code, but in classic fashion, it doesn't work (shocker).
Am I going along the right lines? If not, could I get a nudge in the right direction, and if so, why doesn't it work?
It's been a while since I took on something like this, so I am a little rusty. Many thanks, Ben

You can use pandas for this task. If you want to store all the histograms in a single CSV file, you can append each histogram's values to a list like this (note that cv2.calcHist expects a list of images, hence [img]):
df = []
df.append(cv2.calcHist([img], [1], None, [256], [0, 256])[:, 0])  # check whether you want it like this or transposed
Then convert the list to a dataframe using pd.DataFrame and store it as a CSV file using df.to_csv.
If you want to save each histogram to its own CSV file, you can do:
histSim = pd.DataFrame(cv2.calcHist([img], [1], None, [256], [0, 256]))
histSim.to_csv('histogram.csv', index=False)

The issue I was having was that the image variable was a string, so adding 1 to it concatenated instead of adding the values. Using integers, and converting to a string only where I needed it in the file path, worked:
image = 0
files = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # loop to get all files from folder
for x in files:
    image = x + 1
    image = str(image)
    img2 = cv2.imread('similarImages/' + directory + '/' + image + '.png', cv2.IMREAD_COLOR)
    histSim = pd.DataFrame(cv2.calcHist([img2], [1], None, [256], [0, 256]))  # create histo of each image
    histSim.to_csv('Test/histogram' + image + '.csv', index=False)
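The loop above still hard-codes the file names 1 to 10. If the goal is really "every picture in a folder", one possible sketch is to let pathlib discover the files. The function names plan_histogram_jobs and save_histograms, the histogram_<name>.csv naming scheme, and the *.png pattern are assumptions made for illustration, not part of the original posts.

```python
from pathlib import Path


def plan_histogram_jobs(image_dir, out_dir, pattern="*.png"):
    """Pair every matching image in image_dir with a CSV path in out_dir."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    return [(p, out_dir / f"histogram_{p.stem}.csv")
            for p in sorted(image_dir.glob(pattern))]


def save_histograms(image_dir, out_dir, channel=1):
    """Write one 256-bin histogram CSV per image; return the CSV paths written."""
    # Imported here so the path helper above can be used without OpenCV installed.
    import cv2
    import numpy as np

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, csv_path in plan_histogram_jobs(image_dir, out_dir):
        img = cv2.imread(str(image_path), cv2.IMREAD_COLOR)
        if img is None:  # cv2.imread returns None instead of raising on failure
            print(f"skipping unreadable file: {image_path}")
            continue
        hist = cv2.calcHist([img], [channel], None, [256], [0, 256])
        np.savetxt(str(csv_path), hist, delimiter=",")
        written.append(csv_path)
    return written
```

With the folder layout from the question, a call might look like save_histograms('similarImages/' + directory, 'Test'). Checking for None matters because a wrong path makes cv2.imread return None silently, and calcHist then fails with a much less obvious error.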

Related

How to identify data gaps based on filenames on Python?

It happens that I have a folder located at
C:\Users\StoreX\Downloads\Binance futures data\AliceUSDT-Mark_Prices_Klines_1h_Timeframe
which only contains 253 csv files with the following filenames:
1. ALICEUSDT-1h-2021-06-01.csv
2. ALICEUSDT-1h-2021-06-02.csv
3. ALICEUSDT-1h-2021-06-03.csv
4. ALICEUSDT-1h-2021-06-06.csv
5. ALICEUSDT-1h-2021-06-09.csv
6. ALICEUSDT-1h-2021-06-11.csv
7. ALICEUSDT-1h-2021-06-12.csv
.
.
.
253. ALICEUSDT-1h-2022-02-13.csv
Each of those files contains the hourly price action of a particular asset, having in total 24 rows (no column names), and therefore, it can be assumed that each filename corresponds to the price action data taken for a particular asset in a particular date.
However, if you look closely at the example above, there are some files missing at the very beginning, which are:
ALICEUSDT-1h-2021-06-04.csv
ALICEUSDT-1h-2021-06-05.csv
ALICEUSDT-1h-2021-06-07.csv
ALICEUSDT-1h-2021-06-08.csv
ALICEUSDT-1h-2021-06-10.csv
This obviously means I cannot use the files that precede the missing ones when developing a trading strategy.
So I would first have to detect which files are missing based on their names, and then decide where to start plotting the price action so as to avoid all of the possible gaps.
Update: Here's what I have done so far:
import os
import datetime
def check_path(infile):
    return os.path.exists(infile)
first_entry = input('Tell me the path where your csv files are located at:')
while True:
    if check_path(first_entry) == False:
        print('\n')
        print('This PATH is invalid!')
        first_entry = input('Tell me the RIGHT PATH in which your csv files are located: ')
    elif check_path(first_entry) == True:
        print('\n')
        final_output = first_entry
        break
for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        print((name.partition('-')[-1]).partition('-')[-1].removesuffix(".csv"))
Output:
2021-06-01
2021-06-02
2021-06-03
2021-06-06
2021-06-09
.
.
.
2022-02-13
Any ideas?

IIUC, you have a list of dates and try to find out what dates are missing if you compare the list against a date range based on min and max date in the list. Sets can help, ex:
import re
from datetime import datetime, timedelta
l = ["ALICEUSDT-1h-2021-06-01.csv",
"ALICEUSDT-1h-2021-06-02.csv",
"ALICEUSDT-1h-2021-06-03.csv",
"ALICEUSDT-1h-2021-06-06.csv",
"ALICEUSDT-1h-2021-06-09.csv",
"ALICEUSDT-1h-2021-06-11.csv",
"ALICEUSDT-1h-2021-06-12.csv"]
# extract the dates, you don't have to use a regex here, it's more for convenience
d = [re.search(r"[0-9]{4}\-[0-9]{2}\-[0-9]{2}", s).group() for s in l]
# to datetime
d = [datetime.fromisoformat(s) for s in d]
# now make a date range based on min and max dates in d
r = [min(d)+timedelta(n) for n in range((max(d)-min(d)).days+1)]
# ...so we can do a membership test with sets to find out what is missing...
missing = set(r) - set(d)
sorted(missing)
[datetime.datetime(2021, 6, 4, 0, 0),
datetime.datetime(2021, 6, 5, 0, 0),
datetime.datetime(2021, 6, 7, 0, 0),
datetime.datetime(2021, 6, 8, 0, 0),
datetime.datetime(2021, 6, 10, 0, 0)]
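The question also asks where to start plotting so that no gap is included. A minimal sketch of that step, assuming the file names always contain a YYYY-MM-DD date and that "safe start" means the first day of the last uninterrupted run of daily files. The helper names dates_from_filenames and first_gap_free_start are made up for this example.

```python
import re
from datetime import date, timedelta

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def dates_from_filenames(filenames):
    """Return the sorted, de-duplicated dates embedded in the given file names."""
    found = set()
    for name in filenames:
        match = DATE_RE.search(name)
        if match:  # names without a date are ignored
            found.add(date.fromisoformat(match.group()))
    return sorted(found)


def first_gap_free_start(dates):
    """Given a non-empty sorted list of dates, return the earliest date from
    which every following day up to the last date is present."""
    start = dates[-1]
    # Walk backwards over consecutive pairs until a step longer than one day appears.
    for earlier, later in zip(reversed(dates[:-1]), reversed(dates[1:])):
        if later - earlier != timedelta(days=1):
            break
        start = earlier
    return start


names = ["ALICEUSDT-1h-2021-06-01.csv", "ALICEUSDT-1h-2021-06-02.csv",
         "ALICEUSDT-1h-2021-06-03.csv", "ALICEUSDT-1h-2021-06-06.csv",
         "ALICEUSDT-1h-2021-06-09.csv", "ALICEUSDT-1h-2021-06-11.csv",
         "ALICEUSDT-1h-2021-06-12.csv"]
print(first_gap_free_start(dates_from_filenames(names)))  # 2021-06-11
```

For the real folder, names could come from os.listdir(first_entry) as in the question's own code.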

Why subscripts are missing in the labels?

I want to display the subscript in the labels in the bar plot. Labels are the keys from the dictionary data in the following. I know how to use latex to do so, but I need to display it as it is from the keys in the dictionary. When I use the following script, it just displays the empty box, instead of the subscript.
import numpy as np
import matplotlib.pyplot as plt
data = {'CO₆': 15,
'DO₄': 144,
'EO₈': 3,
'FaO₉': 1,
'GO₅': 7,
'Ha₆': 5}
f, ax = plt.subplots(figsize = (40, 4))
bin = np.arange(len(data.keys()))
ax.bar(data.keys(), data.values(), color='brown', align = "center", width = 0.3);
plt.xticks(rotation='vertical');
ax.xaxis.set_tick_params(labelsize = 32);
ax.yaxis.set_tick_params(labelsize = 32);
plt.xlim(-0.5, bin.size-0.5);

The font that you are using probably does not include glyphs for those Unicode characters.
Try changing the font, this one works for me:
plt.rcParams['font.sans-serif'] = ['DejaVu Sans']
To use a Serif font:
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = ['DejaVu Serif']
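The labels in the question use dedicated Unicode subscript characters rather than markup, so whichever font is active has to contain glyphs for them. As a small standard-library sketch, this lists the non-ASCII characters in the labels together with their code points, which is what you would check a candidate font against. The helper name non_ascii_chars is an assumption for illustration.

```python
import unicodedata


def non_ascii_chars(labels):
    """Map each non-ASCII character found in labels to its Unicode name."""
    return {ch: unicodedata.name(ch, "UNKNOWN")
            for label in labels for ch in label if ord(ch) > 127}


labels = ["CO₆", "DO₄", "EO₈", "FaO₉", "GO₅", "Ha₆"]
for ch, name in non_ascii_chars(labels).items():
    print(f"U+{ord(ch):04X} {name}")
```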

Tkinter Insert outputting as [list] after "".join()

I am making a Python program that converts numbers to binary with a tkinter GUI. When I use insert on an Entry, what should be a normal string:
0101010011
as
[0, 1, 0, 1...]
This is the function that converts a number to binary. I am aware that the bin() alternative exists; I just wanted to create my own.
def dec2bin(decnum):
    binarylist = []
    while (decnum > 0):
        binarylist.append(decnum % 2)
        decnum = decnum // 2
    binarylist.reverse()
    binarylist = str(binarylist)
    return "".join(binarylist)
The function that is called when a button in the tkinter gui is pressed which is intended to replace one of the entry box's text with the binary output.
def convert():
    decimal = entrydec.get()
    decimal = int(decimal)
    entrybin.delete(0, END)
    entrybin.insert(0, dec2bin(decimal))
I expect the output of 010101, but the actual output is [0, 1, 0, 1, 0, 1]

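Calling str() on a list, as in str([0, 1, 0, 0]), does not give you a list of strings like ["0", "1", "0", "0"]. It gives the single string "[0, 1, 0, 0]", and joining the characters of that string with "" just returns the same string.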
You can use list comprehension:
binarylist = [str(x) for x in binarylist]
or map() :
binarylist = map(str, binarylist)
Or you can convert the numbers 0 and 1 to strings when you add them to the list:
binarylist.append( str(decnum % 2) )
And later you can use join():
def dec2bin(decnum):
    binarylist = []
    while (decnum > 0):
        binarylist.append( str(decnum % 2) ) # <-- use str()
        decnum = decnum // 2
    binarylist.reverse()
    #binarylist = str(binarylist) # <-- remove it
    return "".join(binarylist)
dec2bin(12)
Result:
"1100"
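One edge case the function above does not cover: for an input of 0 the loop never runs, so it returns an empty string. A possible variant that handles this, cross-checked against the built-in format(n, "b"). Rejecting negative numbers with ValueError is an assumption of this sketch, not something the original posts specify.

```python
def dec2bin(decnum):
    """Convert a non-negative integer to its binary digit string."""
    if decnum < 0:
        raise ValueError("decnum must be non-negative")
    if decnum == 0:
        return "0"  # the loop below would otherwise produce an empty string
    digits = []
    while decnum > 0:
        digits.append(str(decnum % 2))  # store digits as strings so join() works
        decnum //= 2
    return "".join(reversed(digits))


# Cross-check against the built-in formatter.
for n in (0, 1, 12, 339):
    assert dec2bin(n) == format(n, "b")
```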

Creating Leaflet Map with Python and Folium

Here is the code:
for lat,lon,name,elev in zip(df['LAT'],df['LON'],df['NAME'],df['ELEV']):
    fg.add_child(folium.Marker(location=[lat,lon],popup=name,icon=folium.Icon(color=color_ori(elev))))
I am creating a map of volcanoes in the USA, and I want each marker to show the volcano's name in its popup. I can't do that with the code above, but when I use popup=str(elev)+"m", it works fine. How do I include the names from my CSV file in a popup?

You can call add_child on each marker, passing a Popup object as the argument.
The code is:
import folium
lats = range(59, 63)
lons = range(10, 14)
names = ['marker' + str(i) for i in range(4)]
elevations = range(4)
m = folium.Map([60, 10], tiles='Mapbox Bright', zoom_start=5)
for lat, lon, name, elev in zip(lats, lons, names, elevations):
    folium.Marker([lat, lon], icon=folium.Icon(color='red')).add_child(folium.Popup(name)).add_to(m)

Separate Spam and Ham for WordCloud Visualization

I am performing spam detection and want to visualize spam and ham keywords separately in Wordcloud. Here's my .csv file.
data = pd.read_csv("spam.csv",encoding='latin-1')
data = data.rename(columns = {"v1":"label", "v2":"message"})
data = data.replace({"spam":"1","ham":"0"})
Here's my code for WordCloud. I need help with spam_words. I cannot generate the right graph.
import matplotlib.pyplot as plt
from wordcloud import WordCloud
spam_words = ' '.join(list(data[data['label'] == 1 ]['message']))
spam_wc = WordCloud(width = 512, height = 512).generate(spam_words)
plt.figure(figsize = (10,8), facecolor = 'k')
plt.imshow(spam_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()

The issue is that the current code replaces "spam" and "ham" with the one-character strings "1" and "0", but you filter the DataFrame based on comparison with the integer 1. Change the replace line to this:
data = data.replace({"spam": 1, "ham": 0})
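Since the goal is to visualize spam and ham separately, here is a rough sketch of building both text blobs once the labels are integers. The four rows are invented stand-ins for spam.csv, and the helper words_for is a hypothetical name. Mapping only the label column (instead of calling replace on the whole DataFrame) is a design choice of this sketch so that message text is never altered.

```python
import pandas as pd

# Tiny stand-in for spam.csv; these rows are made up for illustration.
data = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam"],
    "message": ["see you at lunch", "win a free prize",
                "call me later", "free entry now"],
})

# Map only the label column, so message text is never touched.
data["label"] = data["label"].map({"spam": 1, "ham": 0})


def words_for(label):
    """Join all messages carrying the given integer label into one string."""
    return " ".join(data.loc[data["label"] == label, "message"])


spam_words = words_for(1)
ham_words = words_for(0)
print(spam_words)
print(ham_words)
```

Each string can then be passed to WordCloud(...).generate(...) exactly as in the question, giving one cloud per class.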
