Convert string from text file in order to plot using matplotlib - python-3.x

I am trying to plot a graph using dates and integers from a text file which looks like this:
However, I keep getting this error:
Traceback (most recent call last):
File "C:\Users\Haeshan\Desktop\Comp Sci CC\graph.py", line 21, in
graph()
File "C:\Users\Haeshan\Desktop\Comp Sci CC\graph.py", line 9, in graph
converters = {1: mdates.strpdate2num("%d/%m/%Y")})
File "C:\Users\Haeshan\AppData\Local\Programs\Python\Python35\lib\site-packages\numpy\lib\npyio.py", line 930, in loadtxt
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "C:\Users\Haeshan\AppData\Local\Programs\Python\Python35\lib\site-packages\numpy\lib\npyio.py", line 930, in
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "C:\Users\Haeshan\AppData\Local\Programs\Python\Python35\lib\site-packages\numpy\lib\npyio.py", line 659, in floatconv
return float(x)
ValueError: could not convert string to float: b"['10'"
import matplotlib.pyplot as plt
import numpy as np
import csv
import matplotlib.dates as mdates

def graph():
    date, value = np.loadtxt("Scores.txt", delimiter=",", unpack=True,
                             converters={1: mdates.strpdate2num("%d/%m/%Y")})
    fig = plt.figure()
    ax1 = fig.add_subplot(1, 1, 1, axisbg="white")
    plt.plot_date(x=date, y=value)
    plt.title("Performance")
    plt.ylabel("Score")
    plt.xlabel("Date")

graph()
Any ideas?
Many thanks.

The problem is your first column, which has quotation marks. Since you have defined converters anyway, I would change it to:
converters = {0: (lambda x: int(x)),1: mdates.strpdate2num("%d/%m/%Y")})
UPDATE
Sorry, due to the quotation marks I didn't see the other issues. TBH, I would not use np.loadtxt in this case, since you also have square brackets in each line. In addition, you are using Python 3, where strings are unicode rather than bytes, but loadtxt reads the file as bytes (hence the b prefix in your error message).
My suggestion is to read it in line by line and parse each line, e.g.
dates, values = list(), list()
formatter = mdates.strpdate2num("%d/%m/%Y")
with open("Scores.txt", 'r', newline='\n') as input_file:
    for line in input_file:
        # Remove the square brackets, quotation marks and newlines (if necessary)
        # Be aware that this will also kill all square brackets and quotation marks in your line
        entries = line.replace('[', '').replace(']', '').replace("'", '').replace('\n', '').split(',')
        values.append(int(entries[0]))
        dates.append(formatter(entries[1]))
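From there you can plot much as before; a minimal sketch, assuming the dates and values lists were filled by the loop above (strpdate2num returns matplotlib date numbers, so plot_date can consume them directly):
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot_date(dates, values)   # dates are matplotlib date numbers, values are ints
ax.set_title("Performance")
ax.set_xlabel("Date")
ax.set_ylabel("Score")
plt.show()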

Related

Bar plot with different minimal value for each bar

I'm trying to reproduce this type of graph:
Basically, the Y axis represents the dates of the beginning and end of a phenomenon for each year.
But here is what I get when I try to plot my data:
It seems that no matter what, the bar for each year is plotted from the y-axis minimum value.
Here is the data I use.
Here is my code:
select = pd.read_excel("./writer.xlsx")
select = pd.DataFrame(select)
select["dte"] = pd.to_datetime(select.dte)
select["month_day"] = pd.DatetimeIndex(select.dte).strftime('%B %d')
select["month"] = pd.DatetimeIndex(select.dte).month
select["day"] = pd.DatetimeIndex(select.dte).day

gs = gridspec.GridSpec(2, 2)
fig = plt.figure()
ax1 = plt.subplot(gs[0, 0])
ax2 = plt.subplot(gs[0, 1])
ax3 = plt.subplot(gs[1, :])
### 2 other graphs that work just fine

data = pd.DataFrame()
del select["res"], select["Seuil"], select["Seuil%"]  # these don't matter for that graph
for year_ in list(set(select.dteYear)):
    temp = select.loc[select["dteYear"] == year_]
    temp2 = temp.iloc[[0, -1]]  # the beginning and ending of the phenomenon
    data = pd.concat([data, temp2]).reset_index(drop=True)
data = data.sort_values(["month", "day"])
ax3.bar(data["dteYear"], data["month_day"], tick_label=data["dteYear"])
plt.show()
If you have some clue to help me, I'd really appreciate it, because I haven't found any example of how to make this type of graph.
Thanks!
EDIT:
I tried something else:
height, bottom, x_position = [], [], []
for year_ in list(set(select.dteYear)):
    temp = select.loc[select["dteYear"] == year_]
    bottom.append(temp["month_day"].iloc[0])
    height.append(temp["month_day"].iloc[-1])
    x_position.append(year_)
    temp2 = temp.iloc[[0, -1]]
    data = pd.concat([data, temp2]).reset_index(drop=True)
ax3.bar(x=x_position, height=height, bottom=bottom, tick_label=x_position)
I got this error:
Traceback (most recent call last):
File "C:\Users\E31\Documents\cours\stage_dossier\projet_python\tool_etiage\test.py", line 103, in <module>
ax3.bar(x=x_position,height=height,bottom=bottom,tick_label=x_position)
File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1352, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 2357, in bar
r = mpatches.Rectangle(
File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 752, in __init__
super().__init__(**kwargs)
File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 101, in __init__
self.set_linewidth(linewidth)
File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 406, in set_linewidth
self._linewidth = float(w)
TypeError: only size-1 arrays can be converted to Python scalars
To make a bar graph that shows a span between two dates, start by getting your data into a format where it is easy to access the bottom and top value of the bar for each year you are plotting. After that you can simply plot the bars and pass the 'bottom' parameter. The hardest part in your case may be specifying the datetime differences correctly: in the simulated data below, every start and end date falls within a single reference year (2021), so the y axis only reflects the time of year while the x axis carries the actual year. I added an x tick locator and a y tick formatter for the datetimes.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.dates as mdates

# make function that returns a random datetime
# between a start and stop date
def random_date(start, stop):
    days = (stop - start).days
    rand = np.random.randint(days)
    return start + pd.Timedelta(rand, unit='days')

# simulate poster's data
T1 = pd.to_datetime('July 1 2021')
T2 = pd.to_datetime('August 1 2021')
T3 = pd.to_datetime('November 1 2021')

df = pd.DataFrame({
    'year': np.random.choice(np.arange(1969, 2020), size=15, replace=False),
    'bottom': [random_date(T1, T2) for x in range(15)],
    'top': [random_date(T2, T3) for x in range(15)],
}).sort_values(by='year').set_index('year')

# define fig/ax and figsize
fig, ax = plt.subplots(figsize=(16, 8))

# plot data
ax.bar(
    x=df.index,
    height=(df.top - df.bottom),
    bottom=df.bottom,
    color='#9e7711'
)

# add x_locator (every 2 years), y tick datetime formatter, grid,
# hide top/right spines, and rotate the x ticks for readability
x_locator = ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(2))
y_formatter = ax.yaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
tick_params = ax.tick_params(axis='x', rotation=45)
grid = ax.grid(axis='y', dashes=(8, 3), alpha=0.3, color='gray')
hide_spines = [ax.spines[s].set_visible(False) for s in ['top', 'right']]
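Adapting this to the question's own frame might look roughly like the sketch below. It is untested; the names select, dte, dteYear and ax3 are taken from the question's snippet, the pandas/mdates imports come from the example above, and a leap reference year (2020) is used so that 29 February does not break the conversion:
# Map every date into one reference year so the y axis only shows the time of year,
# then take the earliest/latest date per year as the bottom/top of each bar.
ref = pd.to_datetime('2020-' + select['month'].astype(str) + '-' + select['day'].astype(str))
per_year = pd.DataFrame({'year': select['dteYear'], 'ref': ref}).groupby('year')['ref'].agg(['min', 'max'])
ax3.bar(
    x=per_year.index,
    height=per_year['max'] - per_year['min'],
    bottom=per_year['min'],
    tick_label=per_year.index,
)
ax3.yaxis.set_major_formatter(mdates.DateFormatter('%d %b'))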

How to transform the data and calculate the TFIDF value?

My data format is:
datas = {[1,2,4,6,7],[2,3],[5,6,8,3,5],[2],[93,23,4,5,11,3,5,2],...}
Each element in datas is a sentence, and each number is a word. I want to get the TF-IDF value for each number. How can I do it with sklearn or some other way?
My code:
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
datas = {[1,2,4,6,7],[2,3],[5,6,8,3,5],[2],[93,23,4,5,11,3,5,2]}
vectorizer=CountVectorizer()
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(vectorizer.fit_transform(datas))
print(tfidf)
My code doesn't work. Error:
Traceback (most recent call last):
File "C:/Users/zhuowei/Desktop/OpenNE-master/OpenNE-master/src/openne/buildTree.py", line 103, in <module>
X = vectorizer.fit_transform(datas)
File "C:\Users\zhuowei\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 869, in fit_transform
self.fixed_vocabulary_)
File "C:\Users\zhuowei\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 792, in _count_vocab
for feature in analyze(doc):
File "C:\Users\zhuowei\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 266, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "C:\Users\zhuowei\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 232, in <lambda>
return lambda x: strip_accents(x.lower())
AttributeError: 'int' object has no attribute 'lower'
You are using CountVectorizer, which requires an iterable of strings. Something like:
datas = ['First sentence',
'Second sentence', ...
...
'Yet another sentence']
But your data is a list of lists, which is why the error occurs. You need to turn the inner lists into strings for the CountVectorizer to work. You can do this:
datas = [' '.join(map(str, x)) for x in datas]
This will result in datas like this:
['1 2 4 6 7', '2 3', '5 6 8 3 5', '2', '93 23 4 5 11 3 5 2']
Now this form is consumable by CountVectorizer. But even then you will not get proper results, because of the default token_pattern in CountVectorizer:
token_pattern : '(?u)\b\w\w+\b' (string)
Regular expression denoting what constitutes a "token", only used if analyzer == 'word'. The default regexp select tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
For it to consider your numbers as words, you need to change the pattern so that it also accepts single-character tokens:
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
Then it should work. But note that your numbers are now represented as strings.
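Putting the pieces together, a minimal end-to-end sketch (using the question's datas as a list of lists):
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

datas = [[1, 2, 4, 6, 7], [2, 3], [5, 6, 8, 3, 5], [2], [93, 23, 4, 5, 11, 3, 5, 2]]
datas = [' '.join(map(str, x)) for x in datas]               # each "sentence" becomes one string
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")   # accept single-character tokens
counts = vectorizer.fit_transform(datas)
tfidf = TfidfTransformer().fit_transform(counts)

print(sorted(vectorizer.vocabulary_))   # the tokens -- note they are strings now, e.g. '93'
print(tfidf.toarray())                  # one row per sentence, one column per token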

Stop x-axis labels from shrinking the plot in Matplotlib?

I'm trying to make a bar graph with the following code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
test = {'names':['a','b','abcdefghijklmnopqrstuvwxyz123456789012345678901234567890'], 'values':[1,2,3]}
df = pd.DataFrame(test)
plt.rcParams['figure.autolayout'] = False
ax = sns.barplot(x='names', y='values', data=df)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.show()
But I get the following error, because the long value in 'names' used as an x-axis label shrinks the plot until the bottom of the axes ends up above the top.
Traceback (most recent call last):
File "C:/Users/Adam/.PyCharm2018.2/config/scratches/scratch.py", line 11, in <module>
plt.show()
File "C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 253, in show
return _show(*args, **kw)
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 25, in __call__
manager.show(**kwargs)
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 107, in show
self.canvas.show()
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 62, in show
self.figure.tight_layout()
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 2276, in tight_layout
self.subplots_adjust(**kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 2088, in subplots_adjust
self.subplotpars.update(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 245, in update
raise ValueError('bottom cannot be >= top')
ValueError: bottom cannot be >= top
Here is what it looks like if I reduce the length of that name slightly:
How can I get it to expand the figure to fit the label instead of shrinking the axes?
One workaround is to create the Axes instance yourself with fig.add_axes rather than as a subplot; then tight_layout() has no effect, even when it's called internally. You can then pass the Axes to sns.barplot with the ax keyword. The remaining problem is that plt.show() may cut the label off, but if you call savefig with bbox_inches='tight', the figure is extended to contain both the axes and all labels:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
test = {'names':['a','b','abcdefghijklmnopqrstuvwxyz123456789012345678901234567890'], 'values':[1,2,3]}
df = pd.DataFrame(test)
#plt.rcParams['figure.autolayout'] = False
ax = sns.barplot(x='names', y='values', data=df, ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
#plt.show()
fig.savefig('long_label.png', bbox_inches='tight')
DISCLAIMER: I don't have PyCharm, so this code assumes that matplotlib behaves the same with and without PyCharm. Anyway, for me the outcome looks like this:
If you want this in an interactive backend, I didn't find any other way than to manually adjust the figure size. This is what I get using the qt5agg backend:
ax = sns.barplot(x='names', y='values', data=df)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
ax.figure.set_size_inches(5, 8) # manually adjust figure size
plt.tight_layout() # automatically adjust elements inside the figure
plt.show()
Note that PyCharm's scientific mode might be doing some magic that prevents this from working, so you might need to deactivate it or just run the script outside PyCharm.

I'm a newbie in image processing and I'm getting the error "Don't know how to convert parameter 1"

I'm using OpenCV on Windows. The error occurs only after the green colour is detected (which is the task of the code).
CODE:
import cv2
import numpy as np
from pynput.mouse import Button, Controller
import wx

mouse = Controller()
app = wx.App(False)
(sx, sy) = wx.GetDisplaySize()
(camx, camy) = (640, 480)

lowerBound = np.array([33, 80, 40])
upperBound = np.array([102, 255, 255])

cam = cv2.VideoCapture(0)
kernelOpen = np.ones((5, 5))
kernelClose = np.ones((20, 20))

pinchFlag = 0

while True:
    ret, img = cam.read()
    img = cv2.resize(img, (640, 480))
    # convert BGR to HSV
    imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # create the Mask
    mask = cv2.inRange(imgHSV, lowerBound, upperBound)
    # morphology
    maskOpen = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernelOpen)
    maskClose = cv2.morphologyEx(maskOpen, cv2.MORPH_CLOSE, kernelClose)
    maskFinal = maskClose
    # NOTE: conts is not defined in this snippet; presumably it comes from
    # cv2.findContours on maskFinal in the part of the script omitted here
    if len(conts) == 2:
        if pinchFlag == 1:
            pinchFlag = 0
            mouse.release(Button.left)
        x1, y1, w1, h1 = cv2.boundingRect(conts[0])
        x2, y2, w2, h2 = cv2.boundingRect(conts[1])
        cv2.rectangle(img, (x1, y1), (x1 + w1, y1 + h1), (255, 0, 0), 2)
        cv2.rectangle(img, (x2, y2), (x2 + w2, y2 + h2), (255, 0, 0), 2)
        cx1 = int(x1 + w1 / 2)
        cy1 = int(y1 + h1 / 2)
        cx2 = int(x2 + w2 / 2)
        cy2 = int(y2 + h2 / 2)
        cx = int((cx1 + cx2) / 2)
        cy = int((cy1 + cy2) / 2)
        cv2.line(img, (cx1, cy1), (cx2, cy2), (255, 0, 0), 2)
        cv2.circle(img, (cx, cy), 2, (0, 0, 255), 2)
        mouseLoc = (sx - (cx * sx / camx), cy * sy / camy)
        mouse.position = mouseLoc
        while mouse.position != mouseLoc:
            pass
    elif len(conts) == 1:
        x, y, w, h = cv2.boundingRect(conts[0])
        if pinchFlag == 0:
            pinchFlag = 1
            mouse.press(Button.left)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cx = int(x + w / 2)
        cy = int(y + h / 2)
        cv2.circle(img, (cx, cy), int((w + h) / 4), (0, 0, 255), 2)
        mouseLoc = (sx - (cx * sx / camx), cy * sy / camy)
        mouse.position = mouseLoc
        while mouse.position != mouseLoc:
            pass
    cv2.imshow("cam", img)
    cv2.waitKey(5)
Error:
Traceback (most recent call last):
File "virtual_mouse.py", line 65, in <module>
mouse.position = mouseLoc
File "C:\Users\dell\Anaconda3\envs\kj\lib\site-packages\pynput\mouse\_base.py", line 65, in position
self._position_set(pos)
File "C:\Users\dell\Anaconda3\envs\kj\lib\site-packages\pynput\mouse\_win32.py", line 66, in _position_set
self.__SetCursorPos(*pos)
ctypes.ArgumentError: argument 1: <class 'TypeError'>: Don't know how to convert parameter 1
If you're using OpenCV 3, the formula mouseLoc = (sx-(cx*sx/camx), cy*sy/camy) returns float values (Python 3's division always produces floats), which is not obvious from the IPython console. You have to convert them to integer values and it will work. Hence make the following change:
mouseLoc = ( int(sx-(cx*sx/camx)) , int(cy*sy/camy) )
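The underlying issue is that pynput passes the coordinates to SetCursorPos via ctypes, which needs integers (that is the ArgumentError in the traceback). A tiny sketch with hypothetical values:
cx, sx, camx, cy, sy, camy = 320, 1920, 640, 240, 1080, 480
mouseLoc = (sx - (cx * sx / camx), cy * sy / camy)            # (960.0, 540.0) -> floats, rejected
mouseLoc = (int(sx - (cx * sx / camx)), int(cy * sy / camy))  # (960, 540)     -> ints, accepted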

Exception in `transform_non_affine` with log axis

I'm getting a weird error when I try to use axes.transData when plotting on a log scale. Minimal code to reproduce this error:
#!/usr/bin/env python3
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
fig = Figure(figsize=(8,6))
canvas = FigureCanvas(fig)
ax = fig.add_subplot(1,1,1)
ax.plot(range(10))
ax.set_yscale('log') # <--- works fine without this line
print(ax.transData.transform((1,1))) # <--- exception thrown here
canvas.print_figure('test.pdf')
The stack trace is as follows:
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 1273, in transform
return self.transform_affine(self.transform_non_affine(values))
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 2217, in transform_non_affine
return self._a.transform_non_affine(points)
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 2002, in transform_non_affine
x_points = x.transform_non_affine(points)[:, 0:1]
TypeError: tuple indices must be integers, not tuple
If I comment out the set_yscale('log') it runs fine. Does anyone know why this transform doesn't work?
Not completely satisfying, but I found a workaround. The issue seems to be related to the 1-dimensional array input to transform. Oddly, it works if I use this:
ax.transData.transform(pts[None,:])
In other words, I have to reshape the array to make it 2-dimensional.
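Applied to the minimal example above, a sketch of the workaround would be to pass the point as a 2-D array (ax is the Axes from the question's snippet):
import numpy as np

pt = np.array([1.0, 1.0])
print(ax.transData.transform(pt[None, :]))   # shape (1, 2): one (x, y) point per row, works on the log axis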
