I need to use the "description" attribute as my chart or plot title, and I cannot find a way to do this by searching online. The output for the .nc file variable that has the "description" I need looks like this:
<class 'netCDF4._netCDF4.Variable'>
float64 M(lat, lon)
_FillValue: nan
long_name: Wind Speed at 100m
description: Anomaly for June 2021 vs the previous 30 years
unlimited dimensions:
current shape = (2920, 7200)
My code looks like this:
# -*- coding: utf-8 -*-
#author: U321103
from sys import exit
import netCDF4 as nc4
from netCDF4 import Dataset
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
#from mpl_toolkits.basemap import Basemap, cm
import datetime
from datetime import datetime
import pandas as pd
import xarray as xr
import bottleneck as bn
import cartopy.crs as ccrs
from mpl_toolkits.basemap import Basemap
import os
os.environ["PROJ_LIB"] = 'C:\\Users\\Yury\\anaconda3\\Library\\share'
# -----------------------------------------------------------------------------------------------------------
# -----------------------------------------------------------------------------------------------------------
#%matplotlib inline
#The easiest way to read the data is:
path = "//porfiler03/gtdshare/VORTEX/ANOMALY_FILES/anomaly.M.2021.06.vs30y/world.nc"
# Open the NetCDF file
fh = Dataset(path)
#read variables in fh
for var in fh.variables.values():
    print(var)  # prints each variable's metadata, including its attributes
# Get the 100m wind speed
wind100 = fh['M'][:]
#wind100_units = fh['M'].units
# Get the latitude and longitude points
lats = fh.variables['lat'][:]
lons = fh.variables['lon'][:]
# Get some parameters for the Stereographic Projection
lon_0 = lons.mean()
lat_0 = lats.mean()
#m = Basemap(width=25000000,height=12000000,
# resolution='l',projection='lcc',\
# lat_ts=50,lat_0=lat_0,lon_0=lon_0)
m = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=60,\
# help on coordinates: https://matplotlib.org/basemap/users/merc.html
# Because our lon and lat variables are 1D,
# use meshgrid to create 2D arrays
# Not necessary if coordinates are already in 2D arrays.
lon, lat = np.meshgrid(lons, lats)
xi, yi = m(lon, lat)
# Plot Data
cs = m.pcolor(xi,yi,np.squeeze(wind100))
# Add Grid Lines
m.drawparallels(np.arange(-80., 81., 40.), labels=[1,0,0,0], fontsize=10)
m.drawmeridians(np.arange(-180., 181., 40.), labels=[0,0,0,1], fontsize=10)
# Add Coastlines, States, and Country Boundaries
m.drawcoastlines()
m.drawstates()
m.drawcountries()
# Add Colorbar
cbar = m.colorbar(cs, location='bottom', pad="10%")
# Add Title
plt.title(' ')
So what I need exactly is to put "Anomaly for June 2021 vs the previous 30 years" into the plt.title() call at the end of the code above. Thank you!
You should add this line of code wind100_description = fh['M'].description somewhere before fh.close(). Then simply do plt.title(wind100_description) instead of plt.title(' '). Also, it's a good practice to remove the imports you don't need, of which you have quite a few :)
I have working code that uses DBSCAN to find tight groups of sparse spatial data imported with pd.read_csv.
I am keeping the original spatial data locations, and I would like to add the label that DBSCAN returns for each data point to the original dataframe and then write a CSV with the same information.
The code below does exactly what I expect at this point; I would just like to extend it to attach the label to each row of the original dataframe.
import argparse
import string
import os, subprocess
import pathlib
import glob
import gzip
import re
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from sklearn.cluster import DBSCAN
X = pd.read_csv(tmp_csv_name)
X = X.drop('Name', axis = 1)
X = X.drop('Type', axis = 1)
X = X.drop('SomeValue', axis = 1)
# only columns 'x' and 'y' now remain
db=DBSCAN(eps=EPS, min_samples=minSamples, metric='euclidean', algorithm='auto', leaf_size=30).fit(X)
labels = db.labels_  # labels of the DBSCAN model fitted above
unique_labels = set(labels)
# maxX , maxY are manual inputs temporarily
while sizeX > 16 or sizeY > 16 :
    sizeX=sizeX*0.8 ; sizeY=sizeY*0.8
fig, ax = plt.subplots(figsize=(sizeX,sizeY))
plt.scatter(X['x'], X['y'], c=colors, marker="o", picker=True)
# hackX , hackY are manual inputs temporarily
# which represent the boundaries defined in the original dataset
poly = patches.Polygon(xy=list(zip(hackX,hackY)), fill=False)
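One way to do this, sketched under the assumption that the dataframe has x and y columns plus extra columns such as Name: fit DBSCAN on a column subset instead of dropping columns from the only copy, then write db.labels_ back as a new column. labels_ has one entry per input row, in row order, with -1 marking noise, so it lines up with the original dataframe. The data, eps and min_samples below are hypothetical stand-ins for the real CSV, EPS and minSamples.

```python
import io

import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical stand-in for pd.read_csv(tmp_csv_name)
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "e", "f", "g"],
    "x": [0.0, 0.0, 0.1, 5.0, 5.0, 5.1, 20.0],
    "y": [0.0, 0.1, 0.0, 5.0, 5.1, 5.0, 20.0],
})

# Fit on the coordinate columns only; df itself keeps every original column.
db = DBSCAN(eps=0.5, min_samples=2, metric="euclidean").fit(df[["x", "y"]])

# labels_ is aligned with the rows passed to fit(), so it can be attached directly.
df["label"] = db.labels_

# Write the annotated data; replace the buffer with a path such as "labeled.csv".
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

If some rows are filtered out before fitting, assign through the same mask (df.loc[mask, "label"] = db.labels_) so the labels still line up with their rows.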
In a Jupyter notebook I read in a dataframe and create several plots with pandas / Bokeh.
While creating one of the Bokeh plots I get an error.
Searching for similar problems suggested that somewhere earlier in the script there might be something like
plt.title = "Title"
which overwrites the method. But this is not the case for me: I have nothing similar in the code above, except in the figure parameters, and there I set the figure title the way the Bokeh documentation describes.
Running only the part of the code that raises the error as a stand-alone script does NOT raise the error. So in my case, too, the problem probably has something to do with code earlier in the notebook. But maybe one of you has an idea when seeing this?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bokeh.plotting import figure, show, output_notebook, ColumnDataSource
from bokeh.io import output_notebook
from bokeh.layouts import column, gridplot
from bokeh.models import Label, Title
from bokeh.models import Div
data = df
# Title of the overall plot
abovetitle = ("This should be the overall title of all graphs")
s1 = figure(width = 250, plot_height = 250, title="Graph 1", x_axis_label = "axis title 1", y_axis_label = 'µs')
s1.line(x, y, width=1, color="black", alpha=1, source = data)
# s1.title.text = "Title With Options"  # tried instead of 'title=', but this does not solve the problem
s2 = figure(width = 250, plot_height = 250, title="Graph 2", x_axis_label = "axis title 2", y_axis_label = 'µs')
s2.line(x, y, width=1, color="blue", alpha=1, source = data)
# s2.title.text = "Title With Options"  # tried instead of 'title=', but this does not solve the problem
# plot graphs:
p = gridplot([[s1, s2]])
show(column(Div(text=abovetitle), p))
This leads to the following TypeError:
TypeError Traceback (most recent call last)
<ipython-input-24-33e4828b986d> in <module>
31 # plot graphs:
32 p = gridplot([[s1, s2]])
---> 33 show(column(Div(text=title), p))
TypeError: 'str' object is not callable
Re-running
import matplotlib.pyplot as plt
on its own does not solve the problem. However, re-running all of the imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bokeh.plotting import figure, show, output_notebook, ColumnDataSource
from bokeh.io import output_notebook
from bokeh.layouts import column, gridplot
from bokeh.models import Label, Title
from bokeh.models import Div
solves the problem. Does anyone have an idea what might cause this error?
In the meantime I got a very useful hint: in one of the earlier cells I had accidentally used a Bokeh API function name as a variable name and thereby overwrote the function. If you face a comparable problem, check your variable names; maybe the same accident happened to you. ;-)
# Define column names of XData binary part
header = ["Col1","Col2","Col3"]
# Split XData in single, space separated columns
x_df = selected_df.XData.str.split(' ', expand=True)
x_df.drop(0, inplace=True, axis=1)
x_df.columns = header
# Binary XData to integer
for column in x_df:  # DON'T DO THIS! It overwrites the Bokeh API function `column`; use e.g. `col` instead
    x_df[column] = x_df[column].apply(int, base=16)
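A small pure-Python illustration of this accident, with a hypothetical stand-in for bokeh.layouts.column so it runs without Bokeh. A for-loop variable is an ordinary assignment, so looping with `column` rebinds that name to the last column name (a string); any later column(...) call then fails with exactly "'str' object is not callable". Renaming the loop variable keeps the imported function intact.

```python
import pandas as pd


def column(*children):
    """Hypothetical stand-in for bokeh.layouts.column."""
    return list(children)


x_df = pd.DataFrame({"Col1": ["0a", "ff"], "Col2": ["10", "1f"], "Col3": ["00", "7f"]})

# Safe version: `col` does not shadow the `column` function.
for col in x_df:
    x_df[col] = x_df[col].apply(int, base=16)  # hex string -> int

layout = column("title", "plot")  # still the function, so this works


def buggy_version():
    """Reproduce the bug in a local scope: the loop rebinds `column` to a str."""
    df = pd.DataFrame({"Col1": ["0a"], "Col2": ["10"]})
    for column in df:  # shadows the function on purpose
        df[column] = df[column].apply(int, base=16)
    try:
        column("title", "plot")  # `column` is now the string "Col2"
    except TypeError as exc:
        return str(exc)
    return "no error"
```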
I am running this script to calculate and plot the similarity between some documents.
# -*- coding: utf-8 -*-
import os
import codecs
import string, re
import nltk
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
from pathlib import Path
from matplotlib import cm as cm
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
path = "C:\\Users\\user\\Desktop\\texts\\dataset"
text_files = os.listdir(path)
#print (text_files)
tfidf_vectorizer = TfidfVectorizer()
# os.listdir() returns bare file names, so join them with the folder path before opening
documents = [open(os.path.join(path, f), encoding="utf-8").read() for f in text_files if f.endswith('.txt')]
sparse_matrix = tfidf_vectorizer.fit_transform(documents)
#with open('C:\\Users\\user\\Desktop\\texts\\results\\pairwise_similarity2.csv', 'w') as f:
#    for item in pairwise_similarity:
#        f.write("%s\n" % item)
#        f.write('\n')
# same filter and order as `documents`, so labels[i] is the file name of documents[i]
labels = []
for f in text_files:
    if f.endswith('.txt'):
        labels.append(f)
pairwise_similarity = sparse_matrix * sparse_matrix.T
pairwise_similarity_array = pairwise_similarity.toarray()
fig, ax = plt.subplots(figsize=(20,20))
cax = ax.matshow(pairwise_similarity_array, interpolation='spline16')
plt.title('News articles similarity matrix')
plt.xticks(range(len(labels)), labels, rotation=90);
plt.yticks(range(len(labels)), labels);
fig.colorbar(cax, ticks=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
Even though I have created a labels list, how can I access the index of the documents so that I can associate a specific document with its scores? This would also help me track documents in other tasks. For example, I am also using the Louvain community library to draw further conclusions about the dataset, but when I try to pass the labels list as labels, it gives an error: AttributeError: 'list' object has no attribute 'items'
Here is the code and the output of community louvain
# imports for this part (the python-louvain package is imported as `community`)
import networkx as nx
import community as community_louvain
# build a weighted graph from the similarity matrix
# (nx.from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it)
G = nx.from_numpy_array(pairwise_similarity_array)
# compute the best partition
partition = community_louvain.best_partition(G)
modularity = community_louvain.modularity(partition, G)
# draw the graph
pos = nx.spring_layout(G)
# color the nodes according to their partition
cmap = cm.get_cmap('coolwarm', max(partition.values()) + 1)
nx.draw_networkx_nodes(G, pos, partition.keys(), node_size=100,
                       cmap=cmap, node_color=list(partition.values()))
nx.draw_networkx_edges(G, pos, alpha=0.5)
nx.draw_networkx_labels(G,pos, labels=doc_labels, font_size=12,font_family='sans-serif')
dendro = community_louvain.generate_dendrogram(G)
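The AttributeError is consistent with nx.draw_networkx_labels expecting labels as a dict that maps each node to its text, not a list. Because the graph is built from the similarity matrix, node i is row i, i.e. the document labels[i], provided labels and documents come from the same filtered file list. The sketch below (hypothetical file names and texts; pandas and scikit-learn assumed) builds that dict with dict(enumerate(labels)) and wraps the matrix in a DataFrame indexed by file name, so scores can be looked up per document.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents; in the real script these come from the .txt files.
labels = ["a.txt", "b.txt", "c.txt"]
documents = ["the cat sat", "the cat ran", "stock markets fell"]

tfidf = TfidfVectorizer().fit_transform(documents)
similarity = cosine_similarity(tfidf)  # row/column i corresponds to labels[i]

# Similarity scores addressable by document name.
sim_df = pd.DataFrame(similarity, index=labels, columns=labels)

# Node -> label mapping in the form nx.draw_networkx_labels(..., labels=...) expects.
doc_labels = dict(enumerate(labels))

# Example lookup: the document most similar to a.txt, excluding itself.
closest_to_a = sim_df.loc["a.txt"].drop("a.txt").idxmax()
```

The same index mapping works for the Louvain result: {labels[node]: community for node, community in partition.items()} gives each document's community.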
I have created the code below in Python for a Power BI visual, but it does not show anything.
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(Company, Target)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import matplotlib.pyplot as plt
from plotnine import *
from plotnine.data import mpg
(ggplot(dataset) # data
+ aes(x='Company') # variable
+ geom_bar(size=20)) # type of plot
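Power BI's Python visual displays the matplotlib figure the script produces, and Microsoft's examples end with plt.show(). The plotnine expression above only builds a ggplot object and never draws it, which would explain the empty visual; with plotnine, the usual suggestion is to assign it (p = ggplot(...)), call p.draw() and then plt.show(). As a hedged fallback, the sketch below draws the same count-per-company bar chart with pandas/matplotlib only; the dataset frame here is a hypothetical stand-in for the one Power BI injects.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in; inside Power BI, `dataset` is created by the preamble.
dataset = pd.DataFrame({
    "Company": ["A", "B", "A", "C", "A", "B"],
    "Target": [1, 2, 3, 4, 5, 6],
})

# Count rows per company, as geom_bar(aes(x='Company')) does.
# value_counts() sorts by frequency; add .sort_index() for alphabetical order.
counts = dataset["Company"].value_counts()

fig, ax = plt.subplots()
counts.plot.bar(ax=ax)
ax.set_xlabel("Company")
ax.set_ylabel("count")
plt.show()  # Power BI renders the figure that is current at this point
```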
The file contains echo values with respect to latitude/longitude, and I have to plot the complete range of echos over a base map.
from netCDF4 import Dataset
import numpy as np
import pandas as pd
from google.colab import files
upload = files.upload()
my_example_nc_file = 'a.nc'
fh = Dataset(my_example_nc_file, mode='r')
lons = fh.variables['longitude'][:]
lats = fh.variables['latitude'][:]
ech= fh.variables['echos'][:]
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
%matplotlib inline
m = Basemap(width=5000000,height=3500000,
xi, yi = m(lons, lats)
#simple plot
#m.plot(xi, yi, 'co')
m.scatter(xi, yi, marker = 'o', color='r', zorder=5)
The current code produces the result below:
[Image: current output]
I want to plot the total echos, with their variation represented by colors, as in the screenshot below:
[Image: desired output]
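To color each point by its echo value, pass the values as c= to scatter together with a colormap and add a colorbar. Basemap.scatter forwards these keyword arguments to matplotlib, so m.scatter(xi, yi, c=ech, cmap='jet', s=10, zorder=5) followed by m.colorbar(...) should behave the same way. The sketch below assumes lons, lats and ech are 1-D arrays of equal length (one echo per point) and uses random hypothetical data on a plain matplotlib axes so it runs without Basemap. If ech is instead a 2-D grid on 1-D coordinate axes, build the grid with np.meshgrid and use pcolormesh.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical point data standing in for the variables read from a.nc
rng = np.random.default_rng(0)
n = 200
lons = rng.uniform(70.0, 90.0, n)
lats = rng.uniform(10.0, 30.0, n)
ech = rng.gamma(2.0, 10.0, n)

fig, ax = plt.subplots(figsize=(6, 5))
# c= maps each echo value to a color; vmin/vmax span the full range of the data
sc = ax.scatter(lons, lats, c=ech, cmap="jet", s=10,
                vmin=ech.min(), vmax=ech.max(), zorder=5)
fig.colorbar(sc, ax=ax, label="echos")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
```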