Scaling a seaborn heatmap .png output file - python-3.x

I have a program that, in a nutshell, connects to a Cisco wireless controller and gathers data about the number of clients per access point. It runs for 'x' passes with 'y' seconds between passes.
The program works fine.
NOTE: Both output files presented below show nine passes at 15 seconds between each pass. All you really care about is that I have 9 columns (one per pass) and the rows are the APs and their connected clients.
My issue is this: when I run it against a small client (93 access points), the output looks exactly the way I want it to (the first image).
But when I run it against another client (1840 access points), the output is unusable (the second image).
Here is the relevant portion of my program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(e, index=index, columns=cols)
df = df.transpose()
my_dpi = 96
sns.set(font_scale=2)
# plt.figure(figsize=(13, 91))
plt.figure(figsize=(2016 / my_dpi, 9120 / my_dpi), dpi=my_dpi)
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True, annot_kws={"size": 20})
plt.savefig('d:\\python\\projects\\clients_per_ap\\ac.png')
plt.show()
I tried changing 9120 to 912000, but I got an error stating that the value has to be less than 2^16. I tried 65535, but the program failed with a memory error. I tried 54720 and that works -- 54720 produced the second image above, but it is unusable.
How can I scale my output file for the client with 1840 APs to look like the output file for the client with 93 APs? Basically I would like the same (or very close) font and width, just 1840 rows long versus 93.

Have you tried outputting the file in vector format? .PDF usually works well. Sure, the text will look big when you zoom in, but that's probably your best bet for output that is too big to rasterize in memory. You can then use any classic PDF-to-PNG conversion if you really need a PNG.
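For example, a minimal sketch (the 0.35-inch row height and the output filename are assumptions, not values from the question) that lets the figure height scale with the number of APs and saves a vector PDF, which avoids the 2^16-pixel cap entirely:

import matplotlib.pyplot as plt
import seaborn as sns

n_rows = len(df)                       # 93 or 1840 APs
fig_height = 0.35 * n_rows             # assumed inches per row, keeps fonts legible
plt.figure(figsize=(13, fig_height))
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
plt.savefig('ac.pdf', bbox_inches='tight')  # vector output has no pixel-size limit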

Related

Is it possible to modify the contents of a text object using Python?

I have a dictionary called “labels” that contains text objects.
When I display the contents of this file, I get the following:
{'175.123.98.240': Text(-0.15349206308126684, -0.6696533109609498, '175.123.98.240'),
'54.66.152.105': Text(-1.0, -0.5455880938500245, '54.66.152.105'),
'62.97.116.82': Text(0.948676253595717, 0.6530664635187481, '62.97.116.82'),
'24.73.75.234': Text(0.849485905682265, -0.778703553136851, '24.73.75.234'),
'1.192.128.23': Text(0.2883091762715677, -0.03432011446968225, '1.192.128.23'),
'183.82.9.19': Text(-0.8855214994079628, 0.7201660238351776, '183.82.9.19'),
'14.63.160.219': Text(-0.047457773060320695, 0.655032585063581, '14.63.160.219')}
I want to change the IP address in the text object portion such that the file looks like this:
{'175.123.98.240': Text(-0.15349206308126684, -0.6696533109609498, 'xxx.123.98.240'),
'54.66.152.105': Text(-1.0, -0.5455880938500245, 'xxx.66.152.105'),
'62.97.116.82': Text(0.948676253595717, 0.6530664635187481, 'xxx.97.116.82'),
'24.73.75.234': Text(0.849485905682265, -0.778703553136851, 'xxx.73.75.234'),
'1.192.128.23': Text(0.2883091762715677, -0.03432011446968225, 'xxx.192.128.23'),
'183.82.9.19': Text(-0.8855214994079628, 0.7201660238351776, 'xxx.82.9.19'),
'14.63.160.219': Text(-0.047457773060320695, 0.655032585063581, 'xxx.63.160.219')}
This file is used for printing labels on a networkx graph.
I have a couple of questions.
Can the contents of a text object be modified?
If so, can it be changed without iterating through the file since the number of changes could range from 3 to 6,000, depending on what I am graphing?
How would I do it?
I did consider changing the IP addresses before I created my node and edge files, but that resulted in separate IP addresses being clustered incorrectly. For example: 173.6.48.24 and 1.6.48.24 would both be converted to xxx.6.48.24.
Changing the IP address at the time of printing the labels seems like the only sensible method.
I am hoping someone could point me in the right direction. I have never dealt with text objects and I am out of my depth on this one.
Thanks
Additional information
The original data set is a list of IP addresses that have attacked several honeypots I am running. I have taken the data and catalogued it based on certain attack criteria.
The data that I showed was just one of the small attack networks. The label file was generated using the code:
labels = nx.draw_networkx_labels(compX, pos=pos_df)
Where compX is the file containing the data to be graphed and pos_df is the layout of the graph. In this case, I used nx.spring_layout().
I can display the contents of the label file using:
for k, v in labels.items():
    print(v)
However, “v” contains the text object, which I do not seem to be able to work with. The content of “v” is as follows:
Text(-0.15349206308126684, -0.6696533109609498, '175.123.98.240')
Text(-1.0, -0.5455880938500245, '54.66.152.105')
Text(0.948676253595717, 0.6530664635187481, '62.97.116.82')
Text(0.849485905682265, -0.778703553136851, '24.73.75.234')
Text(0.2883091762715677, -0.03432011446968225, '1.192.128.23')
Text(-0.8855214994079628, 0.7201660238351776, '183.82.9.19')
Text(-0.047457773060320695, 0.655032585063581, '14.63.160.219')
This is where I am stuck. I do not seem to be able to come up with any code that does not return some kind of “'Text' object has no attribute xxxx” error.
As for replacing the first octet, I have the following code that works on a dataframe and I have just been experimenting to see if I can adapt it but so far, no luck:
df[column_ID] = df[column_ID].apply(lambda x: "xxx."+".".join(x.split('.')[1:4])) # Replace First octet
As I said, I would prefer not to iterate through the file. This cluster has seven entries; others can contain up to 6,000 nodes – granted the graph looks like a hairball with this many nodes, but most are between 3 and 25 nodes. I have a total of 60 clusters and as I collect more information, this number will rise.
I found a solution to replacing text inside a text object:
1) Convert text object to string
2) Find the position to be changed and make the change
3) Use set_text() to make the change to the text object
Example code
# Anonymize Source IP address
for k, v in labels.items():
    a = str(v)                        # e.g. "Text(-0.153..., -0.669..., '175.123.98.240')"
    a = a[a.find(", '"):]             # keep the label portion onward
    a = 'xxx' + a[a.find("."):][:-2]  # replace the first octet, drop the trailing "')"
    v.set_text(a)
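A more direct variant (a sketch, assuming the dictionary values are matplotlib Text objects as shown above) avoids the string surgery by using the object's own accessors:

for v in labels.values():
    ip = v.get_text()                         # e.g. '175.123.98.240'
    v.set_text('xxx.' + ip.split('.', 1)[1])  # mask only the first octet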

gmplot Marker does not work after it marks 256 points

I am trying to mark a bunch of points on the map with gmplot and observed that after a certain point it stops marking and wipes out all the previously marked points. I debugged the gmplot.py module and saw that this happens, without any error or warning, once the length of the points array exceeds 256.
self.points = [] in gmplot.py
Since I am very new to Python and OOP concepts, is there a way to override this and mark more than 256 points?
Are you using gmplot.GoogleMapPlotter.Scatter or gmplot.GoogleMapPlotter.Marker? I used both and was able to get 465 points for a project that I was working on. Is it possible it is an API key issue for you?
Partial snippet of my code:
import gmplot
import pandas as pd

# df is the dataframe with Lat, Lon and formataddress columns
# change to lists; not sure you need to do this. I think you can cycle through
# directly using iterrows. I have not tried that though
latcollection = df['Lat'].tolist()
loncollection = df['Lon'].tolist()
addcollection = df['formataddress'].tolist()

# center the map on the first coordinates
gmaps2 = gmplot.GoogleMapPlotter(latcollection[0], loncollection[0], 13, apikey='yourKey')
for i in range(len(latcollection)):
    gmaps2.marker(latcollection[i], loncollection[i], color='#FF0000', c=None,
                  title=str(i) + ' ' + addcollection[i])
gmaps2.draw(newdir + r'\laplot_marker_full.html')
I could hover over the 465th point, since I knew approximately where it was, and I was able to get the title str(464) <formataddress(464)>, since my array is indexed from 0.
Make sure you check the GitHub site for how to modify your gmplot file, in case you are working on Windows.

How to display more than 10 images in Tensorboard?

I noticed that it doesn't matter how many images I save to the TensorBoard log file; TensorBoard will only ever show 10 of them (per tag).
How can we increase the number of images or at least select which ones are displayed?
To reproduce what I mean, run the following MCVE:
import torch
from torch.utils.tensorboard import SummaryWriter

tb = SummaryWriter(comment="test")
for k in range(100):
    # create an image with some funny pattern
    b = [n for (n, c) in enumerate(bin(k)) if c == '1']
    img = torch.zeros((1, 10, 10))
    img[0, b, :] = 0.5
    img = img + img.permute([0, 2, 1])
    # add the image to the tensorboard file
    tb.add_image(tag="test", img_tensor=img, global_step=k)
This creates a folder runs in which the data is saved. From the same folder execute tensorboard --logdir runs, open the browser and go to localhost:6006 (or replace 6006 with whatever port tensorboard happens to display after starting it). Then go to the tab called "images" and move the slider above the grayscale image.
In my case it only displayed the images from steps
k = 3, 20, 24, 32, 37, 49, 52, 53, 67, 78
which isn't even a nice even spacing, but looks pretty random. I'd prefer to
1) see more than just 10 of the images I saved, and
2) have a more even spacing of steps between each displayed image.
How can I achieve this?
EDIT: I just found the option --samples_per_plugin and tried tensorboard --logdir runs --samples_per_plugin "images=100". This indeed increased the number of images, but it only showed the images from steps k = 0,1,2,3....,78, but none from above 78.
You probably just have to wait a little longer for all the data to be loaded, but this is indeed the correct solution; see --help:
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly specify how many samples
to keep per tag for that plugin. For unspecified plugins, TensorBoard
randomly downsamples logged summaries to reasonable values to prevent
out-of-memory errors for long running jobs. This flag allows fine
control over that downsampling. Note that 0 means keep all samples of
that type. For instance, "scalars=500,images=0" keeps 500 scalars and
all images. Most users should not need to set this flag. (default: '')
Regarding the random samples: this is also true; there is some sort of randomness to it. From the FAQ:
Is my data being downsampled? Am I really seeing all the data?
TensorBoard uses reservoir sampling to downsample your data so that it
can be loaded into RAM. You can modify the number of elements it will
keep per tag in tensorboard/backend/application.py.
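In other words, running tensorboard --logdir runs --samples_per_plugin "images=0" should keep every logged image (0 means keep all samples, per the help text above), at the cost of more memory.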

We have many mainframe files in EBCDIC format; is there a way in Python to parse or convert a mainframe file into a csv or text file?

I need to read the records from the mainframe file and apply some filters on the record values.
So I am looking for a solution to convert the mainframe file to csv, text, or an Excel workbook so that I can easily perform operations on it.
I also need to validate the records count.
If it is all text then FTP'ing with EBCDIC to ASCII translation is doable, including within Python.
If not, then either:
the extraction and conversion to CSV needs to happen on z/OS, perhaps with a COBOL program, and the CSV can then be FTP'ed down as text,
or
the data has to be FTP'ed BINARY and then parsed, with bits of it translated.
But, as so often is the case, we need more information.
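If the binary route is taken, Python's built-in EBCDIC codecs can do the translation. A minimal sketch, assuming fixed-length 80-byte records and code page 037 (both are assumptions; the real record length and code page depend on the dataset, and the file name is hypothetical):

with open('mainframe.bin', 'rb') as f:
    while True:
        chunk = f.read(80)                # assumed fixed record length
        if not chunk:
            break
        print(chunk.decode('cp037'))      # EBCDIC code page 037 -> str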
I was recently processing the hardcopy log and wanted to break the record apart. I used Python to do this, as the record was effectively a fixed-position record with different data items at fixed locations in the record. In my case the entire record was text, but one could easily apply this technique to convert various columns to an appropriate type.
Here is a sample record. I added a few lines to help visualize the data offsets used in the code to access the data:
          1         2         3         4         5         6         7         8         9
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
N 4000000 PROD      19114 06:27:04.07 JOB02679 00000090  $HASP373 PWUB02#C STARTED - INIT 17
Note the fixed column positions for the various items and how they are referenced by position. Using this technique you could process the file and create a CSV with the output you want for processing in Excel.
For my case I used Python 3.
def processBaseMessage(self, message):
    self.command = message[1]
    self.routing = list(message[2:9])
    self.routingCodes = []  # These are routing codes extracted from the system log.
    self.sysname = message[10:18]
    self.date = message[19:24]
    self.time = message[25:36]
    self.ident = message[37:45]
    self.msgflags = message[46:54]
    self.msg = [message[56:]]
You can then format the fields into the form you need for further processing. There are other ways to process mainframe data, but based on the question this approach should suit your needs, though there are many variations.
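To illustrate the CSV step, here is a minimal sketch (the file names and the chosen subset of columns are hypothetical; the offsets come from the layout above) that writes selected fields of each record to a CSV usable in Excel:

import csv

# (name, start, stop) slices taken from the fixed-position layout above
FIELDS = [('sysname', 10, 18), ('date', 19, 24), ('time', 25, 36), ('msg', 56, None)]

with open('hardcopy.log') as src, open('hardcopy.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    writer.writerow(name for name, _, _ in FIELDS)
    for record in src:
        writer.writerow(record[start:stop].strip() for _, start, stop in FIELDS)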

Maximize figures before saving

The question of how to maximize a window before saving has been asked several times and has several answers (none of them portable, though): How to maximize a plt.show() window using Python
and How do you change the size of figures drawn with matplotlib?
I created a small function to maximize a figure window before saving the plots. It works with the Qt5Agg backend.
import matplotlib.pyplot as plt

def maximize_figure_window(figures=None, tight=True):
    """
    Maximize all the open figure windows or the ones indicated in figures.

    Parameters
    ----------
    figures : (list) figure numbers
    tight : (bool) if True, applies the tight_layout option
    """
    if figures is None:
        figures = plt.get_fignums()
    for fig in figures:
        plt.figure(fig)
        manager = plt.get_current_fig_manager()
        manager.window.showMaximized()
        if tight is True:
            plt.tight_layout()
Problems:
1. I have to wait for the windows to be actually maximized before using the plt.savefig() command, otherwise the figure is saved as not maximized. This is a problem if I simply want to use the above function in a script.
(minor problems:)
2. I have to use the above function twice in order to get the tight_layout option working, i.e. the first time tight=True has no effect.
3. The solution is not portable. Of course I can add all the possible backends I might use, but that's kind of ugly.
Questions:
1. How do I make the script wait for the windows to be maximized? I don't want to use time.sleep(tot_seconds) because tot_seconds would be kind of arbitrary and would make the function even less portable (one possible workaround is sketched after this list).
2. How do I solve problem 2? I guess it is related to problem 1.
3. Is there a portable solution to the "maximize all the open windows" problem?
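For the waiting problem, one possibility worth trying (a sketch, not a guaranteed fix): plt.pause() runs the GUI event loop for the given interval, which can give the window manager a chance to apply the maximize before savefig is called, without an arbitrary time.sleep():

import matplotlib.pyplot as plt

fig = plt.figure()
manager = plt.get_current_fig_manager()
manager.window.showMaximized()   # Qt5Agg-specific, as in the function above
plt.pause(0.5)                   # run the GUI event loop so the resize is applied
fig.savefig('maximized.png')     # hypothetical output file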
-- Edit --
For problem 3, @DavidG's suggestion sounds good. I use tkinter to automatically get the screen width and height, convert them to inches, and use them in fig.set_size_inches or directly during figure creation via fig = plt.figure(figsize=(width, height)).
So a more portable solution is, for example:
import tkinter as tk
import matplotlib.pyplot as plt

def maximize_figure(figure=None):
    root = tk.Tk()
    width = root.winfo_screenmmwidth() / 25.4    # mm -> inches
    height = root.winfo_screenmmheight() / 25.4
    if figure is not None:
        plt.figure(figure).set_size_inches(width, height)
    return width, height
where I allow figure to be None so that I can use the function to just retrieve the screen width and height and use them later.
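For example, matching the description above:

w, h = maximize_figure()            # just retrieve the screen size in inches
fig = plt.figure(figsize=(w, h))    # or create a figure at full-screen size directly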
Problem 1 is still there, though.
I use maximize_figure() in a plot function that I created (let's say my_plot_func()), but the saved figure still doesn't have the right dimensions when saved to file.
I also tried with time.sleep(5) in my_plot_func() right after the figure creation. Not working.
It works only if I manually run maximize_figure() in the console and then run my_plot_func(figure=maximized_figure) with the figure already maximized. This means the dimension calculation and the saving parameters are correct.
It does not work if I run maximize_figure() and my_plot_func(figure=maximized_figure) together in the console, i.e. with one call to the console! I really don't get why.
I also tried with a non-interactive backend like 'Agg', so that the figure doesn't actually get created on screen. Not working (wrong dimensions), no matter if I call the functions together or one after the other.
To summarize and clarify (problem 1):
By running these two pieces of code in the console, the figure gets saved correctly:
plt.close('all')
plt.switch_backend('Qt5Agg')
fig = plt.figure()
w, h = maximize_figure(fig.number)
followed by:
my_plot_func(out_file='filename.eps', figure=fig.number)
By running them together (like it would be in a script), the figure is not saved correctly:
plt.close('all')
plt.switch_backend('Qt5Agg')
fig = plt.figure()
w, h = maximize_figure(fig.number)
my_plot_func(out_file='filename.eps', figure=fig.number)
Using
plt.switch_backend('Agg')
instead of
plt.switch_backend('Qt5Agg')
it does not work in either case.
