How to get final figure width (uselatex) - python-3.x

I am writing a latex document and using matplotlib for plots. I want to have the font and font size (9) of the captions of my latex document also for the plot axes and legend text.
Furthermore, I would like to fill out the \linewidth or \textwidth of my latex document, which is 369 pt. Now the matplotlib.pyplot.figure function accepts the input parameter figsize which should be in inches, so I duly specify it as 369/72 inches, 1/72 being the conversion factor from pt to inches.
Later I cut down excess white space by using the bbox_inches='tight' and pad_inches=0 options of the savefig function.
The font and font size part works as intended. It looks exactly identical between the figure text and the caption text. However, I am still dissatisfied with the figure width.
Below is a minimal example of a figure I produce.
import matplotlib
import matplotlib.pyplot as plt
plt.rcdefaults()
plt.rcParams['font.size'] = '9'
plt.rcParams['figure.autolayout'] = False
matplotlib.rc('font', family='sans-serif', serif=['Palatino'])
matplotlib.rc('text', usetex=True)
params = {'text.latex.preamble': [
r'\usepackage[american]{babel}',
r'\usepackage{mathpazo}',
r'\usepackage{amsmath,amssymb,amsfonts,mathrsfs}',
r'\usepackage{textcomp}',
]}
plt.rcParams.update(params)
plt.rcParams['mathtext.default'] = 'regular'
plt.rcParams['legend.handlelength'] = 1
delta_adjust = 0
plt.rcParams['figure.subplot.bottom'] = delta_adjust
plt.rcParams['figure.subplot.top'] = 1 - delta_adjust
plt.rcParams['figure.subplot.left'] = delta_adjust
plt.rcParams['figure.subplot.right'] = 1 - delta_adjust
plt.rcParams['figure.subplot.hspace'] = 0.55
plt.rcParams['figure.subplot.wspace'] = 0.55
default_figsize=(369/72, 369/72)
fig = plt.figure(figsize=default_figsize)
sp1 = plt.subplot(3,3,1)
sp2 = plt.subplot(3,3,2)
sp3 = plt.subplot(3,3,3)
for sp in sp1, sp2, sp3:
    sp.set_title('Title')
    sp.set_xlabel('Xlabel')
    sp.set_ylabel('Ylabel')
twin = sp3.twinx()
twin.set_ylabel('Ylabel')
fig.set_size_inches(default_figsize)
fig.savefig('./example.pdf', transparent=False, bbox_inches='tight', pad_inches=0, ending='.pdf')
This is the result of the above code. The figure has a width of 429.356 pt instead of the desired 369 pt. When I increase the delta_adjust parameter in the code, I get smaller pdf widths.
[philipp@desktop scripts]$ python minimal_example.py
[philipp@desktop scripts]$ pdfinfo example.pdf
Creator: matplotlib 3.1.2, http://matplotlib.org
Producer: matplotlib pdf backend 3.1.2
CreationDate: Thu Jan 13 11:41:13 2022 CET
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 429.356 x 130.412 pts
Page rot: 0
File size: 102863 bytes
Optimized: no
PDF version: 1.4
When I scale the figsize parameter of the python code from 369 pt to 369*369/429 pt, I end up with a 386 pt pdf. I do not want to use a trial and error strategy to find the correct parameter. As a last resort, I could write an iterative program as a savefig routine but I would prefer to avoid this. For reference, here is the output of the program converted to png (image not shown).
In summary, I am looking for help on how to set the figure width reliably.
I am on Ubuntu 20.04, python 3.8, matplotlib 3.1.2, and I use the TkAgg backend which is the default.
Any help is appreciated.

Only after posting this question did this website recommend the following question to me: How to get figure size and fontsize right for PDFs exported from matplotlib?
It turns out that bbox_inches='tight' changes the size of the saved figure: the bounding box is recomputed to fit the drawn content, so the output no longer matches figsize.
I removed this option and set delta_adjust = 0.1 in the code above.
Now the figure has the expected size of exactly 369x369 pt.
Most of it is whitespace, which I can remove using the pdfcrop command line utility.
The current script looks like this.
import os
import matplotlib
import matplotlib.pyplot as plt
plt.rcdefaults()
plt.rcParams['font.size'] = '9'
plt.rcParams['figure.autolayout'] = False
matplotlib.rc('font', family='sans-serif', serif=['Palatino'])
matplotlib.rc('text', usetex=True)
params = {'text.latex.preamble': [
r'\usepackage[american]{babel}',
r'\usepackage{mathpazo}',
r'\usepackage{amsmath,amssymb,amsfonts,mathrsfs}',
r'\usepackage{textcomp}',
]}
plt.rcParams.update(params)
plt.rcParams['mathtext.default'] = 'regular'
plt.rcParams['legend.handlelength'] = 1
delta_adjust_h = 0.1
delta_adjust_v = 0.1
plt.rcParams['figure.subplot.bottom'] = delta_adjust_v
plt.rcParams['figure.subplot.top'] = 1 - delta_adjust_v
plt.rcParams['figure.subplot.left'] = delta_adjust_h
plt.rcParams['figure.subplot.right'] = 1 - delta_adjust_h
plt.rcParams['figure.subplot.hspace'] = 0.55
plt.rcParams['figure.subplot.wspace'] = 0.55
#factor = 369/429.356
factor = 1
default_figsize=(369/72*factor, 369/72*factor)
fig = plt.figure(figsize=default_figsize)
sp1 = plt.subplot(3,3,1)
sp2 = plt.subplot(3,3,2)
sp3 = plt.subplot(3,3,3)
for sp in sp1, sp2, sp3:
    sp.set_title('Title')
    sp.set_xlabel('Xlabel')
    sp.set_ylabel('Ylabel')
twin = sp3.twinx()
twin.set_ylabel('Ylabel')
fig.set_size_inches(default_figsize)
#fig.savefig('./example.pdf', transparent=False, ending='.pdf', pad_inches=0)
# bbox_inches='tight', pad_inches=0, ending='.pdf')
fig.savefig('./example.pdf', transparent=True, pad_inches=0, ending='.pdf')
os.system('pdfcrop ./example.pdf ./example_cropped.pdf')
The output of pdfinfo is the following:
[philipp@desktop scripts]$ pdfinfo example_cropped.pdf
Creator: TeX
Producer: pdfTeX-1.40.20
CreationDate: Thu Jan 13 13:28:25 2022 CET
ModDate: Thu Jan 13 13:28:25 2022 CET
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 364 x 112 pts
Page rot: 0
File size: 102853 bytes
Optimized: no
PDF version: 1.4
So it is still not perfect, as the figure width is slightly too small due to the cropping.
Also it is now not guaranteed that all the contents fit within the printed pdf, which previously was ensured by the bbox_inches option.
Nevertheless, this is an improvement as now the plot sizes can no longer exceed the latex \linewidth.
I may update this answer if I find a better solution.
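One detail that may be worth checking for the 369 pt target: if that number was printed by LaTeX (for example with \the\linewidth), it is in TeX points, and one inch is 72.27 TeX pt. The points reported by pdfinfo and used by matplotlib are PDF points, at 72 per inch. The sketch below shows the conversion; the function names are made up for illustration, and it assumes the 369 came from LaTeX.

```python
# TeX "pt" and PDF "pt" (TeX calls the latter "bp", big point) differ slightly.
TEX_PT_PER_INCH = 72.27  # TeX points per inch
PDF_PT_PER_INCH = 72.0   # PDF/PostScript points per inch


def tex_pt_to_inches(pt):
    """Convert a length in TeX points to inches."""
    return pt / TEX_PT_PER_INCH


def inches_to_pdf_pt(inches):
    """Convert a length in inches to PDF points (what pdfinfo reports)."""
    return inches * PDF_PT_PER_INCH


linewidth_tex_pt = 369  # assumed to be the value LaTeX printed for \linewidth
width_in = tex_pt_to_inches(linewidth_tex_pt)

print(round(width_in, 4))                    # width to pass as figsize
print(round(inches_to_pdf_pt(width_in), 2))  # width pdfinfo should then report
```

Under that assumption the figure would be about 1.4 PDF points narrower than one created with 369/72 inches, which is a small difference compared with the cropping issue but still visible in pdfinfo.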
Edit: There is a feature under development at matplotlib which would solve the problem: https://matplotlib.org/stable/tutorials/intermediate/constrainedlayout_guide.html
I tried it but it seems to work best when the subplots are all created at once, such as with plt.subplots. I will have to change all my scripts, because currently I add subplots one by one with plt.subplot. Moreover I need to set the vertical figure size explicitly instead of simply generating a square figure and cropping all the unused whitespace.
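A minimal sketch of the constrained layout approach, assuming the subplots are created in one call with plt.subplots. The height (one third of the width) is an arbitrary choice for a single row of three panels, and usetex is left out so the sketch runs without a LaTeX installation.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

width_in = 369 / 72        # target width from the question, in inches
height_in = width_in / 3   # assumed height for one row of three panels

# constrained_layout fits titles and labels inside the figure,
# so the saved page keeps exactly the requested figsize.
fig, axes = plt.subplots(1, 3, figsize=(width_in, height_in),
                         constrained_layout=True)
for ax in axes:
    ax.set_title("Title")
    ax.set_xlabel("Xlabel")
    ax.set_ylabel("Ylabel")
twin = axes[2].twinx()
twin.set_ylabel("Ylabel")

buf = io.BytesIO()
fig.savefig(buf, format="pdf")  # note: no bbox_inches='tight'
saved_width_in, saved_height_in = fig.get_size_inches()
print(saved_width_in * 72, saved_height_in * 72)  # page size in PDF points
```

Because nothing is cropped afterwards, the height has to be chosen up front, which matches the drawback described above.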

Related

using gnuplot to read in - not output - a png image, or many, in one session - as an "overlay"

I want to read multiple png files - which themselves were created with gnuplot (terminal png) - in order to achieve an "overlay" - that is, a number of functions plotted together one on top of the other, with no background. This apparently could be done with gnuplot in one session.
I found this idea from the Linux Gazette article "Plotting the spirograph equations with 'gnuplot' ", from 2006 :
https://linuxgazette.net/133/luana.html
I am stuck on a number of error messages (vide infra) :
line 0: Bad data on line 1 of file [...]
line 0: warning: using default binary record/array structure
line 0: Too many using specs for this style
Looking for solutions, I read in the help pages ( http://gnuplot.info/docs_5.5/loc7742.html ) that gnuplot can read png images :
plot 'file.png' binary filetype=png
... and I have looked into using pngcairo instead of png itself. I am using eog to view the .png images. Here is sample code which generates the error above, and more if adjusted :
set size ratio -1
set nokey
set noxtics
set noytics
set noborder
set parametric
i2p = {0,1}*2*pi
set terminal png
t0 = 0
t1 = 1
#---------------------------------------------
# plot first function in the gnuplot session :
#---------------------------------------------
test01(t) = exp(i2p*(2*t))
set output "solve_png_problem_15nov22a.png"
plot [t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1
#---------------------------------------------------
# plot second function in the same gnuplot session :
#---------------------------------------------------
test02(t) = + 3*1.0**20 * exp(i2p*(-3*t+20/200. )) + 3*1.0**19 * exp(i2p* (2*t+20/200.))
set output "solve_png_problem_15nov22b.png"
plot [t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
#------------------------------------------------------------
# last plotting to apparently "overlay" the two plots above :
#------------------------------------------------------------
set terminal png size 600,600
set output "solve_png_problem_15nov22_overlay.png"
set noparametric
plot "solve_png_problem_15nov22a.png", "solve_png_problem_15nov22b.png"
.... the reduced sample code is generated from the awk script supplement to the article - see it for detail :
https://linuxgazette.net/133/misc/luana/spirolang.awk.txt
The functions are nontrivial so they were kept intact, as the associated settings might be causing the problem. The individual images look ok, so I think the problem is in the last plot command.
I read in the help pages that gnuplot can read png images :
plot 'file.png' binary filetype=png
... and also filetype=auto, and I have looked into using pngcairo instead of png itself, with no progress. I have read the results of Google searches for the error messages, and the help pages on terminal, png, image, binary, and so on. I was expecting gnuplot to simply recognize that the file was a png image that gnuplot itself generated using the png terminal. What actually results is the error "Too many using specs for this style". For this, I have tried moving the position of "binary filetype=png" in the code, which gives the error "line 0: Bad data on line 1 of file [...]". I have also tried using programs outside gnuplot, such as montage and composite (ImageMagick).
gnuplot version 5.4 patchlevel 2
Ubuntu 22.04
post-answer update:
TL;DR : use svg terminal.
I saved a lot of grief by simply using the svg terminal. The original work must have been published before gnuplot got the svg terminal. I still need to work svg into the original script - but svg will make it a lot easier.
Try this in GNUPLOT.
gnuplot<<EOF
set terminal png medium size 600,600 background rgb "white"
set size ratio -1
set nokey
set noxtics
set noytics
set noborder
set parametric
i2p = {0,1}*2*pi
t0 = 0
t1 = 1
#---------------------------------------------
# plot first function in the gnuplot session :
#---------------------------------------------
test01(t) = exp(i2p*(2*t))
set output "solve_png_problem_15nov22a.png"
plot [t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1
#---------------------------------------------------
# plot second function in the same gnuplot session :
#---------------------------------------------------
test02(t) = + 3*1.0**20 * exp(i2p*(-3*t+20/200. )) + 3*1.0**19 * exp(i2p* (2*t+20/200.))
set output "solve_png_problem_15nov22b.png"
plot [t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
#------------------------------------------------------------
# last plotting to "overlay" the two plots above :
#------------------------------------------------------------
set output "solve_png_problem_15nov22_overlay.png"
plot \
[t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1, \
[t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
EOF
Results: the first plot, the second plot, and the combined overlay (images not shown).
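The key idea in the script above is that the overlay is drawn by one plot command listing every curve, rather than by reading the PNG files back in. Since the original article generated its gnuplot script from awk, the same idea can be sketched as a small generator. This is only an illustration in Python: overlay_plot_command is a hypothetical helper, and it assumes the functions (test01, test02) and the variables t0 and t1 are already defined in the gnuplot session with parametric mode on.

```python
def overlay_plot_command(curves, t_range=("t0", "t1")):
    """Build one gnuplot `plot` command that draws all curves on the same axes.

    curves is a list of (function_name, linecolor) pairs; each function is
    assumed to be a complex-valued gnuplot function of t.
    """
    low, high = t_range
    parts = []
    for name, color in curves:
        # In parametric mode each curve is an x(t),y(t) pair.
        parts.append("real({0}(t)),imag({0}(t)) lc {1}".format(name, color))
    # Join the curves with a comma and a gnuplot line continuation.
    return "plot [t={}:{}] ".format(low, high) + ", \\\n     ".join(parts)


cmd = overlay_plot_command([("test01", 1), ("test02", 2)])
print(cmd)
```

The printed command can be pasted after the function definitions in place of the final plot command.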

Remove repeating values from X axis label in Altair

I am having trouble with Altair repeating X axis label values.
Data:
   rule_abbreaviation  flagged_claim  bill_month
0  CONCIDPROC          1              Apr2022
1  CONTUSMAT1          1              Apr2022
2  COVID05             1              Jun2021
3  FILTROTUB2          1              Sep2021
4  MEPIARTRO1          1              Mar2022
#Code to generate Altair Bar Chart
bar = alt.Chart(Data).mark_bar().encode(
    x=alt.X('flagged_claim:Q', axis=alt.Axis(title='Flagged Claims', format=',.0f'), stack='zero'),
    y=alt.Y('rule_abbreaviation:N', axis=alt.Axis(title='Component Abbreviation'), sort=alt.SortField(field=measure, order='descending')),
    tooltip=[alt.Tooltip('max(ClaimRuleName):N', title='Claim Component'), alt.Tooltip('flagged_claim:Q', title='Flagged Claims', format=',.0f')],
    color=alt.Color('bill_month', legend=None)
).properties(
    width=485,
    title=alt.TitleParams(
        text='Bottom Components',
        font='Arial',
        fontSize=16,
        color='#000080',
    )
).interactive()
X axis label generated by this chart contains repeated 0 and 1
Image of Visualization: https://i.stack.imgur.com/0XdWB.png
This happens because you have format=',.0f', which tells Altair to show 0 decimals in the axis labels, so fractional tick values between 0 and 1 are all displayed as 0 or 1. Remove it or change it to ',.1f' to see decimals in the labels. In general, a good way to troubleshoot problems like this is to remove one part of your code at a time to identify which part is causing the unexpected behavior.
To reduce the number of ticks you can use alt.Axis(title='Flagged Claims', format='d', tickCount=1) or alt.Axis(title='Flagged Claims', format='d', values=[0, 1]). See also Changing Number of y-axis Ticks in Altair
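The effect of the format string can be reproduced outside Altair. The sketch below uses Python's built-in format(), whose format specification is similar to the one Altair passes on for axis labels; the tick positions are assumed for illustration and are not necessarily the ones the chart picks.

```python
# Assumed tick positions on an axis that runs from 0 to 1.
ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0]

# ',.0f' keeps zero decimals, so fractional ticks collapse onto 0 or 1.
labels_no_decimals = [format(t, ",.0f") for t in ticks]

# ',.1f' keeps one decimal, so every tick gets a distinct label.
labels_one_decimal = [format(t, ",.1f") for t in ticks]

print(labels_no_decimals)  # ['0', '0', '0', '1', '1', '1']
print(labels_one_decimal)  # ['0.0', '0.2', '0.4', '0.6', '0.8', '1.0']
```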

Why gnuplot rounds data in column of vertical axis?

My old script worked fine years ago.
set terminal png background "#ffffff" enhanced fontscale 2.0 size 1800, 1400
set output 'delete.png'
w=1
x=1
z = 60
y=2
plot 'plot.in.tmp' using (column(x)/z):(column(y)) axis x1y1 with lines
exit gnuplot
reset
Now it produces a graph with only rounded integer values on the y (vertical) axis. I don't understand why.
Example data in file:
0 -0,00 0,5 570,2 11,98 -0,121 0,000 9,6
5 -0,00 0,7 570,2 11,97 -0,002 0,012 13,2
10 -0,00 0,9 570,3 11,98 -0,004 -0,000 16,1
15 0,24 35,9 570,4 11,96 0,001 0,000 18,4
20 0,56 87,0 570,1 11,99 -0,001 -0,000 20,5
25 1,03 173,5 570,4 11,97 -0,000 0,000 23,2
30 1,61 296,4 570,3 11,96 0,002 0,000 12,4
35 2,17 422,6 570,2 11,68 0,004 0,000 8,8
40 2,81 571,6 570,2 11,37 0,010 0,001 7,5
45 3,52 752,3 570,3 11,26 0,015 0,000 7,1
50 3,97 905,0 570,2 11,69 0,075 0,006 7,4
55 4,36 1048,4 570,1 11,36 0,081 0,001 8,6
60 4,59 1156,8 570,2 11,22 0,087 0,001 10,7
Result graph: (image not shown)
Welcome to StackOverflow! Maybe the local setting of your system (or something in gnuplot) has changed?
The following works for me with your data.
Add a line
set decimalsign locale "german"
or
set decimalsign locale "french"
Check help decimalsign.
Syntax:
set decimalsign {<value> | locale {"<locale>"}}
Correct typesetting in most European countries requires:
set decimalsign ','
Please note: If you set an explicit string, this affects only numbers
that are printed using gnuplot's gprintf() formatting routine,
including axis tics. It does not affect the format expected for input
data, and it does not affect numbers printed with the sprintf()
formatting routine.
The answer given by theozh is correct, but it does not point out the unfortunate lack of standardization in how different operating systems report the current locale setting. On Linux machines the locale strings are less human-friendly. For example, instead of using something generic like "french", they subdivide into "fr_FR.UTF-8", "fr_BE.UTF-8", "fr_LU.UTF-8", etc. to account for slight differences in the conventions used in France, Belgium, Luxembourg, and so on.
I cannot tell you the exact set of locale descriptions on your machine, but here is what works for me on a Linux machine:
set decimalsign locale "fr_FR.UTF-8"
w=1
x=1
z = 60
y=2
plot 'plot.in.tmp' using (column(x)/z):(column(y)) axis x1y1 with lines
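If no suitable locale is installed, an alternative that is not part of the answers above is to preprocess the data file so that it uses decimal points. The sketch below is a different technique from set decimalsign locale: it rewrites every comma that sits between two digits. The function name is hypothetical, and the approach assumes commas never appear as column separators in the file.

```python
import re

# A comma with a digit on both sides is treated as a decimal separator.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def commas_to_points(line):
    """Replace decimal commas with decimal points in one line of data."""
    return DECIMAL_COMMA.sub(".", line)


sample = "15 0,24 35,9 570,4 11,96 0,001 0,000 18,4"
converted = commas_to_points(sample)
print(converted)  # 15 0.24 35.9 570.4 11.96 0.001 0.000 18.4
```

Applying commas_to_points to each line of plot.in.tmp and writing the result to a new file would give gnuplot input it can read with its default settings.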

svm train output file has less lines than that of the input file

I am currently building a binary classification model and have created an input file for svm-train (svm_input.txt). This input file has 453 lines, 4 features and 2 classes (0 and 1).
i.e
0 1:15.0 2:40.0 3:30.0 4:15.0
1 1:22.73 2:40.91 3:36.36 4:0.0
1 1:31.82 2:27.27 3:22.73 4:18.18
0 1:22.73 2:13.64 3:36.36 4:27.27
1 1:30.43 2:39.13 3:13.04 4:17.39 ......................
My problem is that the output model generated by svm-train (svm_train_model.txt) has fewer lines than the input file. The line count of the model is 450, but 9 of those lines are the header at the beginning showing the various parameters, which leaves 441 data lines, 12 fewer than the input:
i.e.
svm_type c_svc
kernel_type rbf
gamma 1
nr_class 2
total_sv 441
rho -0.156449
label 0 1
nr_sv 228 213
SV
Therefore, 12 of the original 453 input lines are missing. I am new to SVMs and was hoping that someone could shed some light on why this might have happened.
Thanks in advance
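For reference, the lines after SV in a LIBSVM model file are the support vectors, not a copy of the training set, so the number of data lines equals total_sv (441 here, which matches 450 minus 9). The sketch below splits a model into header and support-vector lines. The helper name is hypothetical, and the tiny model text is made up purely to make the example self-contained.

```python
def split_model(model_text):
    """Split LIBSVM model text into header lines and support-vector lines."""
    lines = model_text.strip().splitlines()
    sv_index = lines.index("SV")         # the line "SV" closes the header
    header = lines[: sv_index + 1]
    support_vectors = lines[sv_index + 1:]
    return header, support_vectors


# Made-up miniature model, only for illustration.
example_model = """svm_type c_svc
kernel_type rbf
gamma 1
nr_class 2
total_sv 3
rho -0.156449
label 0 1
nr_sv 2 1
SV
1 1:15.0 2:40.0 3:30.0 4:15.0
0.5 1:22.73 2:13.64 3:36.36 4:27.27
-1 1:22.73 2:40.91 3:36.36 4:0.0
"""

header, support_vectors = split_model(example_model)
print(len(header), len(support_vectors))  # 9 3
```

Running split_model on svm_train_model.txt and comparing len(support_vectors) with total_sv is a quick way to confirm how the 450 lines break down.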
Update:
I now believe that, in generating the model, svm-train has removed lines whose labels and feature values are exactly the same.
To explain: my input is a set of miRNAs which have been classified as 1 or 0 depending on whether or not they are involved in a particular process (i.e. 1=Yes and 0=No). The input file looks something like this:
0 1:22 2:30 3:14 4:16
1 1:26 2:15 3:17 4:25
0 1:22 2:30 3:14 4:16
Here lines one and three are exactly the same, and as a result they appear to be removed from the output model. My question is why the model would do this and how I can get around it (whilst using the same features).
Although some of the labels and their corresponding feature values are identical within the input file, they still belong to different miRNAs.
NOTE: The input file does not have a feature for the miRNA name (which would clearly show the differences between the lines). In terms of the features used (i.e. nucleotide percentage content), some of the miRNAs have exactly the same percentage content of A, U, G and C. As a result they seem to be treated as duplicates and removed from the output model even though they are not duplicates, which would explain why there are fewer lines in the output model.
The format of the input file is:
Column 0 - label (i.e 1 or 0): 1=Yes & 0=No
Column 1 - Feature 1 = Percentage Content "A"
Column 2 - Feature 2 = Percentage Content "U"
Column 3 - Feature 3 = Percentage Content "G"
Column 4 - Feature 4 = Percentage Content "C"
The input file actually looks something like the following. The first two lines appear identical, but each line represents a different miRNA:
1 1:23 2:36 3:23 4:18
1 1:23 2:36 3:23 4:18
0 1:36 2:32 3:5 4:27
1 1:14 2:41 3:36 4:9
1 1:18 2:50 3:18 4:14
0 1:36 2:23 3:23 4:18
0 1:15 2:40 3:30 4:15
In terms of software, I am using libsvm-3.22 and python 2.7.5
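The duplicate hypothesis can be tested directly by counting repeated lines in the input file and comparing the surplus with the 12 missing lines. The sketch below does this for the sample lines shown above; duplicate_rows is a hypothetical helper name.

```python
from collections import Counter


def duplicate_rows(lines):
    """Return {row: count} for every non-empty row that occurs more than once."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {row: n for row, n in counts.items() if n > 1}


sample = [
    "1 1:23 2:36 3:23 4:18",
    "1 1:23 2:36 3:23 4:18",
    "0 1:36 2:32 3:5 4:27",
    "1 1:14 2:41 3:36 4:9",
    "1 1:18 2:50 3:18 4:14",
    "0 1:36 2:23 3:23 4:18",
    "0 1:15 2:40 3:30 4:15",
]

dupes = duplicate_rows(sample)
surplus = sum(n - 1 for n in dupes.values())  # rows beyond the first copy
print(dupes, surplus)
```

If the surplus computed over the full svm_input.txt is not 12, duplicate removal alone cannot explain the difference in line counts.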
My first observation is that you should align your input file properly. The libsvm code doesn't look for exactly 4 features. It identifies them by the string literals you have provided that separate the features from the labels. I suggest manually converting your input file to create the desired input argument.
Try running the following Python code.
Requirement: h5py, if your input comes from MATLAB (a .mat file).
pip install h5py
import h5py

# Load the training labels (MATLAB v7.3 .mat files are HDF5 files).
f = h5py.File('traininglabel.mat', 'r')  # give label.mat file for training
for name, data in f.items():
    labels = data[()][0]  # first row of the label dataset
f.close()

# Convert labels such as 0.0 / 1.0 into the strings '0' / '1'.
trainlabels = []
for value in labels:
    label = str(value)
    if label == '0.0':
        label = '0'
    if label == '1.0':
        label = '1'
    trainlabels.append(label)
    print(label)

# Load the feature matrix, indexed as features[feature][sample].
f = h5py.File('training_features.mat', 'r')  # give features here
for name, data in f.items():
    features = data[()]
f.close()

# Write one sample per line in LIBSVM format: <label> <index>:<value> ...
with open('traindata.txt', 'w') as out:
    for i in range(0, 1000):  # no. of training samples in file features.mat
        out.write(trainlabels[i])
        out.write(' ')
        for j in range(0, 49):  # no. of features per sample
            # LIBSVM expects 1-based index:value pairs
            out.write('%d:%s ' % (j + 1, features[j][i]))
        out.write('\n')

R simplify heatmap to pdf

I want to plot a simplified heatmap that is not so difficult to edit with the scalable vector graphics program I am using (Inkscape). The original heatmap as produced below contains lots of rectangles, and I wonder if they could be merged together in the different sectors to simplify the output pdf file:
nentries=100000
ci=rainbow(nentries)
set.seed(1)
mean=10
## Generate some data (4 factors)
i = data.frame(
a=round(abs(rnorm(nentries,mean-2))),
b=round(abs(rnorm(nentries,mean-1))),
c=round(abs(rnorm(nentries,mean+1))),
d=round(abs(rnorm(nentries,mean+2)))
)
minvalue = 10
# Discretise values to 1 or 0
m0 = matrix(as.numeric(i>minvalue),nrow=nrow(i))
# Remove rows with all zeros
m = m0[rowSums(m0)>0,]
# Reorder with 1,1,1,1 on top
ms =m[order(as.vector(m %*% matrix(2^((ncol(m)-1):0),ncol=1)), decreasing=TRUE),]
rowci = rainbow(nrow(ms))
colci = rainbow(ncol(ms))
colnames(ms)=LETTERS[1:4]
limits=c(which(!duplicated(ms)),nrow(ms))
l=length(limits)
toname=round((limits[-l]+ limits[-1])/2)
freq=(limits[-1]-limits[-l])/nrow(ms)
rn=rep("", nrow(ms))
for(i in toname) rn[i]=paste(colnames(ms)[which(ms[i,]==1)],collapse="")
rn[toname]=paste(rn[toname], ": ", sprintf( "%.5f", freq ), "%")
heatmap(ms,
Rowv=NA,
labRow=rn,
keep.dendro = FALSE,
col=c("black","red"),
RowSideColors=rowci,
ColSideColors=colci,
)
dev.copy2pdf(file="/tmp/file.pdf")
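The merging asked about here amounts to collapsing runs of identical consecutive rows into one block per run, so that each block needs one rectangle per column instead of one per row. The sketch below illustrates that idea in Python rather than R, with a made-up helper name and toy data; it is not how heatmap() draws, only a way to see how few rectangles are needed. With 4 binary factors and all-zero rows removed there are at most 15 distinct patterns.

```python
from itertools import groupby


def merge_rows(rows):
    """Collapse runs of identical consecutive rows into (start, end, pattern)."""
    blocks = []
    start = 0
    for pattern, run in groupby(rows):
        length = sum(1 for _ in run)
        blocks.append((start, start + length, pattern))  # end is exclusive
        start += length
    return blocks


# Toy sorted matrix with 6 rows and 4 columns (A, B, C, D).
rows = [
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
]

blocks = merge_rows(rows)
print(blocks)
```

Each block could then be drawn with a single rectangle per column, spanning rows start to end, which keeps the resulting vector file small.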
Why don't you try RSvgDevice? Using it you could save your image as an svg file, which is much more convenient for Inkscape than pdf.
I use the Cairo package for producing svg. It's incredibly easy. Here is a much simpler plot than the one you have in your example:
require(Cairo)
CairoSVG(file = "tmp.svg", width = 6, height = 6)
plot(1:10)
dev.off()
Upon opening in Inkscape, you can ungroup the elements and edit as you like.
Example (point moved, swirl added):
I don't think we (the internet) are being clear enough on this one.
Let me just start off with a successful export example
png("heatmap.png") #Ruby dev's think of this as kind of like opening a `File.open("asdfsd") do |f|` block
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(256), scale="column", margins=c(5,10))
dev.off()
The dev.off() bit reminds me of an end call closing a Ruby block or method: the output of the last plotting call in the enclosed code (between png() and dev.off()) is what gets dumped into the png file.
For example, if you ran this code:
png("heatmap4.png")
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(32), scale="column", margins=c(5,15))
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=greenred(32), scale="column", margins=c(5,15))
dev.off()
it would write the 2nd heatmap (the greenred color scheme; I just tested it) to the heatmap4.png file, just like how a Ruby method returns its last line by default.
