Reducing the size of pdf figure file in matplotlib - dpi

In matplotlib, I am using LineCollection to draw and color the countries, where the boundaries of the counties are given. When I am saving the figure as a pdf file:
fig.savefig('filename.pdf',dpi=300)
the file size is quite large. However, on saving them as png file:
fig.savefig('filename.png',dpi=300)
and then converting them to pdf using the Linux convert command, the files are small. I tried reducing the dpi, but that does not change the pdf file size. Is there a way to save the figures directly as smaller pdf files from matplotlib?

The PDF is larger, since it contains all the vector information. By saving a PNG, you produce a rasterized image. It seems that in your case, you can produce a smaller PDF by rasterizing the plot directly:
plt.plot(x, y, 'r-', rasterized=True)
Here, x, y are some plot coordinates. You basically have to pass the additional keyword argument rasterized to achieve the effect.
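Since the question uses a LineCollection rather than plt.plot, here is a minimal sketch of the same idea applied to a collection. The random segments, the figure size and the helper name pdf_size are assumptions made for illustration, not the asker's data or code. It also shows why changing dpi alone had no effect: in a PDF, dpi only matters for artists that are rasterized.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def pdf_size(rasterized, n_segments=20000, dpi=50):
    """Return the size in bytes of a PDF that holds one LineCollection."""
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the boundary data: random two-point segments.
    segments = rng.random((n_segments, 2, 2))
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.add_collection(
        LineCollection(segments, linewidths=0.5, rasterized=rasterized)
    )
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    buf = io.BytesIO()
    # dpi is only used for the rasterized artists; vector artists ignore it.
    fig.savefig(buf, format="pdf", dpi=dpi)
    plt.close(fig)
    return buf.getbuffer().nbytes


vector_bytes = pdf_size(rasterized=False)
raster_bytes = pdf_size(rasterized=True)
print(f"vector: {vector_bytes} bytes, rasterized: {raster_bytes} bytes")
```

Only the collection is rasterized here; the axes, ticks and labels stay as vector objects, so text remains sharp when zooming.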

I think using rasterized=True effectively stores the plot as a bitmap, much like a PNG, so when you zoom in you will see blurry pixels.
If you want the figures to be high quality, my suggestion is to sample from the data and plot the sample. The pdf file size grows roughly with the number of data points it needs to store.
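One simple way to "sample from the data" is to keep evenly spaced points. The sketch below is only an illustration under that assumption: the helper name decimate and the synthetic data are hypothetical, and plain index-based sampling can miss narrow peaks, so it may not suit every data set.

```python
def decimate(xs, ys, max_points):
    """Keep at most max_points samples, evenly spaced by index."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(xs)
    if n <= max_points:
        return list(xs), list(ys)
    # Evenly spaced indices that always include the first and last sample.
    idx = [round(i * (n - 1) / (max_points - 1)) for i in range(max_points)]
    return [xs[i] for i in idx], [ys[i] for i in idx]


# Synthetic data standing in for a large series.
xs = list(range(100_000))
ys = [(i % 1000) / 1000 for i in xs]
small_x, small_y = decimate(xs, ys, 2_000)
print(len(small_x), small_x[0], small_x[-1])
```

The reduced lists can then be passed to plt.plot as usual before calling savefig, which keeps the output fully vector while storing far fewer points.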

Related

Get a single TIFF using multiple TIFFs and masks

I'm working with satellite imagery (from Sentinel-2), in particular with cloud detection and cloud cleaning.
I got a batch of images of the same area, but in different periods:
From these images, you can see that the position of the clouds is always different.
I also have the mask for each image, where the black areas represent clouds:
These masks are not perfect, but this is not a problem.
What I want to do is to use the mask to cut all the white portions (so get the land and exclude the clouds), and then fill these cuts with a black portion of another image (fill the "hole" in the image with a part of another image without clouds).
Imagery is in TIFF format, while masks are in JPG format.
I'm using Python with libraries like Rasterio, numpy and scikit-image, so a Pythonic solution would be appreciated.

Retain zoom by rectangle selection when saving the matplotlib figure

I have a program that plots time series data.
The plot I get after the program executes is crisp and can be zoomed in to any level without any problems.
However, as soon as I save the matplotlib figure as a pdf or svg file, I lose the ability to zoom by rectangular selection.
Is there any way to save the matplotlib figure so that I retain the original resolution to the fullest?
Attached are figures to help you understand the problem.
Notice how, in the original matplotlib figure object, the time stamps become visible as I keep zooming in,
but when I save it as a pdf or scalable vector graphics file, it does not show the timestamps clearly no matter how much I zoom.
I have named the figures so that it is easy to understand what each depicts.
Original matplotlib plot which is zoomed:
Original matplotlib figure object:
The zoomed in fig when saved as a pdf:

Matplotlib: consistent image size for publications

I want to make publication-quality plots with Matplotlib. The biggest problem I am having right now is to tune the image and font sizes.
When I create a figure with several panels, I usually set a bigger figsize. For example, these three panels are created with a figsize=(12, 6 / 1.618) (pasted from Jupyter Lab, I always save to PDF files).
The lines can be seen perfectly, there is a lot of space, and the figure looks nice. The problem is that in my publication this has to be a column-width figure, so it has to be scaled down. A column is about 3.5 inches wide. When the image is resized, it still looks good, but the axes labels become tiny and unreadable. Of course, I can simply keep increasing the font sizes until I find a good size, but I would like to have a workflow that allows me to work with the lengths and sizes I have to use.
When I set the image size to figsize=(columnw, 0.5*columnw / 1.618) (so the aspect ratio is the same as before) and set the font size to around 10 (the font size of my publication), this is what I get:
So now the fonts are exactly the size I want them to be and the figure does not have to be rescaled, but the contents of the graph seem to be compressed into a very tiny space. It just looks... ugly.
So my question is: why does a big figsize with extremely large font sizes give a beautiful, readable figure when scaled down, while the a priori correct figsize without rescaling looks ugly? How could I work with real figsizes from the very beginning to obtain something nice?
I read some questions regarding image size with Matplotlib on this site, as well as a pair of blog posts, but I haven't found any information regarding this problem.
Thank you in advance.

How to improve the magnification of picture (*.png) when exported in gnuplot?

I'd like to improve the magnification of images (*.png) when they are exported in gnuplot. I tried increasing the pixel dimensions of these images, but when they are zoomed in too far, the quality is still bad. Could you please help me with this?
Here are my commands for exporting the images *.png in gnuplot:
set term pngcairo transparent enhanced lw 2.2 \
font "Century,20" fontscale 1.2 size 1642,1140
The problem you are facing is not related to gnuplot but to the bitmap nature of png images. Since these images are not vector graphics, when you "zoom in" you simply enlarge the existing pixels without gaining any resolution. The way to solve this problem is to export to a vector format such as eps instead of png. There are a few terminals in gnuplot that you might be interested in. In my opinion the most powerful is the epslatex terminal: have a look at the documentation with help epslatex.
As mentioned by Miguel, likely the source of your problem is that by exporting a PNG you are exporting an array of pixels. When you zoom in you will start to see the individual pixels of your image.
Probably the best way to solve your problem is to export to some form of vector graphics. Take a look at EPS (side note: most journals will prefer that you submit a vector graphic rather than a PNG).
If you are certain you want to use PNG you should take a look at https://stackoverflow.com/a/9118990/2372604 which mentions changing your terminal to pngcairo to produce smoother results.
One more note: if your function is particularly noisy, you may need to increase the number of sample points. Consider the command set samples 1000.
Besides the other answers, here are two other options:
Increase the terminal size (say 4000x3000) until you get something that looks good enough. The PNG format is compressed, so if most of the plot is white, it won't add many bytes.
As already said, use a vector graphics format as the terminal. The others suggest EPS, but it is less common today than SVG. The svg terminal produces .svg files that can be easily post-processed with a tool such as Inkscape.
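To get a feel for the claim that a mostly white image compresses well, here is a rough Python sketch. It is not a PNG encoder: it only runs zlib's DEFLATE (the compression method PNG uses for pixel data) over raw RGB rows and ignores PNG filtering and chunk overhead. The canvas size and the single black line are assumptions chosen for illustration, and a real plot with text and curves will compress less favourably.

```python
import zlib

WIDTH, HEIGHT = 1000, 750  # hypothetical terminal size in pixels

# Raw 8-bit RGB rows: a white canvas with one black horizontal line.
white_row = b"\xff" * (WIDTH * 3)
black_row = b"\x00" * (WIDTH * 3)
rows = [white_row] * HEIGHT
rows[HEIGHT // 2] = black_row
raw = b"".join(rows)

# Compress the raw pixel bytes with DEFLATE at the highest level.
compressed = zlib.compress(raw, 9)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```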

Identify low-color jpeg images

From a large collection of jpeg images, I want to identify those more likely to be simple logos or text (as opposed to camera pictures). One identifying characteristic would be low color count. I expect most to have been created with a drawing program.
If a jpeg image has a palette, it's simple to get a color count. But I expect most files to be 24-bit color images. There's no restriction on image size.
I suppose I could create an array of 2^24 (16M) integers, iterate through every pixel and inc the count for that 24-bit color. Yuck. Then I would count the non-zero entries. But if the jpeg compression messes with the original colors I could end up counting a lot of unique pixels, which might be hard to distinguish from a photo. (Maybe I could convert each pixel to YUV colorspace, and save fewer counts.)
Any better ideas? Library suggestions? Humorous condescensions?
Sample 10000 random coordinates and make a histogram, then analyze the histogram.
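A minimal sketch of that suggestion, under a few assumptions that are not in the answer: colors are quantized to 4 bits per channel so that JPEG compression noise falls into the same bucket, the image is reached through a hypothetical get_pixel(x, y) callback returning an (r, g, b) tuple, and "low color" is judged by how much of the sample the few most common buckets cover. The two pixel functions are synthetic stand-ins, so any threshold would need tuning on real files.

```python
import random
from collections import Counter


def sampled_color_histogram(get_pixel, width, height,
                            n_samples=10000, bits=4, seed=0):
    """Histogram of quantized colors at randomly sampled coordinates."""
    rng = random.Random(seed)
    shift = 8 - bits  # drop low bits so near-identical colors share a bucket
    hist = Counter()
    for _ in range(n_samples):
        x = rng.randrange(width)
        y = rng.randrange(height)
        r, g, b = get_pixel(x, y)
        hist[(r >> shift, g >> shift, b >> shift)] += 1
    return hist


def dominant_share(hist, top=4):
    """Fraction of the samples covered by the `top` most common buckets."""
    total = sum(hist.values())
    return sum(count for _, count in hist.most_common(top)) / total


# Synthetic stand-ins for decoded images (no JPEG decoding happens here).
def logo_pixel(x, y):
    # Two flat colors in a checkerboard, like a simple logo.
    return (250, 250, 250) if (x // 50 + y // 50) % 2 else (200, 30, 30)


noise = random.Random(1)


def photo_pixel(x, y):
    # Random noise as a crude stand-in for a many-colored photo.
    return (noise.randrange(256), noise.randrange(256), noise.randrange(256))


logo_hist = sampled_color_histogram(logo_pixel, 800, 600)
photo_hist = sampled_color_histogram(photo_pixel, 800, 600)
print(len(logo_hist), dominant_share(logo_hist))
print(len(photo_hist), dominant_share(photo_hist))
```

With a real file, get_pixel could wrap whatever image library is in use. The cost is fixed at n_samples lookups regardless of image size, which avoids both the full pixel scan and the 2^24-entry array.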
