How can I add colours in ggpairs (GGally)?

I'm new to GGally and I'd like to ask how to add colours to the different lines of my variables in ggpairs(). Can I give a specific colour to each one?
Thanks!

If you just want to add/specify color for groups, you can set these manually using scale_fill_manual or scale_color_manual depending on the type of plot in the upper/lower/diagonal regions of the plot:
library(GGally)
pm <- ggpairs(flea, columns = 2:4, ggplot2::aes(colour=species))
pm +
scale_color_manual(values = c("red", "blue", "purple")) +
scale_fill_manual(values = c("red", "blue", "purple"))

Related

How to create clustered bar graph based on one of the columns?

I am trying to create a bar graph from the data below, binned by the category in the first column. I am looking for a gnuplot feature like hue in seaborn.
The csv file I am using looks as follows (snippet):
DS_TYPE, arg1, arg2, arg3, arg4
type1, 24, 20000, 15, 20
type2, 48, 20000, 20, 60
type3, 96, 20000, 25, 90
type3, 144, 200000, 30, 110
...
The figure I want is a bar chart with arg1 on the x-axis, arg4 on the y-axis, and DS_TYPE (there are only 3 types) as hue.
So far I have only seen solutions that add more columns to the csv file (arg1_type1, arg1_type2, and so on). I tried:
#!/bin/bash
gnuplot -persist <<-EOFMarker
set datafile separator ','
plot './test.csv' using 2:3:xtic(1) with boxes
EOFMarker
I read about similar code in the gnuplot manual section on (rowstacked) histograms, but I cannot find a solution for a bar chart:
Each cluster of boxes is derived from a single row of the input data file. It is common in such input files that
the first element of each row is a label. Labels from this column may be placed along the x-axis underneath
the appropriate cluster of boxes with the xticlabels option to using.
I tried to use similar code for a bar chart, using the code above (I am not sure I understand it correctly; I think xtic(1) means the first column is used for binning), but it didn't work.
The graph I am looking for (I did it with seaborn) is like this:
Note:
In this example there are only 3 types, but I am looking for a binning approach that can handle an unknown number N of types.
After the OP's clarification and illustrative example, here is an attempt to get to the desired plot.
Create unique lists of your keywords for the items and the groups. The sequence will be in the order of first occurrence in the data. Unfortunately, gnuplot has no internal sorting feature. Alphanumerical sorting would require some external tools or very weird workarounds.
Plot the data in two nested loops, filtering the data accordingly.
Add the legend by using the ternary operator (check help ternary).
Add a single xtic centered per group, regardless of whether the number of items is odd or even.
This solution is not very obvious and maybe there is a simpler approach using the plotting style histogram. I would love to learn about a simpler solution.
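The bookkeeping behind these steps can be illustrated outside gnuplot. The following is a minimal Python sketch, not part of the original answer: it mirrors the script's column choices (colI = 1, colG = 3, colY = 5) and its xPos(i,j) formula, and the names unique_in_order, x_pos, bars and tick_centres are only illustrative.

```python
import csv
import io

CSV_TEXT = """DS_TYPE, arg1, arg2, arg3, arg4
type1, 24, 20000, 15, 20
type2, 48, 20000, 20, 60
type3, 96, 20000, 25, 90
type3, 144, 200000, 30, 110
"""


def unique_in_order(values):
    """Return the distinct values in order of first occurrence (no sorting)."""
    return list(dict.fromkeys(values))


def x_pos(i, j, n_items, gap=1):
    """Same formula as xPos(i,j) in the gnuplot script; i and j are 1-based."""
    return (i - 1) + (j - 1) * n_items + j * gap


# Skip the header row; skipinitialspace drops the blank after each comma.
rows = list(csv.reader(io.StringIO(CSV_TEXT), skipinitialspace=True))[1:]

items = unique_in_order(r[0] for r in rows)   # column 1, like colI
groups = unique_in_order(r[2] for r in rows)  # column 3, like colG

# One (x, y, item) triple per data row; y comes from column 5, like colY.
bars = []
for r in rows:
    i = items.index(r[0]) + 1
    j = groups.index(r[2]) + 1
    bars.append((x_pos(i, j, len(items)), float(r[4]), r[0]))

# One tick per group, centred between the first and last item slot.
tick_centres = {
    g: (x_pos(1, j, len(items)) + x_pos(len(items), j, len(items))) / 2
    for j, g in enumerate(groups, start=1)
}
```

Because every group reserves a slot for every item, the tick centre does not depend on which items actually occur in a group.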
Script:
### plot grouped box chart
reset session
$Data <<EOD
DS_TYPE, arg1, arg2, arg3, arg4
type1, 24, 20000, 15, 20
type2, 48, 20000, 20, 60
type3, 96, 20000, 25, 90
type3, 144, 200000, 30, 110
EOD
set datafile separator comma
myColors = "0x3a7ca4 0xe38a3f 0x439549"
myColor(i) = int(word(myColors,i))
colI = 1 # column item
colG = 3 # column group
colY = 5 # column y-value
uniq1 = uniq2 = ''
addToList(list,s) = list.( int(sum [_i=1:words(list)] word(list,_i) eq s) ? '' : ' '.s)
stats $Data u (uniq1=addToList(uniq1,strcol(colI)), uniq2=addToList(uniq2,strcol(colG))) skip 1 nooutput
gap = 1
xPos(i,j) = (i-1) + (j-1)*words(uniq1) + j*gap
set boxwidth 1.0
set style fill solid 1.0
set key noautotitle top left
set tics out
set yrange[0:]
set offsets 0.5,0.5,0.5,0
plot for [i=1:words(uniq1)] for [j=1:words(uniq2)] $Data u (xPos(i,j)): \
(word(uniq1,i) eq strcol(colI) && word(uniq2,j) eq strcol(colG) ? column(colY) : NaN): \
(myColor(i)) w boxes lc rgb var ti j==1?word(uniq1,i):'', \
for [j=1:words(uniq2)] '+' u ((xPos(1,j)+xPos(words(uniq1),j))/2.):(NaN):xtic(word(uniq2,j)) every ::::0
### end of script
Result:

gnuplot single plot in different colors

I have a single column of data (say 100 samples):
plot 'file' using 1 with lines
But this data is segmented: 10 points, then 10 more, etc... and I'd like each block of 10 to appear in a different color. I did filter them to 10 separate files and used
plot 'file.1' with lines, 'file.2' with lines...
But then the X axis goes 0..10 instead of 0..100 and all 10 graphs are stacked. Is there a simple way to do that without having to generate fake X data?
Depending on your detailed data format... the following is doing what I think you are asking for.
Your "fake x data" is called pseudocolumn 0; check help pseudocolumns. You can change the color with lc var; check help linecolor variable.
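To make the idea concrete outside gnuplot, here is a minimal Python sketch of what the x and color columns amount to: the x value is simply the zero-based row number (what pseudocolumn 0 provides), and the color index is that row number divided by the block size, truncated. The names BLOCK and points are only illustrative, and the random samples are stand-ins for the real data.

```python
import random

random.seed(0)  # fixed seed so the stand-in data is reproducible
samples = [random.random() * i for i in range(1, 101)]

BLOCK = 10  # number of consecutive points that share one colour

# (x, y, colour_index): x is the zero-based row number,
# colour_index is the integer part of row / BLOCK.
points = [(row, value, row // BLOCK) for row, value in enumerate(samples)]
```

Rows 0 to 9 get index 0, rows 10 to 19 get index 1, and so on, while x keeps running from 0 to 99.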
Code:
### variable line color
reset session
# create some test data
set print $Data
do for [i=1:100] {
print sprintf("%g", rand(0)*i)
}
set print
plot $Data u 0:1:(int($0/10)) w lp pt 7 lc var notitle
### end of code
Result:

How to use df.plot to set different colors in one plot for one line?

I need to plot a line that has different colors. I created a special df column 'color' that contains the appropriate color for each point.
I already found the solution here:
python/matplotlib - multicolor line
I took the approach from the above question. At first it worked when I used the index, but now I need to plot against another column and I cannot handle the colors properly. The line is always drawn in only one color.
I use the code below to set the colors, but it draws the whole line in one color, the last one in the column 'color'. It also creates a legend that I don't know how to remove from the plot.
for color2, start, end in gen_repeating(df2['color']):
    print(start, end)
    if start > 0: # make sure lines connect
        start -= 1
    idx = df2.index[start:end+1]
    x2 = idx
    y2 = df2.loc[idx, 'age_gps_data'].tolist()
    df2.plot(x='river_km', y='age_gps_data', color=color2, ax=ax[1])
ax[1].xaxis.set_major_locator(plt.MaxNLocator(5))
plt.setp(ax[1].get_xticklabels())
I would appreciate any help.
How can I set these colors to get different colors along one line, and how can I avoid having a legend on the plot?
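One thing visible in the loop shown in the question is that idx, x2 and y2 are computed but never used: df2.plot(...) is called on the whole frame in every iteration, so each pass draws the full line and the last color ends up on top. A minimal sketch of the per-segment slicing follows, using plain lists so it runs without pandas. gen_repeating is not shown in the question, so the version here is an assumed stand-in that yields (value, start, end) for each run of equal values; coloured_segments and the sample data are hypothetical.

```python
from itertools import groupby


def gen_repeating(values):
    """Assumed stand-in: yield (value, start, end) per run of equal values, end inclusive."""
    start = 0
    for value, run in groupby(values):
        length = sum(1 for _ in run)
        yield value, start, start + length - 1
        start += length


def coloured_segments(xs, ys, colours):
    """Split xs/ys into one (colour, xs, ys) piece per run of equal colours."""
    segments = []
    for colour, start, end in gen_repeating(colours):
        if start > 0:  # overlap by one point so consecutive pieces connect
            start -= 1
        segments.append((colour, xs[start:end + 1], ys[start:end + 1]))
    return segments


# Hypothetical stand-ins for df2['river_km'], df2['age_gps_data'], df2['color'].
river_km = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]
age = [5, 7, 6, 9, 8, 4]
colour = ["red", "red", "blue", "blue", "blue", "green"]

segments = coloured_segments(river_km, age, colour)
```

Each piece could then be drawn with ax[1].plot(xs, ys, color=colour), which adds no legend unless one is requested. When staying with pandas, plotting only the sliced rows and passing legend=False to DataFrame.plot should have the same effect.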

gnuplot setting line titles by variables

I am trying to plot multiple data lines with their titles in the key based on the variable which I am using as the index:
plot for [i=0:10] 'filename' index i u 2:7 w lines lw 2 t ' = '/(0.5*i)
However, it cannot seem to do this for a fractional multiple of i. Is there a way around this other than to set the title for each line separately?
sprintf should provide all the functionality needed, e.g.,
plot for [i=0:10] .... t sprintf(" = %.1f", 0.5*i)
in order to use the value of 0.5*i with 1 decimal digit...
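gnuplot's sprintf follows the C printf conventions, and Python's % operator accepts the same %.1f conversion, so the titles such a loop would produce can be previewed with this minimal sketch (the name titles is only illustrative):

```python
# Same format string as in the gnuplot call: one digit after the decimal point.
titles = [" = %.1f" % (0.5 * i) for i in range(0, 11)]

for title in titles:
    print(title)
```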

Change color and legend of plotLearnerPrediction ggplot2 object

I've been producing a number of nice plots with the plotLearnerPrediction function in the mlr package for R. They look like this. From looking into the source code of the plotLearnerPrediction function it looks like the color surfaces are made with geom_tile.
A plot can for example be made by:
library(mlr)
data(iris)
#make a learner
lrn <- "classif.qda"
#make a task
my.task <- makeClassifTask(data = iris, target = "Species")
#make plot
plotLearnerPrediction(learner = lrn, task = my.task)
Now I wish to change the colors, using other red, blue and green tones to match those of some other plots that I've made for a project. For this I tried scale_fill_continuous and scale_fill_manual without any luck (Error: Discrete value supplied to continuous scale). I also wish to change the legend title and the labels for each legend entry (which I tried by giving appropriate parameters to the above scale_fill functions). There's a lot of info out there on how to set the geom_tile colours when producing the plot, but I haven't found any info on how to do this post-production (i.e. in somebody else's plot object). Any help would be much appreciated.
When you look into the source code you see how the plot is generated and then you can see which scale has to be overwritten or set.
In this example it's fairly easy:
g = plotLearnerPrediction(learner = lrn, task = my.task)
library(ggplot2)
g + scale_fill_manual(values = c(setosa = "yellow", versicolor = "blue", virginica = "red"))
