Multiplot grouping and labeling by a specific column [duplicate] - gnuplot

I currently have:
set terminal png
set datafile separator ","
set style data linespoints
set key
plot 'data/forplotting/population.csv' using 3:4 title column(1)
With this data file (an excerpt only, so as not to flood the question):
Country Name,Country Code,Year,Value
Arab World,ARB,1960,96388069
Euro area,EMU,1960,260300607
Euro area,EMU,1961,262639170
Euro area,EMU,1962,265056064
Euro area,EMU,1963,267532538
Euro area,EMU,1964,269969434
Euro area,EMU,1965,272389008
Euro area,EMU,1966,274649191
Euro area,EMU,1967,276601113
Euro area,EMU,1968,278434336
Euro area,EMU,1969,280295897
Euro area,EMU,1970,281804083
Euro area,EMU,1971,283295830
and so on for all the other countries.
I would like the output to be similar to an example graph (image not available), but when I looked at the source of the data used in that graph, I found that it is not structured like mine.
My current output: (image not available)
Any help would be great! Thanks.

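Of course, you can plot such data with gnuplot alone.
You need to create a list of unique key entries and then filter your data accordingly (there are a few similar questions here).
The unique list of countries is in the order of first occurrence. Note that the list is separated by spaces, so this simple version assumes names without spaces: a name like "Euro area" from the question would be split into "Euro" and "area".
The assumption is that the years are in chronological order within each country. Unfortunately, gnuplot has no easy, direct sorting capability (i.e. you would have to go via external tools).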
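For readers who want to check the grouping logic outside gnuplot, here is a minimal Python sketch of the same two steps: collect the unique keys in order of first occurrence, then build one filtered series per key. It is an illustration only, not part of the gnuplot solution; the function names `unique_in_order` and `series_by_key` are hypothetical, and the inline sample consists of rows copied from the question.

```python
import csv
import io

def unique_in_order(values):
    """Distinct values in order of first occurrence (like the gnuplot 'Uniques' list)."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result

def series_by_key(rows, key_col, x_col, y_col):
    """One list of (x, y) points per distinct key, in file order (like myFilter)."""
    keys = unique_in_order(row[key_col] for row in rows)
    return {
        key: [(int(row[x_col]), float(row[y_col])) for row in rows if row[key_col] == key]
        for key in keys
    }

# A few rows copied from the question's data file
SAMPLE = """Country Name,Country Code,Year,Value
Arab World,ARB,1960,96388069
Euro area,EMU,1960,260300607
Euro area,EMU,1961,262639170
Euro area,EMU,1962,265056064
"""

# skipinitialspace=True also copes with the ", " separators in the answer's data file
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
next(reader)  # skip the header line, as 'every ::1' does in the stats command
rows = list(reader)

series = series_by_key(rows, key_col=0, x_col=2, y_col=3)
for country, points in series.items():
    print(country, points)
```

Because the keys are kept in a Python list rather than in a space-separated string, a name such as "Euro area" stays a single key here.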
Data: SO28874305.dat
Country Name,Country Code,Year,Value
Austria, AT, 1960, 1
Austria, AT, 1970, 4
Austria, AT, 1980, 3
Austria, AT, 1990, 4
Austria, AT, 1995, 5
Belgium, BE, 1960, 3
Belgium, BE, 1965, 7
Belgium, BE, 1975, 5
Belgium, BE, 1980, 3
Denmark, DK, 1963, 6
Denmark, DK, 1967, 4
Denmark, DK, 1974, 2
Denmark, DK, 1988, 5
France, FR, 1977, 2
France, FR, 1989, 4
France, FR, 1992, 3
France, FR, 1997, 5
Germany, DE, 1972, 5
Germany, DE, 1980, 6
Germany, DE, 1983, 8
Germany, DE, 1995, 3
Germany, DE, 1999, 6
Script: (tested with gnuplot 4.6.0, March 2012 and gnuplot 5.4.0, June 2020)
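### use a column as legend
reset
FILE = "SO28874305.dat"
set datafile separator ","
# create a unique list of strings from a column:
# append ' '.strcol(col) to the list only if it is not already contained in it
addToList(list,col) = list.( strstrt(list,' '.strcol(col)) > 0 ? '' : ' '.strcol(col))
Uniques=''
# run through the file once, skipping the first line (the header), just to fill Uniques
stats FILE u (Uniques=addToList(Uniques,1)) every ::1 nooutput
set datafile missing "NaN"
# return column colD if column colF equals valF, otherwise NaN (the row is not plotted)
myFilter(colD,colF,valF) = (strcol(colF) eq valF) ? column(colD) : NaN
# one plot element per country, titled with the country name
plot for [Country in Uniques] FILE u 3:(myFilter(4,1,Country)) w lp pt 7 ti Country
### end of script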
Result: (created with gnuplot 4.6.0)

Related

vba concatenate three (dynamic) tables into a master one

I want to concatenate separate tables from separate Excel worksheets into one master table. The issue is that the tables are dynamic, i.e. one table could have 100 rows, another could have 240 and a third could have 50, for example. The tables were generated by quite a few different processes: they had individual filters applied and were then copied and pasted into these separate Excel worksheets, ready to be concatenated!
I’ve managed to do all of the other processes in VBA, so I would prefer to stick with it. I don’t want to use Power Query (because of connection issues, and because I want this to be automated). I also don’t want to get involved with pivot tables or do this in the SQL database. This is for quite a few different reasons, so I would prefer to stick to VBA.
For example:

Table 1
Column A    Column B    Column C
Africa      100         4
Australia   0.1         5
America     200         7

Table 2
Column A    Column B    Column C
China       300         4
Australia   0.1         4
America     100         4

Table 3
Column A    Column B    Column C
Bali        100         4
England     0.1         5
NZ          200         8

Result
Column A    Column B    Column C
Africa      100         4
Australia   0.1         5
America     200         7
China       300         4
Australia   0.1         4
America     100         4
Bali        100         4
England     0.1         5
NZ          200         8
If anyone has any recommendations, I would love to hear them.

Extracting specific values from a pandas column and storing them in new columns

I have a pandas column that stores data as a list in the following format:
text
[['Mark','PERSON'],['Data Scientist','TITLE'], ['Berlin','LOC'], ['Python','SKILLS'], ['Tableau,','SKILLS'], ['SQL','SKILLS'], ['AWS','SKILLS']]
[['John','PERSON'],['Data Engineer','TITLE'], ['London','LOC'], ['Python','SKILLS'], ['DB2,','SKILLS'], ['SQL','SKILLS']]
[['Pearson','PERSON'],['Intern','TITLE'], ['Barcelona','LOC'], ['Python','SKILLS'], ['Excel,','SKILLS'], ['SQL','SKILLS']]
[['Broody','PERSON'],['Manager','TITLE'], ['Barcelona','LOC'], ['Team Management','SKILLS'], ['Excel,','SKILLS'], ['Good Communications','SKILLS']]
[['Rita','PERSON'],['Software Developer','TITLE'], ['London','LOC'], ['Dot Net','SKILLS'], ['SQl Server,','SKILLS'], ['VS Code,','SKILLS']]
What I want to see as output is:
PERSON TITLE LOC SKILLS
Mark Data Scientist Berlin Python, Tableau, SQL, AWS
John Data Engineer London Python, DB2, SQL
..... and so on for the rest of the input rows as well
So essentially: split each pair at the ",", use the right-hand part (the label, e.g. PERSON) as the column header and the left-hand part (e.g. Mark) as the value, joining several SKILLS values into one cell.
How can I achieve this?
If you have a DataFrame like this, called "df":
index text
0 1 [[Mark, PERSON], [Data Scientist, TITLE], [Ber...
1 2 [[John, PERSON], [Data Engineer, TITLE], [Lond...
2 3 [[Pearson, PERSON], [Intern, TITLE], [Barcelon...
3 4 [[Broody, PERSON], [Manager, TITLE], [Barcelon...
4 5 [[Rita, PERSON], [Software Developer, TITLE], ...
You can try something like this:
import pandas as pd

person = []
skills = []
title = []
loc = []
for row in df['text']:
    temp = []  # skills of the current row
    for value, label in row:
        if label == 'PERSON':
            person.append(value)
        elif label == 'TITLE':
            title.append(value)
        elif label == 'LOC':
            loc.append(value)
        elif label == 'SKILLS':
            # drop stray commas such as the one in 'Tableau,'
            temp.append(value.replace(",", ""))
    skills.append(",".join(temp))

# collect the four lists into the result table
result = pd.DataFrame({'PERSON': person, 'TITLE': title, 'LOC': loc, 'SKILLS': skills})
print(result)
Output
PERSON TITLE LOC SKILLS
0 Mark Data Scientist Berlin Python,Tableau,SQL,AWS
1 John Data Engineer London Python,DB2,SQL
2 Pearson Intern Barcelona Python,Excel,SQL
3 Broody Manager Barcelona Team Management,Excel,Good Communications
4 Rita Software Developer London Dot Net,SQl Server,VS Code
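The loop above hard-codes the four labels. A minimal alternative sketch, assuming each cell of df['text'] already holds a Python list of [value, label] pairs (if the column was read from a CSV file as text, it would first need converting, e.g. with ast.literal_eval), groups the values by whatever label appears and lets pandas build the columns. The helper name `pairs_to_record` is hypothetical; the two sample rows are taken from the question.

```python
from collections import defaultdict

import pandas as pd

def pairs_to_record(pairs):
    """Turn [[value, label], ...] into {label: "v1, v2, ..."}.

    Values sharing a label (e.g. several SKILLS) are joined with ", ";
    stray trailing commas such as in 'Tableau,' are stripped first.
    """
    grouped = defaultdict(list)
    for value, label in pairs:
        grouped[label].append(value.strip().rstrip(","))
    return {label: ", ".join(values) for label, values in grouped.items()}

rows = [
    [["Mark", "PERSON"], ["Data Scientist", "TITLE"], ["Berlin", "LOC"],
     ["Python", "SKILLS"], ["Tableau,", "SKILLS"], ["SQL", "SKILLS"], ["AWS", "SKILLS"]],
    [["John", "PERSON"], ["Data Engineer", "TITLE"], ["London", "LOC"],
     ["Python", "SKILLS"], ["DB2,", "SKILLS"], ["SQL", "SKILLS"]],
]
df = pd.DataFrame({"text": rows})

# one dict per row; pandas turns the dict keys into column headers
out = pd.DataFrame([pairs_to_record(pairs) for pairs in df["text"]])
print(out)
```

A label that is missing from some row simply ends up as NaN in that row, instead of shifting the other lists out of alignment.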

How do I transpose a Dataframe and how to scatter plot the transposed df

I have this dataframe with 20 countries and 20 years of data
Country 2000 2001 2002 ...
USA 1 2 3
CANADA 4 5 6
SWEDEN 7 8 9
...
and I want to get a new df to create a scatter plot with y = the value in each column (country) and x = year:
Country USA CANADA SWEDEN ...
2000 1 4 7
2001 2 5 8
2002 3 6 9
...
My code:
data = pd.read_csv("data.csv")
data.set_index("Country Name", inplace = True)
data_transposed = data.T
I'm struggling to create this kind of scatter plot.
Any ideas?
Thanks
A scatter plot receives only x and y, so you can't scatter the whole DataFrame directly. However, here is a small workaround:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data={"Country":["USA", "Canada", "Brasil"], 2000:[1,4,7], 2001:[3,7,9], 2002: [2,8,5]})
for column in df.columns:
    if column != "Country":
        plt.scatter(x=df["Country"], y=df[column])
plt.show()
result:
It just plots each column separately; in the end you get what you want.
As you can see, each year is represented by a different color. You can do the opposite (plot the years on the x axis and give each country a different color). A scatter plot pairs one x value with one y value: you have Country, Year and Value, so you can present only two of them directly (unless you use colors, for example).
For that you need to transpose your DataFrame (since you specify yourself what x and y are), which you can do with df.transpose(); see the documentation.
Note that in my df the Country column is not the index. You can use set_index or reset_index to control that.
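A minimal sketch of the transposed version the question asks for (years on the x axis, one colored series per country), assuming the country column is called "Country" (the question's file uses "Country Name") and using the small table from the question as sample data. The Agg backend line only makes the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Country": ["USA", "CANADA", "SWEDEN"],
    "2000": [1, 4, 7],
    "2001": [2, 5, 8],
    "2002": [3, 6, 9],
})

# Countries become columns and the year labels become the index
wide = df.set_index("Country").T
wide.index = wide.index.astype(int)  # "2000" -> 2000 so the x axis is numeric

fig, ax = plt.subplots()
for country in wide.columns:
    ax.scatter(wide.index, wide[country], label=country)  # one color per country
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.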

converting comma separated data of excel file into new line using python, pandas

I have an Excel file with data like this:
StudentId Details
1234 John, Texas, United States
9887 Roma, Moscow, Russia
I want to convert it into the following format:
StudentId Details
1234 John
Texas
United States
9887 Roma
Moscow
Russia
I am using pandas for this purpose but am not getting the expected results. This is roughly the logic I am using:
for i in range(len(df['Details'])):
    df['Details'][i] = df['Details'][i].replace(',', '\n')
Read it in with read_fwf() (or read_excel() if it is an actual .xlsx file), then split up that column afterwards:
df['Details'].str.split(',')
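str.split(',') alone leaves one list per cell. A minimal sketch of the remaining step, assuming the data is already in a DataFrame with the two columns from the question, uses DataFrame.explode to give every list element its own row:

```python
import pandas as pd

df = pd.DataFrame({
    "StudentId": [1234, 9887],
    "Details": ["John, Texas, United States", "Roma, Moscow, Russia"],
})

# 1) split each cell on commas -> one list per row
df["Details"] = df["Details"].str.split(",")
# 2) give every list element its own row; StudentId is repeated for each element
long_df = df.explode("Details").reset_index(drop=True)
# 3) remove the spaces that followed the commas
long_df["Details"] = long_df["Details"].str.strip()
print(long_df)
```

The result can be written back to Excel with long_df.to_excel(...).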

How can I combine like x values on a scatter plot in excel?

So I've got a column of profits for each sale and a column with the province the sale was in. I want to make a scatter plot but group the x values by province instead of having 8400 different x values. I have tried for days to get this to work; even grouping the provinces into single bigger cells using VBA didn't work.
Edit:
So, for example, I'll have:
ontario 12
vermont 3
ontario 6
ohio 8
and I want it to graph 3 columns, with the Ontario column having points at 12 and 6.
I want it to do that, but I have 8400 data points, so I can't do it by hand.
