I want to concatenate tables from separate Excel worksheets into one master table. The issue is that the tables are dynamic, i.e. one table could have 100 rows, another 240, and a third 50. The tables were generated by several different processes: each had its own filters applied and was then copied and pasted into its own worksheet, ready to be concatenated.
I've managed to do all of those processes in VBA, so I would prefer to stick with it. I don't want to use Power Query (because of connection issues, and because I want this to be automated). I also don't want to get involved with pivot tables or do this in the SQL database. There are quite a few reasons for this, so I would prefer to stick to VBA.
For example:
Table 1
Column a   Column b  Column c
Africa     100       4
Australia  0.1       5
America    200       7
Table 2
Column a   Column b  Column c
China      300       4
Australia  0.1       4
America    100       4
Table 3
Column a   Column b  Column c
Bali       100       4
England    0.1       5
NZ         200       8
Result
Column a   Column b  Column c
Africa     100       4
Australia  0.1       5
America    200       7
China      300       4
Australia  0.1       4
America    100       4
Bali       100       4
England    0.1       5
NZ         200       8
If anyone has any recommendations, I would love to hear them.
I have a pandas column that stores data as a list of lists, in the following format:
text
[['Mark','PERSON'],['Data Scientist','TITLE'], ['Berlin','LOC'], ['Python','SKILLS'], ['Tableau,','SKILLS'], ['SQL','SKILLS'], ['AWS','SKILLS']]
[['John','PERSON'],['Data Engineer','TITLE'], ['London','LOC'], ['Python','SKILLS'], ['DB2,','SKILLS'], ['SQL','SKILLS']]
[['Pearson','PERSON'],['Intern','TITLE'], ['Barcelona','LOC'], ['Python','SKILLS'], ['Excel,','SKILLS'], ['SQL','SKILLS']]
[['Broody','PERSON'],['Manager','TITLE'], ['Barcelona','LOC'], ['Team Management','SKILLS'], ['Excel,','SKILLS'], ['Good Communications','SKILLS']]
[['Rita','PERSON'],['Software Developer','TITLE'], ['London','LOC'], ['Dot Net','SKILLS'], ['SQl Server,','SKILLS'], ['VS Code','SKILLS']]
What I want to see as output is:
PERSON  TITLE           LOC     SKILLS
Mark    Data Scientist  Berlin  Python, Tableau, SQL, AWS
John    Data Engineer   London  Python, DB2, SQL
... and so on for the rest of the input rows.
So essentially, each inner list is split at the ",": the second element (the label after the ",") becomes the column header, and the first element (the text before the ",") becomes the value.
How can I achieve this?
If you have a DataFrame like this, called "df":
index text
0 1 [[Mark, PERSON], [Data Scientist, TITLE], [Ber...
1 2 [[John, PERSON], [Data Engineer, TITLE], [Lond...
2 3 [[Pearson, PERSON], [Intern, TITLE], [Barcelon...
3 4 [[Broody, PERSON], [Manager, TITLE], [Barcelon...
4 5 [[Rita, PERSON], [Software Developer, TITLE], ...
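You can try something like this:
import pandas as pd

person = []
skills = []
title = []
loc = []
temp = []  # skills collected for the row currently being processed
for i in range(len(df['text'])):
    for j in range(len(df['text'][i])):
        # Each inner item is [value, label]; route the value by its label.
        if df['text'][i][j][1] == 'PERSON':
            person.append(df['text'][i][j][0])
        elif df['text'][i][j][1] == 'TITLE':
            title.append(df['text'][i][j][0])
        elif df['text'][i][j][1] == 'LOC':
            loc.append(df['text'][i][j][0])
        elif df['text'][i][j][1] == 'SKILLS':
            temp.append(df['text'][i][j][0].replace(",", ""))
    # One joined skills string per row, then reset for the next row.
    skills.append(",".join(temp))
    temp = []
# Assemble the collected lists into the output frame shown below.
result = pd.DataFrame({'PERSON': person, 'TITLE': title, 'LOC': loc, 'SKILLS': skills})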
Output
PERSON TITLE LOC SKILLS
0 Mark Data Scientist Berlin Python,Tableau,SQL,AWS
1 John Data Engineer London Python,DB2,SQL
2 Pearson Intern Barcelona Python,Excel,SQL
3 Broody Manager Barcelona Team Management,Excel,Good Communications
4 Rita Software Developer London Dot Net,SQl Server,VS Code
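The same reshaping can also be written without one list per label. The following is only a sketch under assumptions: the helper name pairs_to_record is hypothetical, the sample frame is rebuilt from the first two rows of the question, and trailing commas are stripped with rstrip rather than by removing every comma as the loop above does.

```python
import pandas as pd


def pairs_to_record(pairs):
    """Group [[value, label], ...] into {label: "value1,value2,..."}.

    Hypothetical helper, not part of pandas.
    """
    grouped = {}
    for value, label in pairs:
        # Drop the stray trailing comma some values carry (e.g. "Tableau,").
        grouped.setdefault(label, []).append(value.rstrip(",").strip())
    return {label: ",".join(values) for label, values in grouped.items()}


# Sample data copied from the first two rows of the question.
df = pd.DataFrame({"text": [
    [["Mark", "PERSON"], ["Data Scientist", "TITLE"], ["Berlin", "LOC"],
     ["Python", "SKILLS"], ["Tableau,", "SKILLS"], ["SQL", "SKILLS"],
     ["AWS", "SKILLS"]],
    [["John", "PERSON"], ["Data Engineer", "TITLE"], ["London", "LOC"],
     ["Python", "SKILLS"], ["DB2,", "SKILLS"], ["SQL", "SKILLS"]],
]})

# One dict per row; pandas turns the dict keys into columns.
result = pd.DataFrame([pairs_to_record(p) for p in df["text"]], index=df.index)
print(result)
```

Because the columns come from whatever labels appear in the data, a new label (say, a hypothetical 'ORG') would show up as its own column without changing the code. A row that lacks a label gets NaN in that column.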
I have this dataframe with 20 countries and 20 years of data
Country 2000 2001 2002 ...
USA 1 2 3
CANADA 4 5 6
SWEDEN 7 8 9
...
and I want to get a new df so that I can create a scatter plot with y = the value for each column (country) and x = year:
Country USA CANADA SWEDEN ...
2000 1 4 7
2001 2 5 8
2002 3 6 9
...
My code:
import pandas as pd

data = pd.read_csv("data.csv")
data.set_index("Country Name", inplace=True)
data_transposed = data.T
I'm struggling to create this kind of scatter plot.
Any ideas?
Thanks
A scatter plot receives x and y only, so you can't scatter the whole dataframe directly. However, here is a small workaround:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data={"Country": ["USA", "Canada", "Brasil"], 2000: [1, 4, 7], 2001: [3, 7, 9], 2002: [2, 8, 5]})
# One scatter call per year column; matplotlib assigns each call its own color.
for column in df.columns:
    if column != "Country":
        plt.scatter(x=df["Country"], y=df[column])
plt.show()
result:
It just plots each column separately, so eventually you get what you want.
As you can see, each year is represented by a different color. You can do the opposite (plot years on the x axis and show countries as different colors). A scatter plot is one x against one y, but you have three things: Country, Year, Value. You can present only two of them as axes in a scatter plot (unless you use color, for example, for the third).
You need to transpose your dataframe for that (since you specify yourself what x and y are), which you can do with df.transpose(): see documentation.
Notice that in my df the Country column is not the index. You can use set_index or reset_index to control that.
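The opposite layout (years on the x axis, one color per country) could be sketched like this. It is a minimal example under assumptions: the frame is rebuilt from the sample values in the question, the name by_year is made up for illustration, and the year labels are converted with astype(int) in case they were read from a CSV as strings.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values taken from the question; "Country" is a regular column here.
df = pd.DataFrame(data={"Country": ["USA", "CANADA", "SWEDEN"],
                        2000: [1, 4, 7], 2001: [2, 5, 8], 2002: [3, 6, 9]})

# Move Country into the index, then transpose:
# years become the rows, countries become the columns.
by_year = df.set_index("Country").T
years = by_year.index.astype(int)

fig, ax = plt.subplots()
for country in by_year.columns:
    # One scatter call per country, so each country gets its own color.
    ax.scatter(x=years, y=by_year[country], label=country)
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_xticks(list(years))
ax.legend()

if __name__ == "__main__":
    plt.show()
```

With the real CSV, the set_index call would use whatever the country column is actually named in the file.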
I have an Excel file with data like this:
StudentId Details
1234 John, Texas, United States
9887 Roma, Moscow, Russia
I want to convert it into the following format:
StudentId  Details
1234       John
           Texas
           United States
9887       Roma
           Moscow
           Russia
I am using pandas for this, but I am not getting the result I want:
for i in range(len(df['Details'])):
    df['Details'][i] = df['Details'][i].replace(',', '\n')
This is roughly the logic I am using.
Read it in with read_fwf() then split up that column later:
df['Details'].str.split(',')
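To go from the split lists to one row per detail, a possible follow-up is DataFrame.explode. This is a sketch under assumptions: the frame is typed in by hand from the question's sample rather than read from the file, and the names long_df and display_df are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "StudentId": [1234, 9887],
    "Details": ["John, Texas, United States", "Roma, Moscow, Russia"],
})

# Split each Details string into a list, then give every list item its own row.
long_df = df.assign(Details=df["Details"].str.split(",")).explode("Details")
long_df["Details"] = long_df["Details"].str.strip()  # remove the space left after each comma
long_df = long_df.reset_index(drop=True)

# Optional, for display only: blank out the repeated ids so each id
# appears once, on the first row of its group.
display_df = long_df.copy()
display_df["StudentId"] = (
    display_df["StudentId"].astype(object)
    .mask(display_df["StudentId"].duplicated(), "")
)
print(display_df)
```

Keeping the id on every row (long_df) is usually easier to work with afterwards; the blanked version only mirrors the layout asked for in the question.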
So I've got a column of profits for each sale and a column saying which province the sale was in. I want to make a scatter plot but group the x values by province instead of having 8400 different x values. I have tried for days to get this to work; even grouping the provinces into single bigger cells using VBA didn't work.
Edit:
So for example I'll have
ontario 12
vermont 3
ontario 6
ohio 8
and I want it to graph 3 columns, with the ontario column having points at 12 and 6.
I want it to do that, but I have 8400 data points, so I can't do it by hand...
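If taking the two columns out of the spreadsheet is an option, the grouping described above can be sketched in Python with pandas and matplotlib. This is not an Excel or VBA solution, and everything named here (the sales frame, the province and profit column names) is an assumption made for illustration. pd.factorize gives each distinct province one integer x position, so all sales for a province land in the same column of the plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows from the example in the question; column names are assumed.
sales = pd.DataFrame({
    "province": ["ontario", "vermont", "ontario", "ohio"],
    "profit": [12, 3, 6, 8],
})

# Map each distinct province to one integer position (order of first appearance).
codes, labels = pd.factorize(sales["province"])

fig, ax = plt.subplots()
ax.scatter(codes, sales["profit"])
# Label the integer positions with the province names.
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_xlabel("Province")
ax.set_ylabel("Profit")

if __name__ == "__main__":
    plt.show()
```

The same code handles 8400 rows unchanged, since the number of x positions depends only on the number of distinct provinces.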