Extracting selective text using Beautiful Soup and writing the result to CSV - python-3.x

I am trying to extract selective text from the website https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal%20asc%2C%20score%20desc%2C%20metadata_modified%20desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0
and have written the following code using Beautiful Soup:
import urllib.request
import re
from bs4 import BeautifulSoup

wiki = "https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page)
data2 = soup.find_all('h3', class_="dataset-heading")
data3 = []
getdata = []
for link in data2:
    data3 = soup.find_all("a", href=re.compile('/dataset/', re.IGNORECASE))
    for data in data3:
        getdata = data.text
        print(getdata)
len(getdata)
My HTML looks like:
<a href="/dataset/banks-assets" class="label" data-format="xls">XLS</a>
When I run the above code I get the text I want, but the word 'XLS' appears in between the dataset titles. I want to remove 'XLS' and write the remaining text into a single CSV column. My output is:
Banks – Assets
XLS
Consolidated Exposures – Immediate and Ultimate
Risk Basis
XLS
Foreign Exchange Transactions and Holdings of
Official Reserve Assets
XLS
Finance Companies and General Financiers
– Selected Assets and Liabilities
XLS
Liabilities and Assets –
Monthly XLS Consolidated Exposures – Immediate Risk Basis –
International Claims by Country
XLS
and so on.......
I checked whether the above output is a list. It is a list, but it contains only one element, even though (as shown above) the output contains many text entries.
Please help me out with this.

If the purpose is only to remove the XLS rows from the result column, that can be achieved, for example, this way:
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

wiki = "https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0"
page = urlopen(wiki)
soup = BeautifulSoup(page, "html.parser")
data2 = soup.find_all('h3', class_="dataset-heading")
data3 = []
getdata = []
for link in data2:
    # note: this searches the whole page on every pass, not just the current heading
    data3 = soup.find_all("a", href=re.compile('/dataset/', re.IGNORECASE))
    for data in data3:
        # keep only the dataset titles, skip the 'XLS' format labels
        if data.text.upper() != 'XLS':
            getdata.append(data.text)
print(getdata)
You will get a list with the text you need. It can then easily be transformed, for example, into a DataFrame, where this data will appear as a single column.
import pandas as pd
df = pd.DataFrame(columns=['col1'], data=getdata)
output:
col1
0 Banks – Assets
1 Consolidated Exposures – Immediate and Ultimat...
2 Foreign Exchange Transactions and Holdings of ...
3 Finance Companies and General Financiers – Sel...
4 Liabilities and Assets – Monthly
5 Consolidated Exposures – Immediate Risk Basis ...
6 Consolidated Exposures – Ultimate Risk Basis
7 Banks – Consolidated Group off-balance Sheet B...
8 Liabilities of Australian-located Operations
9 Building Societies – Selected Assets and Liabi...
10 Consolidated Exposures – Immediate Risk Basis ...
11 Banks – Consolidated Group Impaired Assets
12 Assets and Liabilities of Australian-Located O...
13 Managed Funds
14 Daily Net Foreign Exchange Transactions
15 Consolidated Exposures-Immediate Risk Basis
16 Public Unit Trust
17 Securitisation Vehicles
18 Assets of Australian-located Operations
19 Banks – Consolidated Group Capital
Writing to CSV:
df.to_csv(r'C:\Users\Username\output.csv')
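A tighter variant (a sketch, assuming the same page structure) searches for links inside each h3 heading instead of re-scanning the whole page on every pass, which also avoids duplicate titles:
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = "https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0"
soup = BeautifulSoup(urlopen(url), "html.parser")

titles = []
for heading in soup.find_all("h3", class_="dataset-heading"):
    # search only within this heading, not the whole document
    for link in heading.find_all("a", href=re.compile("/dataset/", re.IGNORECASE)):
        if link.text.strip().upper() != "XLS":
            titles.append(link.text.strip())

pd.DataFrame({"col1": titles}).to_csv("output.csv", index=False)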

Related

How to convert a non-fixed width spaced delimited file to a pandas dataframe

ID 0x4607
Delivery_person_ID INDORES13DEL02
Delivery_person_Age 37.000000
Delivery_person_Ratings 4.900000
Restaurant_latitude 22.745049
Restaurant_longitude 75.892471
Delivery_location_latitude 22.765049
Delivery_location_longitude 75.912471
Order_Date 19-03-2022
Time_Orderd 11:30
Time_Order_picked 11:45
Weather conditions Sunny
Road_traffic_density High
Vehicle_condition 2
Type_of_order Snack
Type_of_vehicle motorcycle
multiple_deliveries 0.000000
Festival No
City Urban
Time_taken (min) 24.000000
Name: 0, dtype: object
In an online exam, the machine learning training dataset has been split into multiple txt files. Each file contains data as shown above. I am unable to understand how to read this data in Python and convert it to a pandas dataframe. There are more than 45,000 txt files, each containing one record of the dataset. I will have to merge those 45,000 txt files into a single .csv file. Any help will be highly appreciated.
Each of your txt files seems to contain only 1 row (as a Series).
Unfortunately, these rows are not in an easy-to-read format (for machines); it looks like they were just printed out and saved like that.
Because of this, in my solution the indices of the dataframe (which correspond to the Name in the last row of each file) won't be read; the final dataframe will be reindexed.
You'll have to iterate through all your files. Just for my example, I'm using a list of the file names:
import pandas as pd

file_names = ['file0.txt', 'file1.txt']
rows = [pd.read_csv(file_name, sep=r'\s\s+', header=None, index_col=0,
                    skipfooter=1, engine='python').iloc[:, 0]
        for file_name in file_names]
df = pd.DataFrame(rows).reset_index(drop=True)
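The question mentions roughly 45,000 files; a sketch for gathering them all with glob (the folder name txt_files is a hypothetical layout):
import glob
import pandas as pd

file_names = glob.glob('txt_files/*.txt')   # hypothetical folder
rows = [pd.read_csv(f, sep=r'\s\s+', header=None, index_col=0,
                    skipfooter=1, engine='python').iloc[:, 0]
        for f in file_names]
pd.DataFrame(rows).reset_index(drop=True).to_csv('merged.csv', index=False)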
You can simply use basic python to do it with something like:
data = """ID 0x4607
Delivery_person_ID INDORES13DEL02
Delivery_person_Age 37.000000
Delivery_person_Ratings 4.900000
Restaurant_latitude 22.745049
Restaurant_longitude 75.892471
Delivery_location_latitude 22.765049
Delivery_location_longitude 75.912471
Order_Date 19-03-2022
Time_Orderd 11:30
Time_Order_picked 11:45
Weather conditions Sunny
Road_traffic_density High
Vehicle_condition 2
Type_of_order Snack
Type_of_vehicle motorcycle
multiple_deliveries 0.000000
Festival No
City Urban
Time_taken (min) 24.000000"""
for line in data.split('\n'):
    content = line.split()
    name = ' '.join(content[:-1])
    value = content[-1]
    print(name, value)
And once you have the name and the value, you can add them to a pandas DataFrame.
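For example, a sketch that builds on the loop above and collects the name/value pairs into a one-row DataFrame (one row per txt file; repeat and concatenate for many files):
import pandas as pd

record = {}
for line in data.split('\n'):
    content = line.split()
    record[' '.join(content[:-1])] = content[-1]

df = pd.DataFrame([record])   # a single row whose columns come from the field names
print(df)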

How to extract different tables in an Excel sheet using Python

In one Excel file, sheet 1, there are 4 tables at different locations in the sheet. How can I read those 4 tables? For reference, I have added a sample picture from Google. Is there any way to extract the tables without using indexes?
I assume your tables are formatted as "Excel Tables".
You can create an Excel Table by marking a range and then clicking Insert > Table.
There is a good guide from Samuel Oranyeli on how to import Excel Tables with Python. I have used his code and show it with examples below.
I have used the following data in excel, where each color represents a table.
Remarks about code:
The following part can be used to check which tables exist in the worksheet that we are working with:
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
In our example this code will give:
{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}
Here you set the dataframe names. Be cautious: if the number of dataframe names mismatches the number of tables, you will get an error.
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df
print(Table2.head(3)) gives:
Index first_name last_name address
0 Aleshia Tomkiewicz 14 Taylor St
1 Evan Zigomalas 5 Binney St
2 France Andrade 8 Moor Place
Full code:
# import libraries
from openpyxl import load_workbook
import pandas as pd

# read file
wb = load_workbook("G:/Till/Tables.xlsx")  # set the filepath + filename

# select the sheet where the tables are located
ws = wb["Tables"]

# check what tables exist in the worksheet
print({key: value for key, value in ws.tables.items()})

mapping = {}

# loop through all the tables and add them to a dictionary
for entry, data_boundary in ws.tables.items():
    # parse the data within the ref boundary
    data = ws[data_boundary]
    # extract the data: the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] for ent in data]
    header = content[0]
    # the contents, excluding the header
    rest = content[1:]
    # create a dataframe with the column names and pair the table name with the dataframe
    df = pd.DataFrame(rest, columns=header)
    mapping[entry] = df

# print(mapping)

# Extract all the tables into individual dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)
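If you would rather stack every table into a single DataFrame instead of unpacking them into separate variables, a sketch (assuming the tables share the same columns):
# keep the table name so you know where each row came from
all_tables = pd.concat(mapping, names=["table", "row"]).reset_index(level="table")
print(all_tables.head())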
Example data, example file:
first_name | last_name | address | city | county | postal
Aleshia | Tomkiewicz | 14 Taylor St | St. Stephens Ward | Kent | CT2 7PP
Evan | Zigomalas | 5 Binney St | Abbey Ward | Buckinghamshire | HP11 2AX
France | Andrade | 8 Moor Place | East Southbourne and Tuckton W | Bournemouth | BH6 3BE
Ulysses | Mcwalters | 505 Exeter Rd | Hawerby cum Beesby | Lincolnshire | DN36 5RP
Tyisha | Veness | 5396 Forth Street | Greets Green and Lyng Ward | West Midlands | B70 9DT
Eric | Rampy | 9472 Lind St | Desborough | Northamptonshire | NN14 2GH
Marg | Grasmick | 7457 Cowl St #70 | Bargate Ward | Southampton | SO14 3TY
Laquita | Hisaw | 20 Gloucester Pl #96 | Chirton Ward | Tyne & Wear | NE29 7AD
Lura | Manzella | 929 Augustine St | Staple Hill Ward | South Gloucestershire | BS16 4LL
Yuette | Klapec | 45 Bradfield St #166 | Parwich | Derbyshire | DE6 1QN
Fernanda | Writer | 620 Northampton St | Wilmington | Kent | DA2 7PP
Charlesetta | Erm | 5 Hygeia St | Loundsley Green Ward | Derbyshire | S40 4LY
Corrinne | Jaret | 2150 Morley St | Dee Ward | Dumfries and Galloway | DG8 7DE
Niesha | Bruch | 24 Bolton St | Broxburn, Uphall and Winchburg | West Lothian | EH52 5TL
Rueben | Gastellum | 4 Forrest St | Weston-Super-Mare | North Somerset | BS23 3HG
Michell | Throssell | 89 Noon St | Carbrooke | Norfolk | IP25 6JQ
Edgar | Kanne | 99 Guthrie St | New Milton | Hampshire | BH25 5DF
You may convert your Excel sheet to a CSV file and then use the csv module (or pandas) to grab the rows.
import pandas as pd

read_file = pd.read_excel("Test.xlsx")
read_file.to_csv("Test.csv", index=None, header=True)

df = pd.DataFrame(pd.read_csv("Test.csv"))
print(df)
For a better approach, please provide us with a sample Excel file.
You need two things:
Access OpenXML data via python: https://github.com/python-openxml/python-xlsx
Find the tables in the file, via what is called a DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1
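If the ranges are saved as named ranges rather than Excel Tables, openpyxl also exposes them as defined names. A sketch (assuming openpyxl 3.1+, where defined_names behaves like a dict, and a hypothetical file Tables.xlsx):
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook("Tables.xlsx", data_only=True)  # hypothetical filename

frames = {}
for name, defn in wb.defined_names.items():
    for sheet_title, coord in defn.destinations:   # e.g. ("Tables", "$A$1:$C$18")
        cells = wb[sheet_title][coord]
        rows = [[cell.value for cell in row] for row in cells]
        frames[name] = pd.DataFrame(rows[1:], columns=rows[0])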

Unable to separate text data in csv. (Separate text with # so that it becomes two columns)

According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing .#neutral
The above is the text, and I want to split it on # so that it produces two columns.
data = pd.read_csv(r'F:\Sentences_50Agree.csv', sep='#', header=None)
I tried the above but it's not working; it shows only one column containing the full text, including #neutral.
import pandas as pd
from io import StringIO
s = 'According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing .#neutral'
print( pd.read_csv(StringIO(s), sep='#', header=None) )
Prints:
0 1
0 According to Gran, the company has no plans to... neutral
Or with file:
print( pd.read_csv('file.txt', sep='#', header=None) )
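If some sentences themselves contain a '#', splitting on every occurrence would create extra columns; a sketch that splits only on the last '#' (the file name and column names are assumptions):
import pandas as pd

with open('file.txt', encoding='utf-8') as f:              # hypothetical file name
    lines = [line.rstrip('\n') for line in f]

df = pd.Series(lines).str.rsplit('#', n=1, expand=True)    # split only on the last '#'
df.columns = ['sentence', 'label']                          # hypothetical column names
print(df.head())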

Reading in CSVs and how to write the name of the CSV file into every row of the CSV

I have about 2,000 CSVs I was hoping to read into a df, but first I was wondering how someone would (before joining all the CSVs) write the name of each CSV into every row of that CSV. For example, in CSV1 there would be a column that says "CSV1" in every row, and the same for CSV2, CSV3, etc.
Was wondering if there was a way to accomplish this?
import os
import glob
import pandas as pd
os.chdir(r"C:\Users\User\Downloads\Complete Corporate Financial History")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
The csv files all look like this:
https://docs.google.com/spreadsheets/d/1hOb_nNjB3K8ldyyBUemQlcsTWcjyD8iLh8XMa5XB8Qk/edit?usp=sharing
They don't have the Ticker (file name) in each row though.
Edit: Here are the column headers: Quarter end Shares Shares split adjusted Split factor Assets Current Assets Liabilities Current Liabilities Shareholders equity Non-controlling interest Preferred equity Goodwill & intangibles Long-term debt Revenue Earnings Earnings available for common stockholders EPS basic EPS diluted Dividend per share Cash from operating activities Cash from investing activities Cash from financing activities Cash change during period Cash at end of period Capital expenditures Price Price high Price low ROE ROA Book value of equity per share P/B ratio P/E ratio Cumulative dividends per share Dividend payout ratio Long-term debt to equity ratio Equity to assets ratio Net margin Asset turnover Free cash flow per share Current ratio
and the rows descend by quarter.
Sample Data
,Quarter end,Shares,Shares split adjusted,Split factor,Assets,Current Assets,Liabilities,Current Liabilities,Shareholders equity,Non-controlling interest,Preferred equity,Goodwill & intangibles,Long-term debt,Revenue,Earnings,Earnings available for common stockholders,EPS basic,EPS diluted,Dividend per share,Cash from operating activities,Cash from investing activities,Cash from financing activities,Cash change during period,Cash at end of period,Capital expenditures,Price,Price high,Price low,ROE,ROA,Book value of equity per share,P/B ratio,P/E ratio,Cumulative dividends per share,Dividend payout ratio,Long-term debt to equity ratio,Equity to assets ratio,Net margin,Asset turnover,Free cash flow per share,Current ratio
0,6/30/2019,440000000.0,440000000.0,1.0,17900000000.0,6020000000.0,13000000000.0,3620000000.0,4850000000.0,12000000.0,55000000,5190000000.0,5900000000.0,3.69E+09,-1.20E+08,-1.20E+08,-0.27,-0.27,0.08,1.06E+08,1.29E+08,-2.00E+08,34000000,1360000000.0,128000000.0,22.55,25.83,19.27,0.0855,0.0243,10.9,1.98,16.11,33.46,0.2916,1.2296,0.2679,0.0311,0.78,-0.05,1.662
1,3/31/2019,449000000.0,449000000.0,1.0,18400000000.0,6050000000.0,13200000000.0,3660000000.0,5170000000.0,12000000.0,55000000,5420000000.0,5900000000.0,3.54E+09,1.87E+08,1.86E+08,0.4,0.39,0.08,-2.60E+08,42000000,-7.40E+08,-9.60E+08,1330000000.0,164000000.0,18.37,20.61,16.12,0.1298,0.0373,11.39,1.61,14.13,33.38,0.1798,1.1542,0.2784,0.0485,0.77,-0.94,1.6543
2,12/31/2018,485000000.0,485000000.0,1.0,18700000000.0,6580000000.0,13100000000.0,3520000000.0,5570000000.0,12000000.0,55000000,7250000000.0,5900000000.0,3.47E+09,2.18E+08,2.18E+08,0.45,0.45,0.06,4.26E+08,3.54E+08,-4.00E+07,7.40E+08,2280000000.0,-31000000.0,19.62,23.6,15.63,0.1208,0.035,11.38,1.79,None,33.3,0.1813,1.0685,0.2952,0.0457,0.76,0.94,1.8696
3,9/30/2018,483000000.0,483000000.0,1.0,18300000000.0,6130000000.0,13000000000.0,3010000000.0,5360000000.0,14000000.0,55000000,5470000000.0,6320000000.0,3.52E+09,1.61E+08,1.60E+08,0.33,0.32,0.06,51000000,65000000,-3.20E+07,82000000,1540000000.0,207000000.0,19.88,23.13,16.64,-0.0594,-0.0165,10.98,1.86,None,33.24,None,1.1902,0.2895,None,0.75,-0.32,2.0345
4,6/30/2018,483000000.0,483000000.0,1.0,18200000000.0,6080000000.0,13000000000.0,2980000000.0,5200000000.0,14000000.0,55000000,5480000000.0,6310000000.0,3.57E+09,1.20E+08,1.20E+08,0.25,0.24,0.06,1.76E+08,1.17E+08,-3.50E+07,2.52E+08,1460000000.0,166000000.0,20.27,24.07,16.47,-0.069,-0.0186,10.66,1.88,None,33.18,None,1.2259,0.2826,None,0.73,0.02,2.0406
5,3/31/2018,483000000.0,483000000.0,1.0,18200000000.0,5900000000.0,12900000000.0,2800000000.0,5270000000.0,14000000.0,55000000,5560000000.0,6310000000.0,3.45E+09,1.43E+08,1.42E+08,0.3,0.29,0.06,-4.40E+08,29000000,-5.40E+08,-9.50E+08,1210000000.0,117000000.0,26.87,31.17,22.57,-0.0536,-0.0134,10.8,2.67,None,33.12,None,1.2102,0.2861,None,0.7,-1.15,2.1039
6,12/31/2017,483000000.0,483000000.0,1.0,18700000000.0,6380000000.0,13800000000.0,2820000000.0,4910000000.0,14000000.0,55000000,7410000000.0,6810000000.0,3.27E+09,-7.30E+08,-7.30E+08,-1.51,-1.51,0.06,6.12E+08,-2.40E+08,-4.50E+07,3.35E+08,2150000000.0,236000000.0,25.3,27.85,22.74,-0.0232,-0.0038,10.06,2.07,None,33.06,None,1.4019,0.2594,None,0.67,0.78,2.2585
7,9/30/2017,481000000.0,481000000.0,1.0,19200000000.0,6150000000.0,13300000000.0,2680000000.0,5950000000.0,13000000.0,55000000,5250000000.0,6800000000.0,3.24E+09,1.19E+08,1.01E+08,0.23,0.22,0.06,1.72E+08,-1.30E+08,-1.50E+07,30000000,1820000000.0,131000000.0,24.76,26.84,22.67,-0.1222,-0.0308,12.24,1.92,None,33.0,None,1.1543,0.3063,None,0.65,0.09,2.2966
8,6/30/2017,441000000.0,441000000.0,1.0,19100000000.0,6030000000.0,13400000000.0,2660000000.0,5740000000.0,13000000.0,55000000,5220000000.0,6800000000.0,3.26E+09,2.12E+08,1.94E+08,0.44,0.43,0.06,2.17E+08,-1.30E+08,-8.60E+08,-7.70E+08,1790000000.0,125000000.0,25.2,28.65,21.75,-0.0899,-0.0231,12.89,2.05,None,32.94,None,1.1954,0.2976,None,0.61,0.21,2.2698
9,3/31/2017,441000000.0,441000000.0,1.0,20200000000.0,6710000000.0,14700000000.0,2590000000.0,5480000000.0,13000000.0,55000000,5170000000.0,8050000000.0,3.19E+09,3.22E+08,3.05E+08,0.69,0.65,0.06,-3.00E+08,1.03E+09,-4.30E+07,6.90E+08,2550000000.0,113000000.0,24.66,30.69,18.64,-0.0815,-0.0223,12.31,2.15,None,32.88,None,1.4826,0.2692,None,0.59,-0.94,2.5937
10,12/31/2016,441000000.0,441000000.0,1.0,20000000000.0,5890000000.0,14900000000.0,2750000000.0,5120000000.0,26000000.0,55000000,6940000000.0,8040000000.0,3.06E+09,-1.30E+09,-1.30E+09,-2.92,-2.92,7.76,6.62E+08,-2.40E+08,-4.00E+08,0,1860000000.0,302000000.0,24.43,32.1,16.75,-0.098,-0.029,11.49,0.91,None,32.82,None,1.5897,0.2525,None,0.57,0.82,2.1433
11,9/30/2016,438000000.0,438000000.0,1.0,37400000000.0,9370000000.0,23500000000.0,5500000000.0,11800000000.0,2170000000.0,55000000,5380000000.0,9500000000.0,5.21E+09,1.66E+08,1.48E+08,0.34,0.33,0.09,3.06E+08,-2.30E+08,-1.40E+08,-6.60E+07,1860000000.0,152000000.0,30,32.91,27.09,-0.0377,-0.0105,26.73,1.07,None,25.06,None,0.8107,0.313,None,0.57,0.35,1.7033
12,6/30/2016,1320000000.0,438000000.0,0.333333,36100000000.0,8090000000.0,21600000000.0,5490000000.0,12300000000.0,2190000000.0,55000000,5400000000.0,8280000000.0,5.30E+09,1.35E+08,1.18E+08,0.09,0.09,0.03,3.32E+08,3.11E+08,-1.00E+08,5.45E+08,1930000000.0,-50000000.0,30.42,34.5,26.34,-0.047,-0.0139,28.01,1.1,None,24.97,None,0.6741,0.3398,None,0.58,0.87,1.4747
13,3/31/2016,1320000000.0,438000000.0,0.333333,36100000000.0,7670000000.0,21800000000.0,5560000000.0,12200000000.0,2140000000.0,55000000,5400000000.0,8260000000.0,4.95E+09,16000000,-2000000,0,0,0.03,-4.30E+08,-1000000,-1.10E+08,-5.40E+08,1380000000.0,29000000.0,24.54,30.66,18.42,-0.0467,-0.0137,27.76,0.9,None,24.88,None,0.6784,0.3368,None,0.59,-1.05,1.3798
14,12/31/2015,1310000000.0,438000000.0,0.333333,36500000000.0,7950000000.0,22400000000.0,5210000000.0,12000000000.0,2090000000.0,55000000,7540000000.0,9040000000.0,5.25E+09,-7.00E+08,-7.20E+08,-0.55,-0.55,0.03,8.65E+08,-4.60E+08,-2.30E+08,1.80E+08,1920000000.0,398000000.0,28.48,33.54,23.43,-0.0324,-0.0089,27.36,0.99,25.66,24.79,None,0.7542,0.3283,None,0.62,1.07,1.5262
You could try something like this, then:
df_list = []
for filename in all_filenames:
    df = pd.read_csv(filename)
    # Adds a column Ticker to the dataframe with the filename in the column.
    # The split function will work if no filename has more than one period.
    # Otherwise, you can use a Python built-in function to trim off the extension.
    df['Ticker'] = filename.split('.')[0]
    df_list.append(df)
all_dfs = pd.concat(df_list, axis=0)
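A variant using pathlib.Path.stem, which also copes with filenames that contain extra periods (a sketch, reusing all_filenames from the question):
from pathlib import Path
import pandas as pd

df_list = []
for filename in all_filenames:
    df = pd.read_csv(filename)
    df['Ticker'] = Path(filename).stem   # file name without directory or extension
    df_list.append(df)

all_dfs = pd.concat(df_list, axis=0, ignore_index=True)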
I can't think of a built-in way of doing this, but an alternative is to expand your for loop and load each dataframe into a variable, then create a column, df['fileName'] = filename.split('.')[0], to get just the file name without the .csv. Append each df to a list, which grows on every loop iteration, and after the loop completes just do pd.concat(list_csv, axis=0) to make one single df.
Replying from my phone so I couldn't type working code, but it's easy if you think about it.

How to write content of a list into an Excel sheet using openpyxl

I have the following list:
d_list = ["No., Start Name, Destination, Distance (miles)",
"1,ALBANY,NY CRAFT,28",
"2,GRACO,PIONEER,39",
"3,FONDA,ROME,41",
"4,NICCE,MARRINERS,132",
"5,TOUCAN,SUBVERSIVE,100",
"6,POLL,CONVERGENCE,28",
"7,STONE HOUSE,HUDSON VALLEY,9",
"8,GLOUCESTER GRAIN,BLACK MUDD POND,75",
"9,ARMY LEAGUE,MUMURA,190",
"10,MURRAY,FARMINGDALE,123"]
So, basically, the list consists of thousands of elements (just showed here a sample of 10), each is a string of comma separated elements. I'd like to write this into a new worksheet in a workbook.
Note: the workbook already exists and contains other sheets, I'm just adding a new sheet with this data.
My code:
import openpyxl

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.create_sheet(title='distance')

for i in range(len(d_list)):
    sheet.append(list(d_list[i]))
I'm expecting (in this example) 11 rows of data, each with 4 columns. However, I'm getting 11 rows alright, but with each character of each string written into its own cell! I think I am almost there ... what am I missing? (Note: I've read through all the available posts related to this topic, but couldn't find any that answers this specific type of question, hence I'm asking.)
Many thanks!
You can use pandas to solve this:
1.) Convert your list into a dataframe:
In [231]: l
Out[231]:
['No., Start Name, Destination, Distance (miles)',
'1,ALBANY,NY CRAFT,28',
'2,GRACO,PIONEER,39',
'3,FONDA,ROME,41',
'4,NICCE,MARRINERS,132',
'5,TOUCAN,SUBVERSIVE,100',
'6,POLL,CONVERGENCE,28',
'7,STONE HOUSE,HUDSON VALLEY,9',
'8,GLOUCESTER GRAIN,BLACK MUDD POND,75',
'9,ARMY LEAGUE,MUMURA,190',
'10,MURRAY,FARMINGDALE,123']
In [228]: df = pd.DataFrame([i.split(",") for i in l])
In [229]: df
Out[229]:
0 1 2 3
0 No. Start Name Destination Distance (miles)
1 1 ALBANY NY CRAFT 28
2 2 GRACO PIONEER 39
3 3 FONDA ROME 41
4 4 NICCE MARRINERS 132
5 5 TOUCAN SUBVERSIVE 100
6 6 POLL CONVERGENCE 28
7 7 STONE HOUSE HUDSON VALLEY 9
8 8 GLOUCESTER GRAIN BLACK MUDD POND 75
9 9 ARMY LEAGUE MUMURA 190
10 10 MURRAY FARMINGDALE 123
2.) Write the above DataFrame to Excel in a new sheet with 4 columns:
import pandas as pd
from openpyxl import load_workbook

path = "data.xlsx"
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='distance')
writer.save()
writer.close()
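Since the question asked about openpyxl directly: the original code fails because list() applied to a string yields its individual characters, so splitting each string on commas gives the intended four cells per row. A sketch of that fix (reusing d_list and data.xlsx from the question):
import openpyxl

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.create_sheet(title='distance')

for row in d_list:
    # split the comma-separated string into a list of cell values
    sheet.append([value.strip() for value in row.split(',')])

wb.save('data.xlsx')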
