Extracting selective text using Beautiful Soup and writing the result to CSV - python-3.x
I am trying to extract selected text from the website https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal%20asc%2C%20score%20desc%2C%20metadata_modified%20desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0
and have written the following code using Beautiful Soup:
import re
import urllib.request
from bs4 import BeautifulSoup

wiki = "https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, "html.parser")
data2 = soup.find_all('h3', class_="dataset-heading")
data3 = []
getdata = []
for link in data2:
    data3 = soup.find_all("a", href=re.compile('/dataset/', re.IGNORECASE))
for data in data3:
    getdata = data.text
    print(getdata)
len(getdata)
My HTML is like:
<a href="/dataset/banks-assets" class="label" data-format="xls">XLS</a>
When I run the above code I get the text I want, but the word 'XLS' appears between entries. I want to remove 'XLS' and write the remaining text to a single column of a CSV. My output is:
Banks – Assets
XLS
Consolidated Exposures – Immediate and Ultimate
Risk Basis
XLS
Foreign Exchange Transactions and Holdings of
Official Reserve Assets
XLS
Finance Companies and General Financiers
– Selected Assets and Liabilities
XLS
Liabilities and Assets –
Monthly
XLS
Consolidated Exposures – Immediate Risk Basis –
International Claims by Country
XLS
and so on.......
I checked whether the above output is a list. It is a list, but it has only one element, even though, as shown above, my output contains many pieces of text.
Please help me out with it.
If the purpose is only to remove the XLS rows from the result column, that can be achieved, for example, this way:
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

wiki = "https://data.gov.au/dataset?q=&groups=business&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&organization=reservebankofaustralia&_groups_limit=0"
page = urlopen(wiki)
soup = BeautifulSoup(page, "html.parser")
data2 = soup.find_all('h3', class_="dataset-heading")
data3 = []
getdata = []
for link in data2:
    data3 = soup.find_all("a", href=re.compile('/dataset/', re.IGNORECASE))
for data in data3:
    if data.text.upper() != 'XLS':
        getdata.append(data.text)
print(getdata)
You will get a list with the text you need. It can then easily be transformed, for example, into a DataFrame, where this data will appear as a column:
import pandas as pd
df = pd.DataFrame(columns=['col1'], data=getdata)
output:
col1
0 Banks – Assets
1 Consolidated Exposures – Immediate and Ultimat...
2 Foreign Exchange Transactions and Holdings of ...
3 Finance Companies and General Financiers – Sel...
4 Liabilities and Assets – Monthly
5 Consolidated Exposures – Immediate Risk Basis ...
6 Consolidated Exposures – Ultimate Risk Basis
7 Banks – Consolidated Group off-balance Sheet B...
8 Liabilities of Australian-located Operations
9 Building Societies – Selected Assets and Liabi...
10 Consolidated Exposures – Immediate Risk Basis ...
11 Banks – Consolidated Group Impaired Assets
12 Assets and Liabilities of Australian-Located O...
13 Managed Funds
14 Daily Net Foreign Exchange Transactions
15 Consolidated Exposures-Immediate Risk Basis
16 Public Unit Trust
17 Securitisation Vehicles
18 Assets of Australian-located Operations
19 Banks – Consolidated Group Capital
Putting to CSV (note the raw string: in Python 3, \U inside a normal string literal is an invalid escape sequence):
df.to_csv(r'C:\Users\Username\output.csv')
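The same list can also be written to a one-column CSV without pandas, using the standard library's csv module. A minimal sketch, where the file name and the (shortened) list contents are illustrative stand-ins for the scraped titles:

```python
import csv

# Illustrative stand-in for the scraped titles once the 'XLS' rows are removed
getdata = ["Banks - Assets",
           "Consolidated Exposures - Immediate and Ultimate Risk Basis"]

# newline="" is recommended by the csv docs to avoid blank lines on Windows
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["col1"])                      # header row
    writer.writerows([title] for title in getdata) # one title per row
```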
Related
How to convert a non-fixed-width space-delimited file to a pandas dataframe
ID                             0x4607
Delivery_person_ID             INDORES13DEL02
Delivery_person_Age            37.000000
Delivery_person_Ratings        4.900000
Restaurant_latitude            22.745049
Restaurant_longitude           75.892471
Delivery_location_latitude     22.765049
Delivery_location_longitude    75.912471
Order_Date                     19-03-2022
Time_Orderd                    11:30
Time_Order_picked              11:45
Weather conditions             Sunny
Road_traffic_density           High
Vehicle_condition              2
Type_of_order                  Snack
Type_of_vehicle                motorcycle
multiple_deliveries            0.000000
Festival                       No
City                           Urban
Time_taken (min)               24.000000
Name: 0, dtype: object

In an online exam, the machine learning training dataset has been split into multiple txt files. Each file contains data as shown above. I am unable to work out how to read this data in Python and convert it to a pandas dataframe. There are more than 45,000 txt files, each containing one record of the dataset. I will have to merge those 45,000 txt files into a single .csv file. Any help will be highly appreciated.
Each of your txt files seems to contain only one row (printed as a pandas Series). Unfortunately, these rows are not in an easy-to-read format for machines; it looks like they were just printed out and saved like that. Because of this, the indices of the dataframe (which correspond to the Name in the last row of each file) won't be read: my final dataframe is reindexed. You'll have to iterate through all your files; just for this example, I'm using a list of the file names:

file_names = ['file0.txt', 'file1.txt']
rows = [pd.read_csv(file_name, sep=r'\s\s+', header=None, index_col=0,
                    skipfooter=1, engine='python').iloc[:, 0]
        for file_name in file_names]
df = pd.DataFrame(rows).reset_index(drop=True)
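To see how the sep=r'\s\s+' / skipfooter=1 combination behaves without touching the disk, here is a small self-contained sketch; the sample string is a shortened stand-in for one file's contents:

```python
from io import StringIO
import pandas as pd

# Shortened stand-in for one printed-Series txt file: field name and value
# are separated by runs of two or more spaces
sample = (
    "ID                   0x4607\n"
    "Delivery_person_Age  37.000000\n"
    "Name: 0, dtype: object\n"
)

row = pd.read_csv(
    StringIO(sample),
    sep=r"\s\s+",     # split on 2+ consecutive spaces
    header=None,
    index_col=0,      # the field names become the index
    skipfooter=1,     # drop the trailing "Name: 0, dtype: object" line
    engine="python",  # required for skipfooter and regex separators
).iloc[:, 0]

print(row["ID"])
```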
You can simply use basic Python to do it, with something like:

data = """ID 0x4607
Delivery_person_ID INDORES13DEL02
Delivery_person_Age 37.000000
Delivery_person_Ratings 4.900000
Restaurant_latitude 22.745049
Restaurant_longitude 75.892471
Delivery_location_latitude 22.765049
Delivery_location_longitude 75.912471
Order_Date 19-03-2022
Time_Orderd 11:30
Time_Order_picked 11:45
Weather conditions Sunny
Road_traffic_density High
Vehicle_condition 2
Type_of_order Snack
Type_of_vehicle motorcycle
multiple_deliveries 0.000000
Festival No
City Urban
Time_taken (min) 24.000000"""

for line in data.split('\n'):
    content = line.split()
    name = ' '.join(content[:-1])
    value = content[-1]
    print(name, value)

And from the moment that you have the name and the value, you can add them to a pandas dataframe.
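The last step that answer leaves open, collecting the name/value pairs and adding them to a pandas dataframe, can be sketched like this; the two shortened file contents are inlined for illustration, where in practice you would read each txt file from disk:

```python
import pandas as pd

# Two hypothetical, shortened file contents standing in for real txt files
files = [
    "ID 0x4607\nDelivery_person_Age 37.000000",
    "ID 0x4608\nDelivery_person_Age 29.000000",
]

records = []
for text in files:
    record = {}
    for line in text.split("\n"):
        parts = line.split()
        # all tokens but the last form the field name; the last is the value
        record[" ".join(parts[:-1])] = parts[-1]
    records.append(record)

# each dict becomes one row; the keys become the columns
df = pd.DataFrame(records)
print(df)
```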
How to extract different tables in an Excel sheet using Python
In one Excel file, on sheet 1, there are 4 tables at different locations in the sheet. How can I read those 4 tables? (For reference, I have added a picture snapshot from Google.) Is there any way to extract the tables without using indexes?
I assume your tables are formatted as "Excel Tables". You can create an Excel table by marking a range and then clicking Insert > Table. There is a good guide from Samuel Oranyeli on how to import Excel Tables with Python. I have used his code and show it with examples. I used the following data in Excel, where each colour represents a table.

Remarks about the code: the following part can be used to check which tables exist in the worksheet that we are working with:

# check what tables exist in the worksheet
print({key: value for key, value in ws.tables.items()})

In our example this code gives:

{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}

Here you set the dataframe names. Be cautious: if the number of dataframes mismatches the number of tables, you will get an error.

# Extract each table into its own dataframe from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print the first 3 rows of one dataframe
print(Table2.head(3))

print(Table2.head(3)) gives:

  first_name   last_name       address
0    Aleshia  Tomkiewicz  14 Taylor St
1       Evan   Zigomalas   5 Binney St
2     France     Andrade  8 Moor Place

Full code:

# import libraries
from openpyxl import load_workbook
import pandas as pd

# read file
wb = load_workbook("G:/Till/Tables.xlsx")  # set the filepath + filename

# select the sheet where the tables are located
ws = wb["Tables"]

# check what tables exist in the worksheet
print({key: value for key, value in ws.tables.items()})

mapping = {}

# loop through all the tables and add each to a dictionary
for entry, data_boundary in ws.tables.items():
    # parse the data within the ref boundary
    data = ws[data_boundary]
    # the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] for ent in data]
    header = content[0]
    # the contents, excluding the header
    rest = content[1:]
    # create a dataframe with the column names
    # and pair the table name with the dataframe
    df = pd.DataFrame(rest, columns=header)
    mapping[entry] = df

# print(mapping)

# Extract each table into its own dataframe from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)

Example data (example file):

first_name last_name address city county postal
Aleshia Tomkiewicz 14 Taylor St St. Stephens Ward Kent CT2 7PP
Evan Zigomalas 5 Binney St Abbey Ward Buckinghamshire HP11 2AX
France Andrade 8 Moor Place East Southbourne and Tuckton W Bournemouth BH6 3BE
Ulysses Mcwalters 505 Exeter Rd Hawerby cum Beesby Lincolnshire DN36 5RP
Tyisha Veness 5396 Forth Street Greets Green and Lyng Ward West Midlands B70 9DT
Eric Rampy 9472 Lind St Desborough Northamptonshire NN14 2GH
Marg Grasmick 7457 Cowl St #70 Bargate Ward Southampton SO14 3TY
Laquita Hisaw 20 Gloucester Pl #96 Chirton Ward Tyne & Wear NE29 7AD
Lura Manzella 929 Augustine St Staple Hill Ward South Gloucestershire BS16 4LL
Yuette Klapec 45 Bradfield St #166 Parwich Derbyshire DE6 1QN
Fernanda Writer 620 Northampton St Wilmington Kent DA2 7PP
Charlesetta Erm 5 Hygeia St Loundsley Green Ward Derbyshire S40 4LY
Corrinne Jaret 2150 Morley St Dee Ward Dumfries and Galloway DG8 7DE
Niesha Bruch 24 Bolton St Broxburn, Uphall and Winchburg West Lothian EH52 5TL
Rueben Gastellum 4 Forrest St Weston-Super-Mare North Somerset BS23 3HG
Michell Throssell 89 Noon St Carbrooke Norfolk IP25 6JQ
Edgar Kanne 99 Guthrie St New Milton Hampshire BH25 5DF
You may convert your Excel sheet to a CSV file and then use the csv module to grab rows:

import pandas as pd

read_file = pd.read_excel("Test.xlsx")
read_file.to_csv("Test.csv", index=None, header=True)
df = pd.DataFrame(pd.read_csv("Test.csv"))
print(df)

For a better approach, please provide us with a sample Excel file.
You need two things:
1.) Access OpenXML data via Python: https://github.com/python-openxml/python-xlsx
2.) Find the tables in the file, via what is called a DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1
Unable to separate text data in csv. (Separate text with # so that it becomes two columns)
According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing .#neutral

The above is the text, and I want to split it on # so that it produces two columns.

data = pd.read_csv(r'F:\Sentences_50Agree.csv', sep='#', header=None)

I tried the above, but it's not working: it shows only one column containing the whole text, including #neutral.
import pandas as pd
from io import StringIO

s = 'According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing .#neutral'

print(pd.read_csv(StringIO(s), sep='#', header=None))

Prints:

                                                   0        1
0  According to Gran, the company has no plans to...  neutral

Or with a file:

print(pd.read_csv('file.txt', sep='#', header=None))
Reading in CSVs and how to write the name of the CSV file into every row of the CSV
I have about 2,000 CSVs I was hoping to read into a df, but first I was wondering how someone would (before joining all the CSVs) write the name of every CSV into every row of that CSV. For example, in CSV1 there would be a column that says "CSV1" in every row, and the same for CSV2, CSV3, etc. Is there a way to accomplish this?

import os
import glob
import pandas as pd

os.chdir(r"C:\Users\User\Downloads\Complete Corporate Financial History")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])

The CSV files all look like this: https://docs.google.com/spreadsheets/d/1hOb_nNjB3K8ldyyBUemQlcsTWcjyD8iLh8XMa5XB8Qk/edit?usp=sharing They don't have the Ticker (file name) in each row, though.

Edit: here are the column headers: Quarter end, Shares, Shares split adjusted, Split factor, Assets, Current Assets, Liabilities, Current Liabilities, Shareholders equity, Non-controlling interest, Preferred equity, Goodwill & intangibles, Long-term debt, Revenue, Earnings, Earnings available for common stockholders, EPS basic, EPS diluted, Dividend per share, Cash from operating activities, Cash from investing activities, Cash from financing activities, Cash change during period, Cash at end of period, Capital expenditures, Price, Price high, Price low, ROE, ROA, Book value of equity per share, P/B ratio, P/E ratio, Cumulative dividends per share, Dividend payout ratio, Long-term debt to equity ratio, Equity to assets ratio, Net margin, Asset turnover, Free cash flow per share, Current ratio; the rows descend by quarter.
Sample Data:

,Quarter end,Shares,Shares split adjusted,Split factor,Assets,Current Assets,Liabilities,Current Liabilities,Shareholders equity,Non-controlling interest,Preferred equity,Goodwill & intangibles,Long-term debt,Revenue,Earnings,Earnings available for common stockholders,EPS basic,EPS diluted,Dividend per share,Cash from operating activities,Cash from investing activities,Cash from financing activities,Cash change during period,Cash at end of period,Capital expenditures,Price,Price high,Price low,ROE,ROA,Book value of equity per share,P/B ratio,P/E ratio,Cumulative dividends per share,Dividend payout ratio,Long-term debt to equity ratio,Equity to assets ratio,Net margin,Asset turnover,Free cash flow per share,Current ratio
0,6/30/2019,440000000.0,440000000.0,1.0,17900000000.0,6020000000.0,13000000000.0,3620000000.0,4850000000.0,12000000.0,55000000,5190000000.0,5900000000.0,3.69E+09,-1.20E+08,-1.20E+08,-0.27,-0.27,0.08,1.06E+08,1.29E+08,-2.00E+08,34000000,1360000000.0,128000000.0,22.55,25.83,19.27,0.0855,0.0243,10.9,1.98,16.11,33.46,0.2916,1.2296,0.2679,0.0311,0.78,-0.05,1.662
1,3/31/2019,449000000.0,449000000.0,1.0,18400000000.0,6050000000.0,13200000000.0,3660000000.0,5170000000.0,12000000.0,55000000,5420000000.0,5900000000.0,3.54E+09,1.87E+08,1.86E+08,0.4,0.39,0.08,-2.60E+08,42000000,-7.40E+08,-9.60E+08,1330000000.0,164000000.0,18.37,20.61,16.12,0.1298,0.0373,11.39,1.61,14.13,33.38,0.1798,1.1542,0.2784,0.0485,0.77,-0.94,1.6543
2,12/31/2018,485000000.0,485000000.0,1.0,18700000000.0,6580000000.0,13100000000.0,3520000000.0,5570000000.0,12000000.0,55000000,7250000000.0,5900000000.0,3.47E+09,2.18E+08,2.18E+08,0.45,0.45,0.06,4.26E+08,3.54E+08,-4.00E+07,7.40E+08,2280000000.0,-31000000.0,19.62,23.6,15.63,0.1208,0.035,11.38,1.79,None,33.3,0.1813,1.0685,0.2952,0.0457,0.76,0.94,1.8696
3,9/30/2018,483000000.0,483000000.0,1.0,18300000000.0,6130000000.0,13000000000.0,3010000000.0,5360000000.0,14000000.0,55000000,5470000000.0,6320000000.0,3.52E+09,1.61E+08,1.60E+08,0.33,0.32,0.06,51000000,65000000,-3.20E+07,82000000,1540000000.0,207000000.0,19.88,23.13,16.64,-0.0594,-0.0165,10.98,1.86,None,33.24,None,1.1902,0.2895,None,0.75,-0.32,2.0345
4,6/30/2018,483000000.0,483000000.0,1.0,18200000000.0,6080000000.0,13000000000.0,2980000000.0,5200000000.0,14000000.0,55000000,5480000000.0,6310000000.0,3.57E+09,1.20E+08,1.20E+08,0.25,0.24,0.06,1.76E+08,1.17E+08,-3.50E+07,2.52E+08,1460000000.0,166000000.0,20.27,24.07,16.47,-0.069,-0.0186,10.66,1.88,None,33.18,None,1.2259,0.2826,None,0.73,0.02,2.0406
5,3/31/2018,483000000.0,483000000.0,1.0,18200000000.0,5900000000.0,12900000000.0,2800000000.0,5270000000.0,14000000.0,55000000,5560000000.0,6310000000.0,3.45E+09,1.43E+08,1.42E+08,0.3,0.29,0.06,-4.40E+08,29000000,-5.40E+08,-9.50E+08,1210000000.0,117000000.0,26.87,31.17,22.57,-0.0536,-0.0134,10.8,2.67,None,33.12,None,1.2102,0.2861,None,0.7,-1.15,2.1039
6,12/31/2017,483000000.0,483000000.0,1.0,18700000000.0,6380000000.0,13800000000.0,2820000000.0,4910000000.0,14000000.0,55000000,7410000000.0,6810000000.0,3.27E+09,-7.30E+08,-7.30E+08,-1.51,-1.51,0.06,6.12E+08,-2.40E+08,-4.50E+07,3.35E+08,2150000000.0,236000000.0,25.3,27.85,22.74,-0.0232,-0.0038,10.06,2.07,None,33.06,None,1.4019,0.2594,None,0.67,0.78,2.2585
7,9/30/2017,481000000.0,481000000.0,1.0,19200000000.0,6150000000.0,13300000000.0,2680000000.0,5950000000.0,13000000.0,55000000,5250000000.0,6800000000.0,3.24E+09,1.19E+08,1.01E+08,0.23,0.22,0.06,1.72E+08,-1.30E+08,-1.50E+07,30000000,1820000000.0,131000000.0,24.76,26.84,22.67,-0.1222,-0.0308,12.24,1.92,None,33.0,None,1.1543,0.3063,None,0.65,0.09,2.2966
8,6/30/2017,441000000.0,441000000.0,1.0,19100000000.0,6030000000.0,13400000000.0,2660000000.0,5740000000.0,13000000.0,55000000,5220000000.0,6800000000.0,3.26E+09,2.12E+08,1.94E+08,0.44,0.43,0.06,2.17E+08,-1.30E+08,-8.60E+08,-7.70E+08,1790000000.0,125000000.0,25.2,28.65,21.75,-0.0899,-0.0231,12.89,2.05,None,32.94,None,1.1954,0.2976,None,0.61,0.21,2.2698
9,3/31/2017,441000000.0,441000000.0,1.0,20200000000.0,6710000000.0,14700000000.0,2590000000.0,5480000000.0,13000000.0,55000000,5170000000.0,8050000000.0,3.19E+09,3.22E+08,3.05E+08,0.69,0.65,0.06,-3.00E+08,1.03E+09,-4.30E+07,6.90E+08,2550000000.0,113000000.0,24.66,30.69,18.64,-0.0815,-0.0223,12.31,2.15,None,32.88,None,1.4826,0.2692,None,0.59,-0.94,2.5937
10,12/31/2016,441000000.0,441000000.0,1.0,20000000000.0,5890000000.0,14900000000.0,2750000000.0,5120000000.0,26000000.0,55000000,6940000000.0,8040000000.0,3.06E+09,-1.30E+09,-1.30E+09,-2.92,-2.92,7.76,6.62E+08,-2.40E+08,-4.00E+08,0,1860000000.0,302000000.0,24.43,32.1,16.75,-0.098,-0.029,11.49,0.91,None,32.82,None,1.5897,0.2525,None,0.57,0.82,2.1433
11,9/30/2016,438000000.0,438000000.0,1.0,37400000000.0,9370000000.0,23500000000.0,5500000000.0,11800000000.0,2170000000.0,55000000,5380000000.0,9500000000.0,5.21E+09,1.66E+08,1.48E+08,0.34,0.33,0.09,3.06E+08,-2.30E+08,-1.40E+08,-6.60E+07,1860000000.0,152000000.0,30,32.91,27.09,-0.0377,-0.0105,26.73,1.07,None,25.06,None,0.8107,0.313,None,0.57,0.35,1.7033
12,6/30/2016,1320000000.0,438000000.0,0.333333,36100000000.0,8090000000.0,21600000000.0,5490000000.0,12300000000.0,2190000000.0,55000000,5400000000.0,8280000000.0,5.30E+09,1.35E+08,1.18E+08,0.09,0.09,0.03,3.32E+08,3.11E+08,-1.00E+08,5.45E+08,1930000000.0,-50000000.0,30.42,34.5,26.34,-0.047,-0.0139,28.01,1.1,None,24.97,None,0.6741,0.3398,None,0.58,0.87,1.4747
13,3/31/2016,1320000000.0,438000000.0,0.333333,36100000000.0,7670000000.0,21800000000.0,5560000000.0,12200000000.0,2140000000.0,55000000,5400000000.0,8260000000.0,4.95E+09,16000000,-2000000,0,0,0.03,-4.30E+08,-1000000,-1.10E+08,-5.40E+08,1380000000.0,29000000.0,24.54,30.66,18.42,-0.0467,-0.0137,27.76,0.9,None,24.88,None,0.6784,0.3368,None,0.59,-1.05,1.3798
14,12/31/2015,1310000000.0,438000000.0,0.333333,36500000000.0,7950000000.0,22400000000.0,5210000000.0,12000000000.0,2090000000.0,55000000,7540000000.0,9040000000.0,5.25E+09,-7.00E+08,-7.20E+08,-0.55,-0.55,0.03,8.65E+08,-4.60E+08,-2.30E+08,1.80E+08,1920000000.0,398000000.0,28.48,33.54,23.43,-0.0324,-0.0089,27.36,0.99,25.66,24.79,None,0.7542,0.3283,None,0.62,1.07,1.5262
You could try something like this, then:

df_list = []
for filename in all_filenames:
    df = pd.read_csv(filename)
    # Add a Ticker column holding the file name.
    # The split works if no filename has more than one period;
    # otherwise, trim the extension with os.path.splitext instead.
    df['Ticker'] = filename.split('.')[0]
    df_list.append(df)

all_dfs = pd.concat(df_list, axis=0)
I can't think of a built-in way of doing this, but an alternative is to expand your for loop: load each dataframe into a variable, create a column df['fileName'] = filename.split('.')[0] to get just the file name without the .csv, then append this df to a list, which grows with every iteration. After the loop completes, just do pd.concat(list_csv, axis=0) to make one single df. Replying from my phone, so I couldn't type in working code, but it's easy if you think about it. KR, Alex
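On the caveat about file names containing more than one period: pathlib's Path.stem strips only the final suffix, so it also handles such names. A minimal sketch with hypothetical ticker file names:

```python
from pathlib import Path

def ticker_from(filename):
    # Path.stem drops only the last suffix, so "BRK.A.csv" -> "BRK.A",
    # whereas filename.split('.')[0] would give just "BRK"
    return Path(filename).stem

print(ticker_from("AAPL.csv"))
print(ticker_from("BRK.A.csv"))
```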
How to write the contents of a list into an Excel sheet using openpyxl
I have the following list:

d_list = ["No., Start Name, Destination, Distance (miles)",
          "1,ALBANY,NY CRAFT,28",
          "2,GRACO,PIONEER,39",
          "3,FONDA,ROME,41",
          "4,NICCE,MARRINERS,132",
          "5,TOUCAN,SUBVERSIVE,100",
          "6,POLL,CONVERGENCE,28",
          "7,STONE HOUSE,HUDSON VALLEY,9",
          "8,GLOUCESTER GRAIN,BLACK MUDD POND,75",
          "9,ARMY LEAGUE,MUMURA,190",
          "10,MURRAY,FARMINGDALE,123"]

So basically the list consists of thousands of elements (just a sample of 10 shown here), each a string of comma-separated values. I'd like to write this into a new worksheet in a workbook. Note: the workbook already exists and contains other sheets; I'm just adding a new sheet with this data. My code:

import openpyxl

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.create_sheet(title='distance')
for i in range(len(d_list)):
    sheet.append(list(d_list[i]))

I'm expecting (in this example) 11 rows of data, each with 4 columns. However, I'm getting 11 rows all right, but with each character of each string written in its own cell! I think I am almost there ... what am I missing? (Note: I've read through all the available posts related to this topic, but couldn't find any that answers this specific type of question, hence I'm asking.) Many thanks!
You can use pandas to solve this.

1.) Convert your list into a dataframe:

In [228]: df = pd.DataFrame([i.split(",") for i in d_list])

In [229]: df
Out[229]:
      0                 1                2                 3
0   No.        Start Name      Destination  Distance (miles)
1     1            ALBANY         NY CRAFT                28
2     2             GRACO          PIONEER                39
3     3             FONDA             ROME                41
4     4             NICCE        MARRINERS               132
5     5            TOUCAN       SUBVERSIVE               100
6     6              POLL      CONVERGENCE                28
7     7       STONE HOUSE    HUDSON VALLEY                 9
8     8  GLOUCESTER GRAIN  BLACK MUDD POND                75
9     9       ARMY LEAGUE           MUMURA               190
10   10            MURRAY      FARMINGDALE               123

2.) Write the above dataframe to a new sheet in the existing workbook, in 4 columns:

import pandas as pd
from openpyxl import load_workbook

path = "data.xlsx"
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='distance')
writer.save()
writer.close()
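The one-character-per-cell symptom in the question comes from list("1,ALBANY,NY CRAFT,28") splitting the string into individual characters; splitting on commas instead yields the intended 4-column rows. A minimal pure-Python sketch of just that transformation (the openpyxl call is shown as a comment so the snippet stays self-contained):

```python
# Shortened sample of the question's list
d_list = [
    "No., Start Name, Destination, Distance (miles)",
    "1,ALBANY,NY CRAFT,28",
    "2,GRACO,PIONEER,39",
]

# list(s) turns a string into characters; s.split(',') turns it into fields
rows = [s.split(',') for s in d_list]

for row in rows:
    print(row)
    # with openpyxl you would then do: sheet.append(row)
```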