How to tokenize/parse data in an excel sheet using spacy - excel

I'm trying to convert an Excel sheet into a Doc object using spaCy. I have spent the last couple of days trying to work around it, but it has proved challenging. I have opened the sheet with both openpyxl and pandas; I can read the sheet and output its content, but I couldn't integrate spaCy to create Doc/Token objects.
Is it possible to process Excel sheets in spaCy's pipeline?
Thank you!

spaCy has no built-in support for Excel files.
You can use pandas to read either a CSV file or an Excel file, like
import pandas as pd
df = pd.read_csv(file)
or
df = pd.read_excel(file)
respectively.
Then select the text column you need, iterate over its values, and pass each one to spaCy's nlp().
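The steps above can be sketched as follows. This is only an illustration under a few assumptions: the column name "text" is hypothetical, an in-memory DataFrame stands in for the result of pd.read_excel(file), and spacy.blank("en") is used because it gives a tokenizer-only pipeline without downloading a model.

```python
import pandas as pd
import spacy

# Tokenizer-only English pipeline; swap in spacy.load(...) if you have a model installed.
nlp = spacy.blank("en")

# Stand-in for: df = pd.read_excel(file)
# The column name "text" is an assumption for this example.
df = pd.DataFrame({"text": ["Hello world.", "spaCy reads strings, not spreadsheets.", None]})

# Drop empty cells and make sure every value is a string before handing it to spaCy.
texts = df["text"].dropna().astype(str).tolist()

# nlp.pipe processes the texts as a stream and yields one Doc per string.
docs = list(nlp.pipe(texts))

for doc in docs:
    print([token.text for token in doc])
```

Each element of docs is a regular Doc object, so the usual token attributes are available from there.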

Related

Convert pandas Data frame to existing Excel keeping the worksheet format

I have a data frame that I want to write into an existing Excel file using openpyxl. The file already exists and has a format (shown in the image) that I want to keep once the data frame's contents have been written.
import pandas as pd
import openpyxl
dataframe=pd.read_excel('info.xlsx')
with pd.ExcelWriter('file.xlsx', engine='openpyxl', if_sheet_exists='replace', mode='a', keep_format=True) as writer:
    dataframe.to_excel(writer, sheet_name='DATAFRAME INFO', startrow=1, index=None)
I can't find a way to do it. I tried passing something like keep_format=True as a keyword argument, but it still does not work; the existing format is always removed.
Thank you very much.
[Image of the format]

How to write the data to excel with python and keep excel number format?

I'm trying to write time data into Excel with Python (I'm using pandas). When I write the time data, the cells get the Excel number format 'General' (sample screenshot).
But I need the number format to be 'Time', which is what I get when I manually paste the data as values into the Excel file.
Is it possible to do the same with Python? If so, how can I do it?
I have tried converting the values to datetime objects, but writing the data always removes the cell format in the Excel file:
df['starttime'] = pd.to_datetime(df['starttime']).dt.strftime('%I:%M:%S %p')
Have you tried to use Pandas?
Writing Excel Files Using Pandas
We'll be storing the information we'd like to write to an Excel file in a DataFrame. Using the built-in to_excel() function, we can extract this information into an Excel file.
Step 1: install pandas in your Python environment
pip install pandas
Step 2: import the pandas module:
import pandas as pd
Step 3: use the to_excel() function to write the contents to a file. The only required argument is the file path:
df.to_excel('./states.xlsx')
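For the 'Time' number format specifically, one possible approach is to write real time values and set the cell's number format yourself. The sketch below uses openpyxl directly rather than to_excel(), which is a substitution and not something the answer above proposes. The cell positions and the sample value are made up for illustration, and an in-memory buffer stands in for a real file path.

```python
from datetime import time
from io import BytesIO

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active

ws["A1"] = "starttime"
# Write an actual time object, not a preformatted string, so Excel stores a time value.
ws["A2"] = time(9, 30, 0)
# Excel number format code for a 12-hour time display.
ws["A2"].number_format = "h:mm:ss AM/PM"

# Save to memory for the example; pass a file path to wb.save() to write to disk.
buffer = BytesIO()
wb.save(buffer)

# Read the workbook back to confirm that the format was stored with the cell.
buffer.seek(0)
saved_format = load_workbook(buffer).active["A2"].number_format
print(saved_format)
```

Note that strftime() returns strings, so a column converted that way is written as text, which may be why Excel shows it as 'General'.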

How to filter and write data from a large excel file?

I receive a large Excel file (100 MB+) containing data for various markets. I have specific filters for my market, such as 'Country Name': 'UK' and 'Initiation Date': after "2010 Jan". I wanted to write a Python program to automate filtering the data and writing it to a new Excel file, but openpyxl takes too long to load a file this big. I also tried combining openpyxl and xlsxwriter, reading the file in read_only mode by iterating over rows with openpyxl and writing to a new file with xlsxwriter, but this takes too long as well. Is there a simpler way to achieve this?
I'm not sure whether pandas can handle very large files, but did you try it?
import pandas
mydf = pandas.read_excel('large_file.xlsx')
At read time you can leave out the columns you don't need (see the usecols parameter of read_excel),
then filter your dataframe as discussed in "Select rows from dataframe",
then write the dataframe back to Excel:
mydf.to_excel('foo.xlsx', sheet_name='Sheet1')
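The filtering step might look like the sketch below. It uses a small in-memory DataFrame in place of the real file so that it runs on its own. The "Value" column and the sample rows are invented, and reading "after 2010 Jan" as "strictly after 31 January 2010" is an assumption.

```python
import pandas as pd

# Stand-in for: pd.read_excel('large_file.xlsx', usecols=["Country Name", "Initiation Date", "Value"])
df = pd.DataFrame({
    "Country Name": ["UK", "FR", "UK", "UK"],
    "Initiation Date": pd.to_datetime(
        ["2009-06-01", "2012-03-15", "2010-01-20", "2015-11-02"]
    ),
    "Value": [10, 20, 30, 40],  # hypothetical extra column
})

# Assumption: "after 2010 Jan" means later than the last day of January 2010.
cutoff = pd.Timestamp("2010-01-31")

# Combine the two conditions with & (each wrapped in parentheses).
mask = (df["Country Name"] == "UK") & (df["Initiation Date"] > cutoff)
filtered = df.loc[mask]

print(filtered)
# With the real data, the result can then be written with:
# filtered.to_excel('foo.xlsx', sheet_name='Sheet1', index=False)
```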

Filling Webform using selenium from excel via python

I want to fill web forms using Selenium WebDriver in Python. That's not a tough task in itself, but I am unable to work out how to fill the form when the data must be taken from an Excel file.
I have tried Selenium to fill a web form, and that is possible and easy:
from selenium import webdriver
driver = webdriver.Chrome("C:\\chrome_driver\\chromedriver_win32\\chromedriver.exe")
driver.get("https://admin.typeform.com/signup")
driver.find_element_by_id("signup_owner_alias").send_keys("Bruce Wayne")
driver.find_element_by_id("signup_owner_email").send_keys("bruce.wayne@gmail.com")
driver.find_element_by_id("signup_terms").click()
driver.find_element_by_id("signup_owner_language").click()
You can use the pandas library to fetch the data from your Excel sheet. If you don't have it installed, you can install it with pip: pip install pandas.
Below is an example of how data is fetched from an excel sheet using pandas.
import pandas as pd
df = pd.read_excel('centuries.xls')
sheet_years = df['Year']
for year in sheet_years:
    print(year)
Basically, we read the Excel sheet (centuries.xls) using the read_excel() method. Then we saved one of the columns (the 'Year' column in this example) in a variable (sheet_years). You can do the same with other columns.
The saved column is a pandas Series, which can be iterated over like a list, so we can loop over its values with a for loop. You can replace the print call with your own code.
If your excel file contains more than one sheet, you can use the sheet_name parameter of the read_excel() method.
After doing the job with pandas, you can then send the output to your selenium code to fill the forms.
More information here:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://www.dataquest.io/blog/excel-and-pandas/
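As a rough sketch of the hand-off to Selenium, each spreadsheet row can be turned into a mapping from form field id to the text to type. The column names "Name" and "Email", the sample rows, and the helper rows_to_form_values are all hypothetical. The field ids are taken from the question's code. The Selenium loop is left as a comment because it needs a live browser.

```python
import pandas as pd

# Hypothetical mapping: Excel column name -> id of the form field it fills.
FIELD_IDS = {"Name": "signup_owner_alias", "Email": "signup_owner_email"}


def rows_to_form_values(df, field_ids):
    """Return one {field_id: text} dict per spreadsheet row."""
    # Keep only the mapped columns and convert every value to a string for send_keys().
    records = df[list(field_ids)].astype(str).to_dict("records")
    return [
        {field_ids[column]: value for column, value in record.items()}
        for record in records
    ]


# Stand-in for: df = pd.read_excel('your_file.xlsx')
df = pd.DataFrame({
    "Name": ["Bruce Wayne", "Clark Kent"],
    "Email": ["bruce@example.com", "clark@example.com"],
})

form_rows = rows_to_form_values(df, FIELD_IDS)
print(form_rows)

# With a driver set up as in the question, each row could then be typed in:
# for values in form_rows:
#     driver.get("https://admin.typeform.com/signup")
#     for field_id, text in values.items():
#         driver.find_element_by_id(field_id).send_keys(text)
```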

create nice tables in excel with Python

I have a script that outputs a pandas dataframe, which I can save to a notebook. These tables, however, aren't professional looking, and I was wondering whether the pandas/Excel writing modules offer a way to add column headers to my columns, a legend, merged cells, a title, etc.
This is what I get from Python as a pandas dataframe:
with this script:
import pandas as pd
excel_df = pd.DataFrame(closeended_all_counts).T
excel_df.columns = all_columns
# Raw string so the backslashes in the Windows path are not treated as escapes
writer = pd.ExcelWriter(r'L:\OMIZ\March_2018.xlsx', engine='xlsxwriter')
excel_df.to_excel(writer, sheet_name='Final Tables')
workbook = writer.book
worksheet = writer.sheets['Final Tables']
writer.close()  # writer.save() was removed in pandas 2.0
whereas I need this output:
Any documentation or modules would be amazing!
Since you're looking for documentation, here you go:
https://xlsxwriter.readthedocs.io/example_tables.html?highlight=tables
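To give a feel for what that documentation covers, here is a small sketch that writes a merged title row and a formatted table with XlsxWriter. The title, headers, and numbers are invented for illustration, and the workbook is written to an in-memory buffer rather than the path from the question.

```python
from io import BytesIO

import xlsxwriter

# Write to memory for the example; pass a file name instead to create a real file.
buffer = BytesIO()
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet("Final Tables")

# Title spanning the table width, via merged cells and a cell format.
title_format = workbook.add_format({"bold": True, "align": "center", "font_size": 14})
worksheet.merge_range("A1:C1", "March 2018 results", title_format)

# Hypothetical headers and data rows.
headers = ["Question", "Yes", "No"]
rows = [["Q1", 12, 8], ["Q2", 5, 15]]

# add_table(first_row, first_col, last_row, last_col, options), zero-indexed.
# The range covers one header row plus the data rows, starting on the third sheet row.
worksheet.add_table(2, 0, 2 + len(rows), len(headers) - 1, {
    "data": rows,
    "columns": [{"header": header} for header in headers],
})

workbook.close()
xlsx_bytes = buffer.getvalue()
print(len(xlsx_bytes), "bytes written")
```

When writing through pandas with engine='xlsxwriter', the same worksheet methods are available on writer.sheets['Final Tables'].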
