So I have a script that outputs a pandas DataFrame, which I can save from a notebook. These tables, however, aren't professional looking, and I was wondering whether the pandas/Excel-writing modules would allow me to add column headers to my columns, a legend, merged cells, a title, etc.
This is what I get from Python as a pandas DataFrame:
with this script:
excel_df = pd.DataFrame(closeended_all_counts).T
excel_df.columns = all_columns
# Use a raw string so the backslashes in the Windows path aren't treated as escapes
writer = pd.ExcelWriter(r'L:\OMIZ\March_2018.xlsx', engine='xlsxwriter')
excel_df.to_excel(writer, sheet_name='Final Tables')
workbook = writer.book
worksheet = writer.sheets['Final Tables']
writer.save()
whereas I need this output:
Any documentation or modules would be amazing!
Since you're looking for documentation, here you go:
https://xlsxwriter.readthedocs.io/example_tables.html?highlight=tables
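As a starting point, here is a minimal sketch of the kind of formatting XlsxWriter supports, using toy data standing in for `closeended_all_counts` (the title text and colors are made-up placeholders). It writes the DataFrame below a merged title row and re-writes the headers with a custom format:

```python
import pandas as pd

# Toy data standing in for closeended_all_counts (hypothetical values)
df = pd.DataFrame({'Yes': [10, 20], 'No': [30, 40]})

with pd.ExcelWriter('formatted.xlsx', engine='xlsxwriter') as writer:
    # startrow=2 leaves room for a title row above the table
    df.to_excel(writer, sheet_name='Final Tables', startrow=2)
    workbook = writer.book
    worksheet = writer.sheets['Final Tables']

    # A bold, centered title merged across the table's width
    title_fmt = workbook.add_format(
        {'bold': True, 'align': 'center', 'font_size': 14})
    worksheet.merge_range('A1:C1', 'March 2018 Final Tables', title_fmt)

    # Re-write the column headers with a custom header format
    header_fmt = workbook.add_format(
        {'bold': True, 'bg_color': '#D7E4BC', 'border': 1})
    for col_num, name in enumerate(df.columns):
        worksheet.write(2, col_num + 1, name, header_fmt)
```

The same `workbook`/`worksheet` handles give you access to the rest of the XlsxWriter API (legends via inserted text boxes or charts, borders, number formats, and so on).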
I'm trying to convert an Excel sheet into spaCy Doc objects. I spent the last couple of days trying to work around it, but it seems a bit challenging. I have opened the sheet in both openpyxl and pandas; I can read the Excel sheet and output the content, but I couldn't integrate spaCy to create Doc/Token objects.
Is it possible to process Excel sheets in spaCy's pipeline?
Thank you!
spaCy has no built-in support for Excel.
You could use pandas to read either a CSV file or an Excel file:
import pandas as pd
df = pd.read_csv(file)    # for CSV input
df = pd.read_excel(file)  # for Excel input
Then select the required text column, iterate over its values, and pass each one to spaCy's nlp().
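A minimal sketch of that last step, assuming a hypothetical free-text column named 'Comment' (a blank pipeline is used here so no model download is needed; swap in spacy.load('en_core_web_sm') for tagging and parsing):

```python
import pandas as pd
import spacy

# Hypothetical spreadsheet with a free-text column named 'Comment'
df = pd.DataFrame({'Comment': ['First response here.',
                               'Second response here.']})

# A blank English pipeline tokenizes without a downloaded model
nlp = spacy.blank('en')

# nlp.pipe streams the column values and yields one Doc per cell
docs = list(nlp.pipe(df['Comment'].astype(str)))
for doc in docs:
    print([token.text for token in doc])
```

`nlp.pipe` is preferable to calling `nlp()` in a loop when the column is large, since it batches the texts internally.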
I get a large Excel file (100 MB+) that contains data for various markets. I have specific filters for my market, like 'Country Name': 'UK' and 'Initiation Date': after "2010 Jan". I wanted to write a Python program to automate this filtering and write the matching data to a new Excel file, but openpyxl takes too much time loading an Excel file this big. I also tried a combination of openpyxl and xlsxwriter, where I read the file in read_only mode by iterating over rows in openpyxl and wrote them to a new file with xlsxwriter, but that takes too much time as well. Is there a simpler way to achieve this?
Not sure whether pandas can handle very large files, but did you try pandas?
mydf = pandas.read_excel('large_file.xlsx')
At reading time you can leave out columns you don't need (see the usecols parameter),
then filter your DataFrame as discussed here:
Select rows from dataframe
then write the DataFrame back to Excel:
mydf.to_excel('foo.xlsx', sheet_name='Sheet1')
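Here is a minimal sketch of the whole pipeline with the filters from the question, using a toy in-memory frame in place of the real workbook (the column names come from the question; the values are made up):

```python
import pandas as pd

# In a real run you would load only the columns you need, e.g.:
# df = pd.read_excel('large_file.xlsx',
#                    usecols=['Country Name', 'Initiation Date'])
# Toy frame standing in for the market data (hypothetical values)
df = pd.DataFrame({
    'Country Name': ['UK', 'France', 'UK'],
    'Initiation Date': pd.to_datetime(
        ['2012-05-01', '2011-03-01', '2009-07-01']),
})

# Keep only UK rows initiated after January 2010
mask = (df['Country Name'] == 'UK') & (df['Initiation Date'] > '2010-01-31')
filtered = df[mask]
print(filtered)

# Write the filtered rows to a new workbook:
# filtered.to_excel('uk_after_2010.xlsx', sheet_name='Sheet1', index=False)
```

Because pandas loads the sheet into memory in one pass (and `usecols` prunes unneeded columns up front), this is usually much faster than iterating rows through openpyxl.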
In brief: how can I export a Google Sheets spreadsheet to an SQLite database without losing the cell-anchored images?
In long:
Google Sheets, Excel, and SQLite all allow cell-anchored images. Furthermore, Sheets supports exporting to Excel without loss of such images; and companion programs such as "DB Browser for SQLite",
and LibreOffice also support cell-anchored images. However, I have not been able to export a Sheet (or an Excel spreadsheet)
to SQLite, though I have tried all the obvious possibilities, and some less obvious ones as well. In the latter
category, two attempts are noteworthy:
a) The Python package openpyxl explicitly says
"All other workbook / worksheet attributes are not copied - e.g. Images, Charts."
b) Python's pandas is more promising, because of the dtype parameter of read_excel. Supposedly,
specifying this as object should allow preservation of objects such as cell-anchored images.
Here then is one of my (failed) attempts to use pandas to achieve the desired result:
import sqlite3
import pandas as pd

filename = "Test"
con = sqlite3.connect(filename + ".db")
wb = pd.read_excel('Test.xlsx', sheet_name=None, header=None, dtype=object)
for sheet in wb:
    print(sheet)  # Sheet1
    # print(wb[sheet].columns)
    wb[sheet].to_sql(sheet, con, index=False)
con.commit()
con.close()
Any solution, whether Python-based or not, would be gladly accepted.
Clarification
I'm aware of several techniques for extracting all the images into separate files but am looking for a fully automated technique (presumably some kind of script) for performing the conversion. Whether or not such a technique extracts the images as an intermediate step is immaterial.
I've also tried adding dtype specifications in the call to to_sql, but to no avail.
Addendum
@Stef's original program requires that the images to be copied are all in named columns, and that these names are either known or can be determined. The first assumption is acceptable, and the second can be relaxed by simply writing:
dtype = object
in the call to read_excel.
There's no direct way, but you can use openpyxl version 2.5.5 or later to read images and manually put them into the dataframe.
In the following minimal example I use pandas read_excel to first get all the data, except the images. The crucial point is to import the image column as object type in order to be able to assign the images later. Otherwise this empty column will get all NaNs and a float data type.
Then we read the images from Excel using openpyxl and import them into the dataframe. The ref attribute of the image holds an _io.BytesIO stream. Its pointer points to the end (EOF) after loading the workbook, so we'll have to rewind it first (img.ref.seek(0)). (btw, there seems to be a bug in the img.path names in openpyxl: I get the same path /xl/media/image1.png for all three images whereas it is image{1,2,3}.png in the xlsx).
Anchor row/column values are zero based (img.anchor.idx_base == 0), so we have to account for the header row when computing the iat position in the dataframe (and possible index columns if any). Finally we export the dataframe to SQL using to_sql.
import pandas as pd
import openpyxl
import sqlite3

file_name = 'so58068593.xlsx'
sheet_name = 'Tabelle1'

# read data into dataframe
df = pd.read_excel(file_name, sheet_name=sheet_name, dtype=object)

# read images and add them to dataframe
wb = openpyxl.load_workbook(file_name)
ws = wb[sheet_name]
for img in ws._images:
    img.ref.seek(0)
    df.iat[img.anchor.to.row - 1, img.anchor.to.col] = img.ref.read()

# export to sqlite
with sqlite3.connect(file_name + ".db") as con:
    df.to_sql(sheet_name, con=con)
Excel file (images taken from Wikipedia):
SQLite database viewed in DB Browser for SQLite:
This is just a minimal example. If you don't know in advance where the images are in your xlsx file, you could first iterate over the images collection of the worksheet and check which columns/rows you need for the images in your dataframe, then append them to the dataframe (if not already there) and only then assign the images. Please note, however, that in xlsx you can have data in a cell and, at the same time, an image anchored to that cell, which of course can't be mapped to a database table or pandas dataframe. The reason is that images are not the content of a cell but are merely anchored to it (you could even have several images anchored to the same cell).
I want to fill web forms using the Selenium WebDriver in Python. That's not a tough task in itself, but I can't work out how to fill the web form when the data must be taken from an Excel file.
I have tried using Selenium to fill a web form; that is possible and easy:
from selenium import webdriver

driver = webdriver.Chrome("C:\\chrome_driver\\chromedriver_win32\\chromedriver.exe")
driver.get("https://admin.typeform.com/signup")
driver.find_element_by_id("signup_owner_alias").send_keys("Bruce Wayne")
driver.find_element_by_id("signup_owner_email").send_keys("bruce.wayne@gmail.com")
driver.find_element_by_id("signup_terms").click()
driver.find_element_by_id("signup_owner_language").click()
You can use the pandas library to fetch the data from your Excel sheet. If you don't have it installed, you can install it with pip: pip install pandas.
Below is an example of how data is fetched from an Excel sheet using pandas.
import pandas as pd

df = pd.read_excel('centuries.xls')
sheet_years = df['Year']
for year in sheet_years:
    print(year)
Basically, we fetched the Excel sheet (centuries.xls) using the read_excel() method, then saved one of its columns (the 'Year' column in this example) in a variable (sheet_years). You can do the same with other columns.
The values in the saved column behave like a list, so we can iterate over them with a for loop. You can replace the loop body with your own code instead of just printing the items.
If your excel file contains more than one sheet, you can use the sheet_name parameter of the read_excel() method.
After doing the job with pandas, you can then send the output to your selenium code to fill the forms.
More information here:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://www.dataquest.io/blog/excel-and-pandas/
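Putting the two pieces together, here is a minimal sketch. The workbook name (signups.xlsx), the column names, and the fill_form() helper are all hypothetical; the Selenium calls (shown with the Selenium 4 By.ID API) are kept inside the helper and left uninvoked, since they need a live browser:

```python
import pandas as pd

# Hypothetical workbook with one row per signup (column names are assumptions)
pd.DataFrame({'Name': ['Bruce Wayne'],
              'Email': ['bruce.wayne@gmail.com']}
             ).to_excel('signups.xlsx', index=False)

df = pd.read_excel('signups.xlsx')

def fill_form(row):
    """Send one spreadsheet row into the web form (requires a live browser)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome()
    driver.get('https://admin.typeform.com/signup')
    driver.find_element(By.ID, 'signup_owner_alias').send_keys(row['Name'])
    driver.find_element(By.ID, 'signup_owner_email').send_keys(row['Email'])
    driver.quit()

# Iterate over the rows read from Excel
for _, row in df.iterrows():
    print(row['Name'], row['Email'])
    # fill_form(row)  # uncomment to actually drive the browser
```

Each row read from the sheet becomes one form submission; error handling and waits (WebDriverWait) are left out for brevity.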
I am coming from a Java background and have minimal experience with Python. I have to read an Excel file and validate one of its columns' values against the DB, to verify whether those rows exist in the DB or not.
I know the exact libraries and steps I would use to do this in Java.
But I am having trouble choosing how to do it in Python.
So far I have identified these steps:
Read the Excel file in Python.
Use pyodbc to validate the values.
Can pandas help me refine those steps, rather than doing things the hard way?
Yes, pandas can help. But you phrase the question in a "please google this for me" way; expect it to be down-voted a lot.
I will give you the answer for the Excel part. Surely you could have found this yourself with a little effort?
import pandas as pd
df = pd.read_excel('excel_file.xls')
Read the documentation.
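For the validation part, a minimal sketch of the pattern: pull the existing keys from the database once, then check each spreadsheet value against that set. An in-memory SQLite database stands in here for the real one (with pyodbc the pattern is identical; only the connect() call and connection string differ), and the table and IDs are made up:

```python
import sqlite3
import pandas as pd

# Toy spreadsheet column to validate (hypothetical IDs)
excel_ids = pd.Series([101, 102, 999])

# In-memory SQLite stands in for the real database; with pyodbc you would
# write pyodbc.connect(conn_string) instead
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE customers (id INTEGER PRIMARY KEY)')
con.executemany('INSERT INTO customers VALUES (?)',
                [(101,), (102,), (103,)])

# Fetch the existing keys once, then test each spreadsheet value
existing = {row[0] for row in con.execute('SELECT id FROM customers')}
missing = [i for i in excel_ids if i not in existing]
print(missing)  # IDs present in the sheet but absent from the DB
con.close()
```

Fetching the keys once and checking membership in a set avoids issuing one query per spreadsheet row, which matters when the column is long.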
Using the xlrd module, one can retrieve information from a spreadsheet in Python. For example, a user might have to go through various sheets and retrieve data based on some criteria, which involves a fair amount of work.
Note that xlrd is read-only: it is used to extract data from a spreadsheet, not to write or modify one.
# Reading an Excel file using Python
import xlrd

# Give the location of the file
loc = ("path of file")

# Open the workbook and grab the first sheet
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

# Value of the cell in row 0, column 0
print(sheet.cell_value(0, 0))
Put open_workbook inside a try statement; if the database check uses pyodbc, catch pyodbc.Error in an except clause (e.g. except pyodbc.Error as exc) to handle any database errors.
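A small sketch of that try/except pattern around open_workbook, assuming a hypothetical path that may not exist (the pyodbc branch would be added analogously around the DB call):

```python
import xlrd

loc = 'workbook.xls'  # hypothetical path

try:
    wb = xlrd.open_workbook(loc)
    first_cell = wb.sheet_by_index(0).cell_value(0, 0)
    status = 'ok'
except FileNotFoundError:
    # The path does not exist
    status = 'missing file'
except xlrd.XLRDError:
    # The file exists but is not a format xlrd can read
    status = 'unreadable format'

print(status)
```

Catching the two failure modes separately lets the script report whether the problem is the path or the file format before any database work starts.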