Query regarding pandas - excel

Even after installing the xlrd module, I am not able to read Excel files with pandas; every time it reports that the file or directory was not found. Please help!
I am using:
import pandas as pd
data = pd.read_excel("notebook.xlsx")
It raises a "file not found" error.

pandas cannot find the Excel file, because a relative name such as "notebook.xlsx" is looked up in the current working directory. Pass the complete path to read_excel instead, for example pd.read_excel("C:/documents/notebook.xlsx").
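
In a notebook or IDE the working directory is often not the folder that holds the workbook. Here is a minimal sketch for checking where a name actually points before calling read_excel; the helper name resolve_excel_path is hypothetical and not part of pandas.

```python
from pathlib import Path


def resolve_excel_path(name):
    """Return the absolute path for `name`, or raise a descriptive error."""
    path = Path(name).expanduser().resolve()
    if not path.is_file():
        # Report both the resolved path and the directory Python started from.
        raise FileNotFoundError(
            f"{path} does not exist; current working directory is {Path.cwd()}"
        )
    return path
```

With this in place, pd.read_excel(resolve_excel_path("notebook.xlsx")) either reads the file or reports exactly where it looked.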

Related

Not being able to write a data frame to an Excel sheet

This is the error message I am getting. I tried the code given by my tutor and fixes I found online, but I was not able to fix the problem.
You're requesting Pandas to use the xlsxwriter engine to write your XLSX file. The error says
ModuleNotFoundError: No module named 'xlsxwriter'
To use the xlsxwriter engine, you'll need to install the xlsxwriter module.
Usually, that's done with pip install xlsxwriter.
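
If you are not sure which writer modules are installed, you can check before choosing an engine. This is a rough sketch using only the standard library; the function name available_xlsx_writer is made up for illustration, and it assumes "xlsxwriter" and "openpyxl" are the engines you would accept.

```python
import importlib.util


def available_xlsx_writer(preferred=("xlsxwriter", "openpyxl")):
    """Return the first installed module name from `preferred`, or None."""
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = available_xlsx_writer()
print(engine)
```

If engine is not None, it can be passed on as df.to_excel("out.xlsx", engine=engine); if it is None, neither module is installed and one of them has to be installed with pip first.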

Pandas: how to make openpyxl the default engine for all read_excel operations?

Since read_excel's default engine, xlrd, has been deprecated in newer pandas releases, how do I make openpyxl the default engine for all my pd.read_excel calls?
Now, if I update pandas, I must put the parameter engine="openpyxl" in all my pd.read_excel calls. It looks unnecessary.
It's easy! You can change the method's default value by editing _base.py inside the environment's pandas folder. You can find it as follows:
import pandas as pd
print(pd.__file__)
Once in the pandas folder, dive into the folder io > excel > _base.py
Open the file and find
def read_excel(...)
You will find the default value for engine. Change it to 'openpyxl'
If you're using VS Code, place the cursor on a call to .read_excel and press F12, or right-click it and choose Go to Definition, then change it right away.
If you're using pandas version 1.1.5 or another recent version, this might help:
Run print(pd.__file__) to see where your pandas library is stored. Usually, the file path would end in "Lib\site-packages\pandas". Then, open folder "io" and folder "excel" right after. You will find there the "_base.py" file.
Look for the def __init__. You will find there the default engine used to read Excel files. It should be around line 849 (the exact line varies between versions). It should read:
if engine is None:
    engine = "openpyxl"
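
Keep in mind that edits to files under site-packages are overwritten whenever pandas is upgraded or reinstalled. An alternative that leaves the installation untouched is to bind the engine once with functools.partial and import that wrapper wherever you read Excel files. This is a different technique from the one described above, and the name read_excel_openpyxl is only illustrative.

```python
from functools import partial

import pandas as pd

# Wrapper with engine="openpyxl" pre-filled; all other arguments pass through,
# and an explicit engine=... in a call still overrides the default.
read_excel_openpyxl = partial(pd.read_excel, engine="openpyxl")

# Usage (requires openpyxl to be installed and the file to exist):
# df = read_excel_openpyxl("notebook.xlsx", sheet_name=0)
```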

Write output in xlsb file format (Excel binary file format) using pandas and pyxlsb

I've read a lot of Stack Overflow and other threads that explain how to read an Excel binary file.
Reference: Read XLSB File in Pandas Python
import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
However, I cannot find any solution for writing the data back as an .xlsb file after processing it with pandas. Can anyone please suggest a workable solution for this using Python?
Any help is much appreciated!
I haven't been able to find any solution to write into xlsb files or create xlsb files using python.
But one possible workaround is to save your file as xlsx using any of the many libraries that can do that (such as pandas, xlsxwriter, or openpyxl) and then convert that file to xlsb using xlsb-converter: https://github.com/gibz104/xlsb-converter
CAUTION: This repository uses WIN32COM, which is why this script only supports Windows
You can read a binary file with open_workbook from pyxlsb. The code is below:
import pandas as pd
from pyxlsb import open_workbook
path = r'D:\path_to_file.xlsb'
df2 = []
with open_workbook(path) as wb:
    # pyxlsb sheet indexes start at 1, so this opens the first sheet
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            # each cell exposes its value as .v
            df2.append([item.v for item in row])
# the first row holds the column headers, the rest is data
data = pd.DataFrame(df2[1:], columns=df2[0])

Autofill a word.docm file using python

I am trying to auto-fill a word.docm file using python.
I found a solution for filling .docx files automatically, but the same code fails with .docm files.
This is what the sample 'template.docm' file looks like.
I need to fill field1 and field2 from Python.
I am attaching the code that worked perfectly for .docx files. Can anyone please suggest edits, or another method that works for .docm files using Python?
Thanks in advance
from __future__ import print_function
from mailmerge import MailMerge
from datetime import date
template = "template.docm"
document = MailMerge(template)
print(document.get_merge_fields())
document.merge(field1='Name',field2='Address')
document.write('template-output.docm')

How to import .dta via pandas and describe data?

I am new to python and have a simple problem. In a first step, I want to load some sample data I created in Stata. In a second step, I would like to describe the data in python - that is, I'd like a list of the imported variable names. So far I've done this:
from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()
I get the following error:
anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
warnings.warn("'data' is deprecated, use 'read' instead")
What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?
Reading a Stata file with pandas.io.stata.StataReader.data was deprecated in pandas 0.18.1, which is why you are getting that warning.
Instead, use pandas.read_stata to read the file, as shown:
import pandas as pd
df = pd.read_stata('sample_data.dta')
df.dtypes  # returns the dtype of each column, indexed by variable name
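
To get the variable names as a plain list, which is what the question asks for, df.columns.tolist() is usually enough. Below is a small self-contained sketch; the sample data and the label text are made up so that the example can run without the original sample_data.dta file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the Stata file from the question.
sample = pd.DataFrame({"age": [31, 45], "income": [52000.0, 61000.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_data.dta")
    sample.to_stata(path, write_index=False,
                    variable_labels={"age": "Age in years"})

    # Read the whole file and list the variable (column) names.
    df = pd.read_stata(path)
    variable_names = df.columns.tolist()

    # iterator=True returns a StataReader, which also exposes variable labels.
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()

print(variable_names)
print(labels)
```

dir() is not the right tool here: it lists the names defined in the current Python scope, not the variables stored inside the data set.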
Sometimes this did not work for me, especially when the dataset is large. So I propose two steps, one in Stata and one in Python.
In Stata, run the following command:
export excel Cevdet.xlsx, firstrow(variables)
To copy the variable labels as well, run the following:
preserve
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore
This generates two files, Cevdet.xlsx and myfile.xlsx.
Now go to your Jupyter notebook:
import pandas as pd
df = pd.read_excel('Cevdet.xlsx')      # the data
labels = pd.read_excel('myfile.xlsx')  # the variable descriptions and labels
This reads both files into Jupyter (Python 3).
My advice is to save the DataFrame as a pickle (especially if it is big), since reloading a pickle is typically faster than re-reading the Excel file:
df.to_pickle('Cevdet')
The next time you open Jupyter, you can simply run:
df = pd.read_pickle("Cevdet")
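
The save-once, reload-later pattern can be wrapped in a small helper. This is only a sketch: the function name load_with_cache and the file name Cevdet.pkl in the usage note are assumptions, not something from the answer above.

```python
from pathlib import Path

import pandas as pd


def load_with_cache(cache_path, loader):
    """Return the pickled DataFrame if present; otherwise call `loader` and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: the pickle was written by an earlier run.
        return pd.read_pickle(cache_path)
    df = loader()
    df.to_pickle(cache_path)
    return df
```

A call such as df = load_with_cache("Cevdet.pkl", lambda: pd.read_excel("Cevdet.xlsx")) reads the Excel file only on the first run. Delete the pickle whenever the source file changes, and only unpickle files you created yourself.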
