I am using xlwings to write a dataframe to an excel sheet. Nothing special, and all works perfectly.
xw.view(
    dataframe,
    abook.sheets.add(after=abook.sheets[-1]),
    table=True
)
My issue is that the output Excel sheet has filters in the top two rows, which I have to disable manually (by selecting the rows and clearing contents).
Thanks to https://github.com/xlwings/xlwings/issues/679#issuecomment-369138719
I changed my code to the following:
abook = xw.books.active
xw.view(
    dataframe,
    abook.sheets.add(after=abook.sheets[-1]),
    table=True
)
sheetname = abook.sheets.active.name
if abook.sheets[sheetname].api.AutoFilterMode == True:
    abook.sheets[sheetname].api.AutoFilterMode = False
which looked promising, but it didn't resolve my issue.
I would appreciate any pointers on how I can have the filters turned off by default. I am using the latest xlwings on Windows 10/11.
Thanks
The solution was to add the
table=False
parameter to the xw.view(df) method. According to the docs:
table (bool, default True) – If your object is a pandas DataFrame, by default it is formatted as an Excel Table
Now to write a dataframe df, I call:
import xlwings as xw
import pandas as pd
df = pd.DataFrame(...)
xw.view(df, table=False)
Updated on 14 January 2023:
Just for completeness, using the argument table=True in view adds a table with a filter. If you would like to keep the table, but remove the filter, you can remove the filter with ws.tables[0].show_autofilter = False:
import xlwings as xw
import pandas as pd
df = pd._testing.makeDataFrame()
xw.view(df, table=True)
ws = xw.sheets.active
ws.tables[0].show_autofilter = False
Or with api.AutoFilter(Field=[...], VisibleDropDown=False), where Field is a list of integers giving the column numbers concerned:
import xlwings as xw
import pandas as pd
df = pd._testing.makeDataFrame()
xw.view(df, table=True)
ws = xw.sheets.active
ws.used_range.api.AutoFilter(Field=list(range(1, ws.used_range[-1].column + 1)), VisibleDropDown=False)
I am new to Python and am exploring how to get data from Excel with it; I found the pandas library for this. I need to read the rates from an HTML table on a website and then dump them into an Excel file. I am using Python.
I have used the following code
import pandas as pd
from datetime import datetime
import lxml as lx
import openpyxl as oxl
url = "https://www.example.com"
tables = pd.read_html(url)
table = tables[0]
table.to_excel('output.xlsx')
The dates in the 'Effective Date' column are in dd mmm yyyy format, and I would like to convert them to dd/mm/yyyy format. I used the following code to convert the table:
table['Effective Date'] = pd.to_datetime(table['Effective Date'],
    infer_datetime_format=False, format='%d/%m/%Y', errors='ignore')
but it fails to convert the dates in the column. Could someone point me in the right direction, please?
Here is the complete code
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'],
    infer_datetime_format=False, format='%d/%m/%Y', errors='ignore')
table.to_excel('rates.xlsx')
You need to use pd.ExcelWriter to create a writer object, so that you can change the Date format WITHIN Excel; however, this problem has a couple of different aspects to it:
You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
As mentioned in the comments, you must pass format='%d %b %Y' to pd.to_datetime() as that is the Date format you are converting FROM.
You must pass errors='coerce' in order to return NaT for those that don't meet the specified format
For the pd.to_datetime() line of code, you must add .dt.date at the end, because we use a date_format parameter and not a datetime_format parameter in creating the writer object later on. However, you could also exclude dt.date and change the format of the datetime_format parameter.
Then, do table = table.dropna() to drop rows with any columns with NaT
Pandas does not change the Date format WITHIN Excel. If you want to do that, create a writer object and pass the date_format (the code below uses the xlsxwriter engine). Note that you CANNOT simply do pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y') or .dt.strftime('%d/%m/%y'), because that creates a "General" format in Excel.
Output is ugly if you do not widen your columns, so I've included code for that as well. Please note that I am on a USA locale, so passing d/m/yyyy creates a "Custom" format in Excel.
NOTE: In my code, I have to pass m/d/yyyy in order for a "Date" format to appear in EXCEL. You can simply change to date_format='d/m/yyyy' since my computer has a different locale than you (USA) that Excel utilizes for "Date" format.
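To illustrate points 2 and 3 above, here is a minimal sketch with a hand-made series (the sample strings are assumptions based on the values listed in point 1):

```python
import pandas as pd

# One parseable date and one of the non-date legend strings
s = pd.Series(["4 May 2022", "Cash rate unchanged"])

# format='%d %b %Y' matches "4 May 2022"; errors='coerce' turns everything else into NaT
parsed = pd.to_datetime(s, format="%d %b %Y", errors="coerce")
print(parsed.isna().tolist())  # [False, True]
```

The NaT rows are then exactly what the dropna() step removes.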
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
writer = pd.ExcelWriter("rates.xlsx",
                        engine='xlsxwriter',
                        date_format='m/d/yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
I am new to Pandas in Python and I am having some difficulties returning the second column of a DataFrame that has no column names, just numbers as indexes.
import pandas as pd
import os
directory = 'A://'
sample = 'test.txt'
# Test with Air Sample
fileAir = os.path.join(directory,sample)
dataAir = pd.read_csv(fileAir,skiprows=3)
print(dataAir.iloc[:,1])
The data I am working with would be similar to:
data = [[1,2,3],[1,2,3],[1,2,3]]
Then, using pandas I wanted to have only
[[2,2,2]].
You can use
dataframe_name[column_index].values
like
df[1].values
or
dataframe_name['column_name'].values
like
df['col1'].values
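Putting that together with the sample data from the question:

```python
import pandas as pd

# No header given, so pandas assigns integer column labels 0, 1, 2
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# Select the second column by its integer label
second = df[1].values
print(second.tolist())  # [2, 2, 2]
```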
import numpy as np
import pandas as pd

# Read every sheet; the data is key/value pairs in the first two columns
dfs = pd.read_excel('input.xlsx', sheet_name=None, header=None)
tester = dfs['Sheet1'].values.tolist()

# Column A: the keys, de-duplicated while preserving order
keys = list(zip(*tester))[0]
seen = set()
seen_add = seen.add
keysu = [x for x in keys if not (x in seen or seen_add(x))]

# Column B: the values, reshaped into one row per repetition of the keys
values = list(zip(*tester))[1]
a = np.array(values).reshape(int(len(values) / len(keysu)), len(keysu))

# First row holds the unique keys, subsequent rows hold the values
list1 = [keysu]
for i in a:
    list1.append(list(i))
df = pd.DataFrame(list1)
df.to_excel('output.xlsx', index=False, header=False)
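For reference, the same pairs-to-wide reshape can be sketched more directly with a groupby (the two-column key/value input layout is an assumption inferred from the code above):

```python
import pandas as pd

# Key/value pairs as they would appear in columns A and B of the sheet
pairs = pd.DataFrame([["k1", 1], ["k2", 2], ["k1", 3], ["k2", 4]],
                     columns=["key", "value"])

# Collect each key's values into a column, preserving first-seen key order
wide = pd.DataFrame(pairs.groupby("key", sort=False)["value"].apply(list).to_dict())
# columns k1 and k2, with rows [1, 2] and [3, 4]
```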
I want to copy data from one Excel column into Excel rows using Python.
I'm trying to pick a particular column from a csv file using Python's Pandas module, where I would like to fetch the Hostname if the column Group is SJ or DC.
Below is what I'm trying but it's not printing anything:
import csv
import pandas as pd
pd.set_option('display.height', 500)
pd.set_option('display.max_rows', 5000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 500)
low_memory=False
data = pd.read_csv('splnk.csv', usecols=['Hostname', 'Group'])
for line in data:
if 'DC' and 'SJ' in line:
print(line)
The data variable contains the values for Hostname & Group columns as follows:
11960 NaN DB-Server
11961 DC Sap-Server
11962 SJ comput-server
Note: when printing, the output was truncated and did not show the complete data.
PS: I have used the pandas.set_option to get the complete data on the terminal!
for line in data: doesn't iterate over row contents, it iterates over the column names. (Also, 'DC' and 'SJ' in line parses as 'DC' and ('SJ' in line), so only 'SJ' is actually tested.) Pandas has several good ways to filter columns by their contents.
For example, you can use Series.isin() to select rows matching one of several values:
print(data[data['Group'].isin(['DC', 'SJ'])]['Hostname'])
If it's important that you iterate over rows, you can use DataFrame.iterrows():
for index, row in data.iterrows():
    if row['Group'] == 'DC' or row['Group'] == 'SJ':
        print(row['Hostname'])
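For instance, with the sample rows shown in the question (reconstructed here as a small DataFrame for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "Group": [float("nan"), "DC", "SJ"],
    "Hostname": ["DB-Server", "Sap-Server", "comput-server"],
})

# NaN never matches isin(), so the DB-Server row drops out
hosts = data[data["Group"].isin(["DC", "SJ"])]["Hostname"]
print(hosts.tolist())  # ['Sap-Server', 'comput-server']
```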
If you're just getting started with Pandas, I'd recommend trying a tutorial to get familiar with the basic structure.
Try this:
import csv
import pandas as pd
import numpy as np  # numpy is not actually needed here and can be removed
data = pd.read_csv('splnk.csv', usecols=['Hostname', 'Group'], low_memory=False)
hostnames = data[(data['Group'] == 'DC') | (data['Group'] == 'SJ')]['Hostname']  # corrected `hostname` to `Hostname`
print(hostnames)
I have a large excel file which I have imported into pandas, made up of 92 sheets.
I want to use a loop or some tool to generate dataframes from the data in each spreadsheet (one dataframe from each spreadsheet), which also automatically names each dataframe.
I have only just started using pandas and jupyter so I am not very experienced at all.
This is the code I have so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
%matplotlib inline
concdata = pd.ExcelFile('Documents/Research Project/Data-Ana/11July-27Dec.xlsx')
I also have a list of all the spreadsheet names:
#concdata.sheet_names
Thanks!
Instead of making each DataFrame its own variable you can assign each sheet a name in a Python dictionary like so:
dfs = {}
for sheet in concdata.sheet_names:
    dfs[sheet] = concdata.parse(sheet)
And then access each DataFrame with the sheet name:
dfs['sheet_name_here']
Doing it this way allows you to have amortised O(1) lookup of sheets.
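If you are reading the file yourself, pd.read_excel can also produce this dictionary in one step with sheet_name=None. A self-contained sketch using an in-memory workbook (the openpyxl engine is assumed to be installed):

```python
import io
import pandas as pd

# Build a small two-sheet workbook in memory so the example is self-contained
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# sheet_name=None returns a dict of DataFrames keyed by sheet name
dfs = pd.read_excel(buf, sheet_name=None)
print(list(dfs))  # ['Sheet1', 'Sheet2']
```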