I have a .xlsx file and I want to recreate its table in a GUI using a Treeview from tkinter. My solution below gives me the output I want, but it is long and I'm not sure whether there is a better way to do it. Performance is a concern for my application because the hardware is not very powerful, and I expect a larger .xlsx file to cause a noticeable slowdown.
I also have to assume that I don't know the headings or the number of rows in advance, only that there are fewer than 15 columns.
import numpy as np
import pandas as pd
xls = pd.ExcelFile('file.xlsx')
sheetData = pd.read_excel(xls, 'Sheet-1')
# Get column headings
headings = sheetData.columns
# Convert headings to list
data = list(headings.values.tolist())
# Get row count
rows = len(sheetData)
# Create tree with the number of columns
# equal to the sheet, the id of the column
# equal to the column header and disable
# the 'treeview'
tree = ttk.Treeview(self, columns=data, show=["headings"],selectmode='browse')
# Create column headings on tree
for heading in headings:
    heading = str(heading) # convert to string for processing
    tree.column(heading, width=125, anchor='center')
    tree.heading(heading, text=heading)
# Populate rows --The part that concerns me
for rownumber in range(rows):
    rowvalue = sheetData.values[rownumber] # Get row data
    rowvalue = np.array2string(rowvalue) # Convert from an np array to string
    rowvalue = rowvalue.strip("[]") # Strip the string of square brackets
    rowvalue = rowvalue.replace("'",'') # Replace all instances of ' with no character
    tree.insert('', 'end', values= rowvalue) # Append the row to table
Is there a simpler way to get row data and append it to a treeview?
I created a simple example to do this:
import tkinter as tk
from tkinter import ttk
import pandas as pd
# Maybe This is what you want
def Start():
    fp = pd.read_excel("./test.xlsx") # Read the xlsx file
    for i in range(len(fp.index)): # Loop over the rows; i is the row number
        # fp.iloc[i, [1, 2]] selects the second and third columns of row i
        tree.insert('', 'end', values=tuple(fp.iloc[i, [1, 2]].values))
win = tk.Tk()
win.wm_attributes('-topmost',1)
# win.geometry("+1300+0")
ttk.Button(win,text="Import it",command=Start).pack()
columns = ("name","gender")
tree = ttk.Treeview(win,show="headings",columns=columns)
tree.column("name",width=100,anchor='center')
tree.column("gender",width=50, anchor="center")
tree.heading("name",text="name")
tree.heading("gender",text="gender")
tree.pack()
win.mainloop()
This is my example excel file:
Result:
I am manipulating .csv files. I have to loop through each column of numeric data in the file and enter them into different lists. The code I have is the following:
import csv
salto_linea = "\n"
csv_file = "02_CSV_data1.csv"
with open(csv_file, 'r') as csv_doc:
    doc_reader = csv.reader(csv_doc, delimiter = ",")
    mpg = []
    cylinders = []
    displacement = []
    horsepower = []
    weight = []
    acceleration = []
    year = []
    origin = []
    lt = [mpg, cylinders, displacement, horsepower,
          weight, acceleration, year, origin]
    for i,ln in zip(range (0,9),lt):
        print(f"{i} -> {ln}")
        for row in doc_reader:
            y = row[i]
            ln.append(y)
In the loop, I am trying to use range() as an index so that the nested for loop walks through the first column (the first element of each row in the CSV) and feeds it into the first list in 'lt'. The problem is that the first column is read and stored correctly, but then the outer loop keeps advancing while the nested loop does nothing. I expected that, with i = 1, the nested loop would run again and traverse the next column, and so on. I also tried a while loop with a counter incremented on each iteration to serve as the index, but that did not work either.
How can I fill the sublists in 'lt' with the data from the CSV file?
Without seeing the contents of the CSV file itself, the best way to read the data into a table is with the pandas module, which can do it in one line of code.
import pandas as pd
df = pd.read_csv('02_CSV_data1.csv')
This reads all the data into a DataFrame that you can then work with.
Alternatively, amend the for loop like this:
for row in doc_reader:
    for i, ln in enumerate(lt):
        ln.append(row[i])
For bigger data, I would prefer pandas, which has vectorised methods.
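One likely reason the original nested loop fills only the first list is that `csv.reader` is an iterator: the first pass of the inner `for row in doc_reader` consumes every row, so later passes find nothing left to read. As a minimal sketch of another option, assuming the file has no header row and every row has the same number of fields, the rows can be read once and transposed with `zip(*rows)`. The helper name `read_columns` and the sample data are hypothetical, and `io.StringIO` stands in for the real file:

```python
import csv
import io


def read_columns(csv_file_obj):
    """Read all rows once and return one list per column."""
    rows = list(csv.reader(csv_file_obj, delimiter=","))
    # zip(*rows) regroups the i-th field of every row into one tuple.
    return [list(column) for column in zip(*rows)]


# Hypothetical sample standing in for 02_CSV_data1.csv (three columns only).
sample = io.StringIO("18.0,8,307.0\n15.0,8,350.0\n")
mpg, cylinders, displacement = read_columns(sample)
print(mpg)  # ['18.0', '15.0']
```

The values stay as strings, just as they do with the loop-based version; convert them with `float()` or `int()` if numbers are needed.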
I have been working on a project that automates some Excel calculations. Somewhere in my code (I cannot share it because of confidentiality) I need to retrieve the coordinates of cells from which I know the values.
I know how to find the values, when I have the coordinates, but how does the other way round work?
For example...
A1 = 5, B1 = 6
Your code should be:
import openpyxl
path = "C:\\Users\\Admin\\Desktop\\demo.xlsx"
# workbook object is created
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
# A cell's content is read and written through its .value attribute
sheet_obj.cell(row = 2, column = 2).value = sheet_obj.cell(row = 1, column = 1).value + sheet_obj.cell(row = 1, column = 2).value
x = sheet_obj.cell(row = 2, column = 2)
print(x.value) # the value can be accessed like this
wb_obj.save('logo.xlsx')
Now you can access the value that's in x
Following is my solution (works for string search values)
Contents of my file ('Book1.xlsx'):
Image of my data in the file
###############################################################
# Import openpyxl
# Note: openpyxl package provides both read and write capabilities to excel
import openpyxl
from openpyxl.utils import get_column_letter


# Class definitions should use CamelCase convention based on pep-8 guidelines
class CustomOpenpyxl:
    # Initialize the class with filename as only argument
    def __init__(self, _my_file_name):
        assert _my_file_name.split('.')[-1] == 'xlsx', 'Input file is not xlsx'
        self.my_filename = _my_file_name
        self.my_base_wb = openpyxl.load_workbook(self.my_filename, read_only=False)
        # the following line gets the workbook's currently active worksheet
        self.my_base_active_ws = self.my_base_wb.active

    # Method to get values for specific row in a given worksheet
    # Argument to this method is: - Row number of values to be fetched
    def get_specific_row_val_as_list_in_active_ws(self, _val_row_num):
        # Iterate once for the specific row number in active worksheet
        for _row in self.my_base_active_ws.iter_rows(min_row=_val_row_num, max_row=_val_row_num):
            # Return a list of values
            return [_cell.value for _cell in _row]

    # Method to get cell coordinate by a search value
    # Argument to this method is:- search string
    # Assumption cell value is unique
    def get_cell_coordinate_by_value(self, _search_val):
        # List comprehension to get the row index based on the search value
        _row_processor = [_row_idx for _row_idx, _main_rec in enumerate(self.my_base_active_ws.values, start=1) if
                          _search_val in _main_rec]
        # return type is a list, hence following line to assign it to variable and manage the data type later
        _row_idx = _row_processor[-1]
        # Get the value of the entire row and fetch the column index
        _col_processor = [_col_idx for _col_idx, _val in
                          enumerate(self.get_specific_row_val_as_list_in_active_ws(int(_row_idx)), start=1) if
                          _val == _search_val]
        # return type is a list, hence following line to assign it to variable and manage the data type later
        _col_idx = _col_processor[-1]
        # get the column letter
        _col_letter = get_column_letter(int(_col_idx))
        # string concatenation to join column letter and row index
        _cell_address = _col_letter + str(_row_idx)
        return _cell_address


_my_file_name = 'Book1.xlsx'
# Instantiate an object using the newly created class in this code block
# So that this object gets access to all methods in the class
_my_src_obj = CustomOpenpyxl(_my_file_name)
print(_my_src_obj.get_cell_coordinate_by_value('BAGS'))
print(_my_src_obj.get_cell_coordinate_by_value(1000))
####################################################################PRINT Result################################################################################
B2
C4
Process finished with exit code 0
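The class above returns only the last match and assumes the search value is unique. As a rough sketch of the same idea without that assumption, the function below scans rows of plain values (such as the tuples yielded by openpyxl's `ws.values`) and collects every matching coordinate. The names `column_letter` and `find_coordinates` are hypothetical, the sample rows are invented for illustration and are not the contents of Book1.xlsx, and `column_letter` is a hand-written stand-in for `openpyxl.utils.get_column_letter` so the example runs without a workbook:

```python
def column_letter(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_coordinates(rows, search_value):
    """Return the A1-style coordinate of every cell equal to search_value."""
    return [
        f"{column_letter(col_idx)}{row_idx}"
        for row_idx, row in enumerate(rows, start=1)
        for col_idx, value in enumerate(row, start=1)
        if value == search_value
    ]


# Invented sample rows, shaped like the tuples produced by ws.values.
rows = [
    ("ID", "ITEM", "PRICE"),
    (1, "BAGS", 500),
    (2, "SHOES", 750),
    (3, "HATS", 1000),
]
print(find_coordinates(rows, "BAGS"))  # ['B2']
print(find_coordinates(rows, 1000))    # ['C4']
```

Returning a list means a missing value gives an empty list instead of an IndexError, and duplicate values are all reported.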
My application can display the Excel file, but the table looks messy without borders.
import pandas as pd
import xlrd
import tkinter as tk
from tkinter import*
from tkinter import ttk, filedialog
root = tk.Tk()
root.title("My Application")
width = 1000
height = 500
def browseFile():
    global workbook, copyWorkbook, excel_file, sheetName, worksheet, df_table
    fileName = filedialog.askopenfilename(initialdir = '/', title = 'New File', filetypes = (('excel file', '.xlsx'), ('excel file', '.xls'), ('all files', '*.*')))
    excel_file = pd.ExcelFile(fileName)
    workbook = xlrd.open_workbook(fileName)
    sheetCount = workbook.nsheets
    sheetName = []
    tab = []
    for x in range(workbook.nsheets):
        tab.append(ttk.Frame(tabControl))
        sheetName = workbook.sheet_names()
        tabControl.add(tab[x], text = sheetName[x])
        df_table = excel_file.parse(sheetName[x])
        lblTable = Label(tab[x], text = df_table.to_string(index = False)).pack()
toolbar = Frame(root)
btnOpen = Button(toolbar, text = "Open", command = browseFile).pack(side = LEFT)
btnQuit = Button(toolbar, text = "Quit", command = root.quit).pack(side = RIGHT)
toolbar.pack(side = TOP, fill = X)
tabControl = ttk.Notebook(root)
tabHome = ttk.Frame(tabControl)
tabControl.pack(expand = 1, fill = 'both', side = LEFT)
root.mainloop()
I have searched for a way to display the table with borders, but found nothing. How can I add borders to the table? Is it even possible? If not, what other method can I use?
The problem
The problem is that your code puts the entire dataframe into a single Label widget. When I tried putting a border around this label, it appeared around the whole dataframe rather than around each cell.
My solution
My solution to this is to go through all of the rows and create an individual Label widget for each cell, giving each of these its own border. I then organised them using .grid() rather than .pack(). For the border, I used the settings borderwidth=2, relief="ridge", but you can choose others.
I also added a feature that sets the width of each column to its longest value to prevent the label's contents overflowing.
I did not add row and column headers, but replacing i.pop(0) with i[0] and changing header=None should add this feature.
Modified code
I have only included the for loop in your browseFile() function as I have not made any changes to the rest of your code.
for x in range(workbook.nsheets):
    tab.append(ttk.Frame(tabControl))
    sheetName = workbook.sheet_names()
    tabControl.add(tab[x], text = sheetName[x])
    # header=None stops the first line of the data table being used as the column header, making it appear in the data table
    df_table = excel_file.parse(sheetName[x], header=None)
    # Iterates through each row, creating a Label widget for each cell on that row.
    for i in df_table.itertuples():
        i = list(i) # Converts the row to a more helpful list format
        row = i.pop(0) # pop returns the value at the position, then removes it, as we don't want the first value of the tuple (row number) in the spreadsheet
        for j in range(len(i)):
            # The next line gets the current column as a list of strings, in order to determine the longest item and make all label widgets the correct width.
            current_column = [str(value) for value in df_table.iloc[:, j]]
            # Makes label widget (.grid() returns None, so the result is not stored)
            Label(tab[x], text = i[j], borderwidth=2, relief="ridge", width=len(max(current_column, key=len)), justify=LEFT).grid(row=row, column=j+1)
PS: I really enjoyed solving this question. Thanks for asking!
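The loop above rebuilds the column list and searches it for the longest entry once per cell. A possible refinement, sketched here under the assumption that the rows are available as plain lists (for example via `df_table.values.tolist()`), is to compute each column's width a single time before creating any labels. `column_widths` is a hypothetical helper name and the sample rows are made up:

```python
def column_widths(rows):
    """Return the length of the longest stringified value in each column."""
    widths = []
    for row in rows:
        for j, value in enumerate(row):
            length = len(str(value))
            if j == len(widths):
                # First time this column is seen.
                widths.append(length)
            elif length > widths[j]:
                widths[j] = length
    return widths


# Made-up rows standing in for one worksheet's data.
rows = [["name", "qty"], ["apple", 3], ["watermelon", 12]]
widths = column_widths(rows)
print(widths)  # [10, 3]
```

Each Label could then be created with `width=widths[j]` in place of the per-cell `max(...)` call.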
I would like to check whether a "worksheet" contains more than, e.g., 250 entries; if it does, I would create a new Excel sheet and save it in a new file.
For example:
Leading-Zip    Addresses that contain the Leading-Zip
-----------    --------------------------------------
74             400
73             200
72             50
I used this command to get the number of entries I want to group:
worksheet['Zip-code-region'].value_counts()
Which approach should I take to do that?
Do I have to create a list, or could I use a for loop?
Update with my attempt:
I am importing an Excel file:
xel = pd.read_excel(r'C:test.xlsx', sheet_name = None)
Then I select a sheet:
worksheet = xel[ws]
Now I add a new column 'leading-zip' by slicing the ZIP code:
worksheet['leading-zip']=worksheet['zip-code'].astype(str).str[:2].astype(int)
From that column, I want to iterate over each 'leading-zip' value, count the addresses it contains, and create a new Excel file if there are more than 250.
You can filter the value_counts results that are above the threshold and then loop over their indexes, saving the respective subsets from the original DataFrame as separate Excel sheets:
import xlsxwriter
import numpy as np
import pandas as pd
df = pd.DataFrame({'zip': np.random.randint(10, 100, 1000)})
z = df['zip'].value_counts()
threshold = 15
# The with block saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    for i in z[z >= threshold].index:
        df[df['zip'] == i].to_excel(writer, sheet_name=str(i))
    # save the remaining data as worksheet 'other':
    df[df['zip'].isin(z[z < threshold].index)].to_excel(writer, sheet_name='other')
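To make the filtering step concrete, here is a standard-library sketch of the same logic: count the entries per leading ZIP prefix, keep every prefix that reaches the threshold as its own group, and pool the rest under 'other'. It illustrates the idea only and is not the pandas code above; the function name `split_by_prefix` and the sample ZIP codes are made up.

```python
from collections import Counter, defaultdict


def split_by_prefix(zip_codes, threshold, prefix_len=2):
    """Group ZIP codes by leading digits; small groups are pooled under 'other'."""
    counts = Counter(str(z)[:prefix_len] for z in zip_codes)
    groups = defaultdict(list)
    for z in zip_codes:
        prefix = str(z)[:prefix_len]
        # Same comparison as z[z >= threshold] in the pandas version.
        key = prefix if counts[prefix] >= threshold else "other"
        groups[key].append(z)
    return dict(groups)


# Made-up ZIP codes: three start with 74, two with 73, one with 72.
zips = [74001, 74002, 74003, 73001, 73002, 72001]
groups = split_by_prefix(zips, threshold=3)
print(groups)  # {'74': [74001, 74002, 74003], 'other': [73001, 73002, 72001]}
```

Note that the comparison is `>=`, as in the answer's code; for "more than 250" in the strict sense, a `>` comparison would be the one to use.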
I want to write values from a dataframe into a tkinter Treeview table, but I am not able to do this.
my code:
#Setting up tkinter window.
root = Tk()
tree = ttk.Treeview(root)
#taking file input through a dialog box from the user.
file = filedialog.askopenfile(parent=root,mode='rb',title='Choose a xlsx file')
#reading the excel file selected by the user and then creating a dataframe of that file.
xls = pd.read_excel(file)
df = pd.DataFrame(xls)
#taking all the columns heading in a variable"df_col".
df_col = df.columns.values
#all the column name are generated dynamically.
tree["columns"]=(df_col)
counter = len(df)
#generating for loop to create columns and give heading to them through df_col var.
for x in range(len(df_col)):
    tree.column(x, width=100 )
    tree.heading(x, text=df_col[x])
#generating for loop to print values of dataframe in treeview column.
for i in range(counter):
    tree.insert('', 0, values=(df[df_col[x]]][i]))
It does not display the columns and raises KeyError: 0.
Output Required:
The first argument of tree.column() should be the column name, which you assigned with:
tree["columns"]=(df_col)
The problem is that you have named the columns using strings, but you are attempting to access them using integers in:
for x in range(len(df_col)):
    tree.column(x, width=100 )
    tree.heading(x, text=df_col[x])
Above, you are attempting to access tree.column(0) instead of tree.column('Company'), hence the key error.
Try instead:
for x in range(len(df_col)):
    tree.column(df_col[x], width=100)
    tree.heading(df_col[x], text=df_col[x])
Note that df_col is an ndarray, not a dataframe, which is why df_col[x] works correctly (df[x] would give a key error). This is because df.columns.values returns an ndarray. As a side note, it may be a bit confusing to name an ndarray df_col.
There are also a few issues with your insert. The second argument should correspond to the index of the entry you wish to address. One solution is then to use a row index as the second argument, followed by a row label as text="rowLabel", followed by a list of values for the row:
tree.insert('', i, text=rowLabels[i], values=df.iloc[i,:].tolist())
Where rowLabels should be defined as whatever you want to use in the first column of the table. I would suggest using an index column from the spreadsheet here, if possible. It could be defined by:
rowLabels = df.iloc[:,indexColumn].tolist()
or:
rowLabels = df.index.tolist()
The latter is viable if df has named indices defined by a column during the spreadsheet import. In the former, indexColumn is an int referring to a column number in df that contains unique identifiers.
The option values=df.iloc[i,:].tolist() converts all columns of the ith row into a list, and, since we have passed an index value (the second argument) that gets larger, the call will insert a new row every loop (from the python tkinter docs entry on Treeview --> insert: "if index is greater than or equal to the current number of children, it is inserted at the end").
Finally, I am not sure if you did not post the end of your code, but, in order for the tree to show up, you will also need to use pack, grid, etc.
tree.pack()
or
tree.grid(row=0, column=0)
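Putting the pieces above together, a minimal sketch might look like the following. It assumes the headings and rows have already been pulled out of the DataFrame as plain lists (for example `list(df.columns)` and `df.values.tolist()`); `populate_tree` is a hypothetical helper, and it leaves out the row label (`text=`) for brevity. Columns are addressed by name, and every row is appended with the `'end'` index:

```python
def populate_tree(tree, headings, rows, width=100):
    """Configure columns by name and append each row to a ttk.Treeview-like widget.

    Returns the number of rows inserted.
    """
    names = [str(h) for h in headings]
    tree["columns"] = names
    for name in names:
        # Address each column by its name, not by an integer position.
        tree.column(name, width=width)
        tree.heading(name, text=name)
    inserted = 0
    for row in rows:
        tree.insert("", "end", values=list(row))
        inserted += 1
    return inserted


# Usage with a real widget (not run here because it needs a display):
#   root = tk.Tk()
#   tree = ttk.Treeview(root, show="headings")
#   populate_tree(tree, list(df.columns), df.values.tolist())
#   tree.pack()
#   root.mainloop()
```

Passing each row as a list, rather than as one formatted string, lets the widget keep values that contain spaces in a single cell.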
References:
https://docs.python.org/3/library/tkinter.ttk.html#tkinter.ttk.Treeview
This helpful example makes a few of the steps clear:
https://knowpapa.com/ttk-treeview/
As I was reading over your code, I noticed that the last line has an extra bracket:
df[df_col[x]]]
for i in range(counter):
    tree.insert('', 0, values=(df[df_col[x]]][i]))
I would assume that would explain the KeyError.