I have a table that stores column definitions, as listed below:
Col Name : store_name
Definition : name
Col Name : store_location
Definition : location
Table structure:
store_name,store_location
name,location
I am trying to have these values displayed in an Excel spreadsheet using the loop below:
cursor = ...  # this queries the table that stores the above info
title_def = [i[0] for i in cursor.description]
row = 5
col = 2
for data in title_def:
    worksheet1.write(row, col, data, header_format)
    row += 1
The above loop only prints the labels. I am not sure how to modify title_def, since I believe I am only extracting the header, and that is what gets displayed in the sheet via xlsxwriter. Could anyone advise how I could display both the column name and the definition in the same spreadsheet?
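To show the definitions alongside the labels, fetch the table's row as well and write it in the next column. A minimal sketch, assuming the query returns the single definition row and reusing cursor, title_def, worksheet1, and header_format from above:

definitions = cursor.fetchone()  # e.g. ('name', 'location')
row, col = 5, 2
for label, definition in zip(title_def, definitions):
    worksheet1.write(row, col, label, header_format)  # e.g. store_name
    worksheet1.write(row, col + 1, definition)        # e.g. name
    row += 1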
# Loop through cells in Excel and print values
from openpyxl import load_workbook

workbook = load_workbook('C:\\your_path\\ExcelFile.xlsx')
sheet = workbook.active
row_count = sheet.max_row
for i in range(1, row_count + 1):  # openpyxl rows are 1-indexed
    print(sheet.cell(row=i, column=1).value)
# And if you want to do the same with a CSV file
import csv

with open('C:\\your_path\\CSVFile.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
How can I find the first column's distinct values from sheet3?
import pandas as pd

excel_file = "dataset.xlsx"
datasets = pd.ExcelFile(excel_file)
sheet0 = pd.read_excel(datasets, 'Title Sheet')
sheet3 = pd.read_excel(excel_file, sheet_name=3, index_col=0)
sheet4 = pd.read_excel(excel_file, sheet_name=4, index_col=0)
sheet1 = pd.read_excel(excel_file, sheet_name=1, index_col=0)
customer_data = pd.concat([sheet3, sheet4, sheet1])
To get the distinct values of the first column of that sheet, you can do:
unq = sheet3.iloc[:,0].unique()
print(unq)
To get the count of unique values, you can do:
unq = sheet3.iloc[:,0].nunique()
print(unq)
I have a workbook named CustomerSales with two Excel worksheets named sheet1 and sheet2. In both sheets I need to write/change values in particular columns, so I used:
if (data.at[g, 'failedColumn'] == '' and data.at[g, 'reason'] == ''):
    data.at[g, 'status'] = 'Fail'
    data.at[g, 'failedColumn'] = 'BUKRS'
    data.at[g, 'reason'] = 'Customer Not Extended To Any Company code'
data.to_excel(variable)  # variable is the path to the Excel file
Here data is the DataFrame for sheet1, and the code works perfectly fine: the resulting Excel file has the updated column values. But when I try the same code with the DataFrame for sheet2, the existing sheet1 data gets replaced by sheet2's. Is there a way I can change the values in both sheets?
In order to manipulate the data in both of the sheets, you should read the file with pandas, create 2 separate dataframes (one for each sheet), and then save both to the same workbook using xlsxwriter. Here is some demo code:
import pandas as pd
## Read the 2 sheets as separate dataframes
df1 = pd.read_excel('name_of_your_file.xlsx', sheet_name='Sheet1')
df2 = pd.read_excel('name_of_your_file.xlsx', sheet_name='Sheet2')
# Do all of your data manipulation here
# Start using xlsxwriter
writer = pd.ExcelWriter('name_of_the_new_file.xlsx', engine='xlsxwriter')
# Save each df to separate sheet in the same file
df1.to_excel(writer, sheet_name='Sheet1', index=False)
df2.to_excel(writer, sheet_name='Sheet2', index=False)
# You can format your worksheets here
# Finally save the file
writer.save()
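Note that ExcelWriter.save() has been removed in recent pandas releases; on pandas 2.x, call writer.close() instead, or open the writer in a with block so it is closed automatically:

with pd.ExcelWriter('name_of_the_new_file.xlsx', engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)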
I am new to Python and trying to automate some tasks. I have an Excel file with 8 sheets, where each sheet has some identifiers at the top followed by tabular data with headers. Every sheet has the identifiers of interest and the tables in the same locations.
What I want to do is extract some data from the top of each sheet and insert it as columns, remove unwanted rows (after I have assigned some of them to columns) and columns, and then merge everything into one CSV file as output.
The code I have written does the job: it reads each sheet, performs the operations on it, and then starts the same process for the next sheet (8 times) before using .concat to merge them.
import pandas as pd
import numpy as np

inputfile = "input.xlsx"
outputfile = "merged.csv"

##LN X: READ FIRST SHEET AND ASSIGN HEADER INFORMATION TO COLUMNS
df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index_col=0)

# Define cell locations of fields in the header area to be assigned to columns
# (these cell locations are the same on all sheets)
A = df1.iloc[3, 4]
B = df1.iloc[2, 9]
C = df1.iloc[3, 9]
D = df1.iloc[5, 9]
E = df1.iloc[4, 9]

# Insert well header info as columns in the data for worksheet 1
# (placeholder names; each inserted column needs a distinct name)
df1.insert(0, "column_name_A", A)
df1.insert(1, "column_name_B", B)
df1.insert(4, "column_name_E", E)

# Rename the columns in the worksheet1 DataFrame to reflect actual column headers
df1.rename(columns={'Unnamed: 0': 'Header1',
                    'Unnamed: 1': 'Header2'}, inplace=True)

# df2 ... df8 are built the same way, once per sheet (code omitted)
df_merged = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8],
                      ignore_index=True, sort=False)

#LN Y: Replace NaN entries with 0
df_merged = df_merged.replace(np.nan, 0)

## Write results to CSV file
df_merged.to_csv(outputfile, index=False)
Since this code will be used on other Excel files with varying numbers of sheets, I am looking for pointers on how to put the repeated per-sheet operations into a loop, i.e. repeating the steps between LN X and LN Y for each sheet (8 times!!). I am struggling with how to write that loop. Thanks in advance for your assistance.
df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index_col=0)
You should change the argument sheet_name to
sheet_name=None
Then df1 will be a dictionary of DataFrames, which you can loop over using
for df in df1:
    df1[df].insert(0, "column_name", A)
    ....
Now perform your operations, then merge the dfs: loop over them again and concatenate them into one final df, as in the sketch below.
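A fuller sketch of that pattern (hedged: the file names, the cell location of A, and the inserted column name are placeholders standing in for the question's values):

import pandas as pd

# sheet_name=None returns a dict mapping each sheet name to its DataFrame
sheets = pd.read_excel("input.xlsx", sheet_name=None, usecols="A:N", index_col=0)

frames = []
for name, df in sheets.items():
    A = df.iloc[3, 4]                 # header cell, same spot on every sheet
    df.insert(0, "column_name_A", A)  # repeat for the other header fields
    frames.append(df)

df_merged = pd.concat(frames, ignore_index=True, sort=False)
df_merged.to_csv("merged.csv", index=False)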
I have a dataframe from a workbook with a number of sheets, and I want to delete duplicates from all sheets. I used the code below:
df = df.drop_duplicates(subset='Month',keep='last')
After that I save this df:
df.to_excel(path,index=False)
but it removes duplicates only from the first sheet, and the saved file contains only one sheet.
I would suggest treating each sheet of your document as a separate data frame, then removing the duplicates from each one in a loop, according to your criteria. Here is a quick draft of the concept I had in mind, for 2 sheets:
xls = pd.ExcelFile('myFile.xls')
xls_dfs = []
df1 = pd.read_excel(xls, 'Sheet1')
xls_dfs.append(df1)
df2 = pd.read_excel(xls, 'Sheet2')
xls_dfs.append(df2)

# Write every deduplicated sheet into one workbook; calling to_excel with
# the same path inside the loop would overwrite the file on each pass
with pd.ExcelWriter('myFile_deduped.xlsx') as writer:
    for i, df in enumerate(xls_dfs):
        df = df.drop_duplicates(subset='Month', keep='last')
        df.to_excel(writer, sheet_name='Sheet' + str(i + 1), index=False)
I have a .csv file that I'm reading from. I read only select columns from it, and I need to further process this data before saving it into an Excel sheet. The idea is to repeat this process for all the files in the folder and save each sheet with the same name as the original .csv.
As of now, I'm able to read the specific columns from the .csv and write the whole file to Excel. I have yet to figure out how to further process these columns before saving. Further processing involves:
Averaging rows 18000-20000 for each column separately.
Calculating (Column value - Average)/Average
Saving these values in separate columns with different column names.
My code is as follows; I need some help with this.
import pandas as pd
import os

for f in os.listdir():
    file_name, file_ext = os.path.splitext(f)  # split into file name and extension
    if file_ext == '.atf':
        # open the data file and get data only from specific columns
        df = pd.read_csv(f, header = 9, index_col = 0, usecols = [0,55,59,63,67,71,75,79,83,87,91,95,99,103], encoding = "ISO-8859-1", sep = '\t', dtype = {'YFP.13':str,'YFP.14':str,'YFP.15':str,'YFP.16':str,'YFP.17':str,'YFP.18':str,'YFP.19':str,'YFP.20':str,'YFP.21':str,'YFP.22':str,'YFP.23':str,'YFP.24':str,'YFP.25':str,'Signals=':str})
        df.to_excel(file_name + '.xlsx', sheet_name=file_name, engine='xlsxwriter')  # write to an Excel file
Let's say your dataframe has 4 columns and 100 rows:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
Averaging rows 18000-20000 for each column separately.
To perform the averaging on a subset, you define a boolean mask with inequalities over the index and apply the averaging function to the selected rows. The results are saved in a new Series since you will use them later:
means = data[(60 < data.index) & (data.index < 80)].mean()
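On the real dataframe, the same mask simply uses the row range from the question (assuming the default integer index): means = df[(df.index >= 18000) & (df.index <= 20000)].mean()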
Calculating (Column value - Average)/Average
Saving these values in separate columns with different column names
For the last two steps, the code below speaks for itself:
cols = data.columns
for col in cols:
    data[col + "_calc"] = (data[col] - means[col]) / means[col]
In the end, you can export the dataframe data to Excel as you did earlier.
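Putting it together for one input file from the question (a sketch under its assumptions: a default integer row index, the 18000-20000 range inclusive, and numeric columns; df and file_name come from the reading loop above):

mask = (df.index >= 18000) & (df.index <= 20000)
means = df[mask].mean()

for col in df.columns:
    df[col + "_calc"] = (df[col] - means[col]) / means[col]  # (value - average) / average

df.to_excel(file_name + '.xlsx', sheet_name=file_name, engine='xlsxwriter')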