Data is lost while extracting data from an xls (Excel) file

I have 12,000 rows of data in an xls file that I want to read, parse, and insert into a database. I use the extrame/xls library to read the xls data,
but some of the parsed values differ from, or are missing compared to, the actual data in Excel.
This is my readXLSFile method:
func readXLSFile(filename string) ([][]string, error) {
	result := [][]string{}
	log.Println("Get into readXlsFile")
	xlFile, err := xls.Open(filename, "utf-8")
	if err != nil {
		return nil, err
	}
	sheet1 := xlFile.GetSheet(0)
	str := ""
	//log.Println("Max Row ", int(sheet1.MaxRow))
	for i := 0; i <= int(sheet1.MaxRow); i++ {
		row1 := sheet1.Row(i)
		temp := []string{}
		for j := 0; j <= int(row1.LastCol()); j++ {
			temp = append(temp, row1.Col(j))
			//log.Println("Max Col", int(row1.LastCol()), "Of row ", i+1)
			str += fmt.Sprintf("column %d data = %s ", j+1, row1.Col(j))
		}
		log.Printf("row %d data : %s \n", i+1, str)
		str = ""
		result = append(result, temp)
	}
	return result, nil
}
And here is the log that shows the data differing from my xls file:
2018/03/12 19:24:24 service.inquiry.go:4557: row 1836 data : column 1 data = :61:171218C59000NMSC column 2 data =
2018/03/12 19:24:24 service.inquiry.go:4557: row 1837 data : column 1 data = :86: column 2 data = column 3 data = column 4 data = column 5 data = column 6 data = column 7 data = column 8 data = column 9 data = column 10 data = column 11 data = column 12 data = PLS10299 column 13 data = column 14 data = 22162- column 15 data =
2018/03/12 19:24:24 service.inquiry.go:4557: row 1838 data : column 1 data = :61:171218D300NMSC column 2 data =
2018/03/12 19:24:24 service.inquiry.go:4557: row 1839 data : column 1 data = :86: column 2 data = column 3 data = column 4 data = column 5 data = column 6 data = column 7 data = column 8 data = column 9 data = column 10 data = column 11 data = column 12 data = PLS10299 column 13 data = column 14 data = 22162- column 15 data =
2018/03/12 19:24:24 service.inquiry.go:4557: row 1840 data : column 1 data = :61:171218D700NMSC column 2 data =
2018/03/12 19:24:24 service.inquiry.go:4557: row 1841 data : column 1 data = :86: column 2 data = column 3 data = column 4 data = column 5 data = column 6 data = column 7 data = column 8 data = column 9 data = column 10 data = column 11 data = column 12 data = PLS10299 column 13 data = column 14 data = 22162- column 15 data =
And this is the actual data from the xls file:
does anybody know why is this happening, and how to fix it?
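Independent of which library is at fault, a quick sanity check on the parsed output can flag rows that came back narrower than expected — the truncated rows in the log above (e.g. row 1836 stops at column 2) suggest exactly that. A minimal, library-agnostic sketch in Python; the helper name and expected width are illustrative, and the rows stand in for the `[][]string` result:

```python
def find_short_rows(rows, expected_cols):
    """Return (index, actual_width) for every parsed row that is
    narrower than the expected column count."""
    return [(i, len(row)) for i, row in enumerate(rows)
            if len(row) < expected_cols]

# Rows as lists of strings, mimicking the parsed result
parsed = [
    ["a", "b", "c"],                # full row
    [":61:171218C59000NMSC", ""],   # truncated row, like row 1836 in the log
    ["x", "y", "z"],
]
print(find_short_rows(parsed, 3))  # -> [(1, 2)]
```

Comparing the flagged row indices against the original spreadsheet narrows down whether the loss happens in the file itself or in the reader.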

Related

How can I count unique values and groupby?

I have been trying to count, per row, the number of unique values and group them. Perhaps it will be easier to explain with a table. Should I first transpose before counting and grouping?
Box1   | Box2   | Box3   | Count Result 1 | Count Result 2 | Count Result 3
Data A | Data A | Data B | Data A = 2     | Data B = 1     |
Data C | Data D | Data B | Data C = 1     | Data D = 1     | Data B = 1
in GS try:
=ARRAYFORMULA(TRIM(SPLIT(FLATTEN(QUERY(QUERY(
QUERY(SPLIT(FLATTEN(A2:C3&" = ×"&ROW(A2:C3)), "×"),
"select max(Col1) group by Col1 pivot Col2")&
QUERY(SPLIT(FLATTEN(A2:C3&" = ×"&ROW(A2:C3)), "×"),
"select count(Col1) group by Col1 pivot Col2")&"​",
"offset 1", ),,9^9)), "​")))

How to create Excel file from two SQLite tables using condition in pandas?

I have two SQLite tables, Table 1 and Table 2.
Table1 has ID, Name and Code columns. Table2 has ID, Values and Con columns.
I want to create an Excel file with ID, Name, Code and Values columns. ID, Name and Code come from Table1, and Values comes from Table2 as the sum of its Values column, under two conditions: the ID columns must match and the Con column must equal 'Done'.
Below image is for reference:
I would approach this problem in steps.
First, extract the SQL tables into pandas dataframes. I am no expert on that aspect of the problem, but assume you have two dataframes like the following:
df1 = ID Name Code
0 1 a 1a
1 2 b 2b
2 3 a 3c
and
df2 = ID Values Con
0 1 5 Done
1 2 9 No
2 1 7 Done
3 2 4 No
4 1 8 No
5 3 1 Done
def sumByIndex(dx, row):
    # Return the summed value for this row's ID, or 0 if the ID doesn't exist
    idx = row['ID']
    st = list(dx['ID'])
    if idx in st:
        return dx[dx['ID'] == idx]['Values'].values[0]
    else:
        return 0

def combineFrames(d1, d2):
    # Return an updated version of d1 with a "Values" column added
    d3 = d2[d2['Con'] == 'Done'].groupby('ID', as_index=False).sum()
    d1['Values'] = d1.apply(lambda row: sumByIndex(d3, row), axis=1)
    return d1
then print(combineFrames(df1, df2)) yields:
ID Name Code Values
0 1 a 1a 12
1 2 b 2b 0
2 3 a 3c 1
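A more idiomatic alternative (a sketch, not the answer's code) is to filter, group, and left-merge in one pass, letting fillna cover IDs that have no 'Done' rows:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["a", "b", "a"],
                    "Code": ["1a", "2b", "3c"]})
df2 = pd.DataFrame({"ID": [1, 2, 1, 2, 1, 3],
                    "Values": [5, 9, 7, 4, 8, 1],
                    "Con": ["Done", "No", "Done", "No", "No", "Done"]})

# Sum Values per ID, but only over rows where Con == 'Done'
done_sums = (df2[df2["Con"] == "Done"]
             .groupby("ID", as_index=False)["Values"].sum())

# Left-merge keeps every row of df1; IDs with no 'Done' rows get 0
out = df1.merge(done_sums, on="ID", how="left").fillna({"Values": 0})
out["Values"] = out["Values"].astype(int)
print(out)
```

This avoids the per-row sumByIndex lookup entirely, which matters once the tables grow.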
My program obtains the data from SQLite table 1 and SQLite table 2 as lists (tuples and lists) with the corresponding values of ID, Name, Code and ID, Values, Con, by querying the database like this: 'SELECT * FROM sqlite table 1'.
# sqlite table 1
table1 = [[5674, 'a', '1a'], [3385, 'b', '2b'], [5548, 'a', '3c']]
# sqlite table 2
table2 = [(5674, 5, 'Done'), (3385, 9, 'No'), (5674, 7, 'Done'), (3385, 4, 'No'), (5674, 8, 'No'), (5548, 1, 'Done')]
To begin, I add up all the Values entries in a dictionary keyed by the corresponding ID:
map_values = {table2[i][0]: 0 for i in range(len(table2))}
for i in range(len(table2)):
    if table2[i][2] == 'Done':
        map_values[table2[i][0]] += table2[i][1]
Then I define the pandas.DataFrame() instance from SQLite table 1 this way:
df = pd.DataFrame(table1, index=[i for i in range(1, len(table1)+1)], columns=["ID", "Name", "Code"])
The "Values" sums are stored in that same order, to be added later as a new Values column:
df["Values"] = list(map_values.values())
output:
ID Name Code Values
1 5674 a 1a 12
2 3385 b 2b 0
3 5548 a 3c 1
excel:
df.to_excel(r'./excel_file.xlsx', index=False)
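Since both tables already live in SQLite, the list-building step could also be skipped entirely by reading straight into a dataframe with pandas.read_sql_query, letting SQL do the conditional sum. A sketch using an in-memory database; the table and column names follow the question, but the setup itself is illustrative:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID INTEGER, Name TEXT, Code TEXT)")
con.execute("CREATE TABLE table2 (ID INTEGER, [Values] INTEGER, Con TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(5674, 'a', '1a'), (3385, 'b', '2b'), (5548, 'a', '3c')])
con.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                [(5674, 5, 'Done'), (3385, 9, 'No'), (5674, 7, 'Done'),
                 (3385, 4, 'No'), (5674, 8, 'No'), (5548, 1, 'Done')])

# LEFT JOIN keeps IDs with no 'Done' rows; COALESCE turns their NULL sum into 0
query = """
SELECT t1.ID, t1.Name, t1.Code,
       COALESCE(SUM(CASE WHEN t2.Con = 'Done' THEN t2.[Values] END), 0) AS "Values"
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
GROUP BY t1.ID, t1.Name, t1.Code
"""
df = pd.read_sql_query(query, con)
print(df)
# df.to_excel('excel_file.xlsx', index=False) would then write the file as before
```

This also sidesteps the assumption that the dictionary's key order happens to match table1's row order.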

how to use multiple columns values to calculate the result in pandas pivot table

def WA(A, B):
    c = A * B
    return c.sum() / A.sum()

sample = pd.pivot_table(intime, index=['year'], columns='DV', values=['A', 'B'],
                        aggfunc={'A': [len, np.sum], 'B': [WA]}, fill_value=0)
I'm grouping the dataframe by year and want to find the weighted average of column B.
I'm supposed to multiply column A with B, then sum up the result and divide it by the sum of A [function WA() does that].
I really have no idea how to call the function by passing both columns.
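pivot_table's aggfunc only ever receives one column at a time, which is why WA cannot be passed both A and B there. A two-column weighted average is usually computed with groupby().apply(), which sees the whole group, and then reshaped. A sketch with made-up data; intime, year, DV, A and B mirror the question's names:

```python
import pandas as pd

intime = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021],
    "DV":   ["x",  "x",  "y",  "x"],
    "A":    [2,    3,    1,    4],
    "B":    [10,   20,   5,    8],
})

# Weighted average of B with weights A, per (year, DV) group;
# apply() gets the full sub-frame, so both columns are available at once.
wa = (intime.groupby(["year", "DV"])
            .apply(lambda g: (g["A"] * g["B"]).sum() / g["A"].sum())
            .unstack("DV", fill_value=0))
print(wa)
```

unstack("DV") recreates the pivot-table shape, with one column per DV value.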

xlsxwriter - Looping in value from database and storing in spreadsheet

I have a table that stores column definition as listed below:
Col Name : store_name
Definition : name
Col Name : store_location
Definition : location
Table structure:
store_name,store_location
name,location
I am trying to have these values displayed in an excel spreadsheet using the below loop:
cursor = ...  # this queries the table that stores the above info
title_def = [i[0] for i in cursor.description]
row = 5
col = 2
for data in title_def:
    worksheet1.write(row, col, data, header_format)
    row += 1
The above loop only prints out the labels. I am not sure how to modify title_def above, as I believe I am only extracting the header, and that is what gets displayed in the sheet by xlsxwriter. Could anyone advise how I could display both the column name and the definition in the same spreadsheet?
# Loop through cells in Excel and print values
from openpyxl import load_workbook
workbook = load_workbook('C:\\your_path\\ExcelFile.xlsx')
sheet = workbook.active
row_count = sheet.max_row
for i in range(1, row_count + 1):  # openpyxl rows are 1-indexed
    print(sheet.cell(row=i, column=1).value)

# And if you want to do the same with a CSV file
import csv
with open('C:\\your_path\\CSVFile.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
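Back to the original question: the usual pattern is to write cursor.description once as a header row and then loop over the fetched rows beneath it. A sketch using an in-memory SQLite table as a hypothetical stand-in for the real one; since the worksheet object isn't available here, the (row, col, value) triples that worksheet1.write() would receive are collected instead:

```python
import sqlite3

# Hypothetical stand-in for the table described in the question
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE defs (store_name TEXT, store_location TEXT)")
con.execute("INSERT INTO defs VALUES ('name', 'location')")
cursor = con.execute("SELECT * FROM defs")

start_row, start_col = 5, 2
cells = []  # the (row, col, value) triples worksheet1.write() would get

# Header row: one cell per column name from cursor.description
for col_offset, desc in enumerate(cursor.description):
    cells.append((start_row, start_col + col_offset, desc[0]))

# Data rows: the actual values, one spreadsheet row per DB row
for row_offset, db_row in enumerate(cursor.fetchall(), start=1):
    for col_offset, value in enumerate(db_row):
        cells.append((start_row + row_offset, start_col + col_offset, value))

print(cells)
```

With a real xlsxwriter worksheet, each triple would simply become worksheet1.write(row, col, value).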

For loop for dropping a string pattern from a column name

I am attempting to drop '_Adj' from a column name in a 'df_merged' data frame if the column name contains 'eTIV' or 'eTIV1'.
for col in df_merged.columns:
    if 'eTIV1' in col or 'eTIV' in col:
        df_merged.columns.str.replace('_Adj', '')
This code seems to be producing the following error:
KeyError: '[] not found in axis'
Here are two options:
Option 1
df_merged.columns = [col.replace('_Adj','') if 'eTIV' in col else col for col in list(df_merged.columns)]
Option 2
df_merged = df_merged.rename(columns={col: col.replace('_Adj','') if 'eTIV' in col else col for col in df_merged.columns})
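Applied to a tiny example frame (the sample column names below are made up), Option 1 behaves as expected; note the original loop failed because columns.str.replace returns a new Index that was never assigned back to df_merged.columns:

```python
import pandas as pd

df_merged = pd.DataFrame(columns=["eTIV_Adj", "eTIV1_Adj", "Other_Adj"])

# Option 1: rebuild the column list, stripping '_Adj' only from eTIV columns
df_merged.columns = [col.replace('_Adj', '') if 'eTIV' in col else col
                     for col in list(df_merged.columns)]
print(list(df_merged.columns))  # Other_Adj keeps its suffix
```

Since 'eTIV' is a substring of 'eTIV1', the single 'eTIV' check already covers both cases.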
