While converting Excel to a DataSet using excelReader.AsDataSet(), sometimes after reading an empty cell, the next cell is read as System.DBNull

I am converting Excel file data to a DataSet using the following code:
if (String.Compare(Path.GetExtension(filePath), ".xlsx", StringComparison.OrdinalIgnoreCase) == 0) {
    excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    if (excelReader != null) {
        excelReader.IsFirstRowAsColumnNames = true;
        var dsresult = new DataSet();
        try { dsresult = excelReader.AsDataSet(); }
        finally { excelReader.Close(); } // release the reader once the DataSet is built
    }
}
But sometimes, after reading an empty cell, the next cell is also read as System.DBNull.
For example, with this data in Excel:
Col A = 1, Col B is blank, Col C = 2
the DataSet values after conversion are:
Col A = 1, Col B is blank, Col C is blank
After searching, it seems that there is some problem with the Excel reader. Please suggest a proper solution, or at least a workaround, for this issue.
Thanks
Deepak

There seems to be an issue with older versions of ExcelDataReader. I had the same issue as you: I tried excelReader.AsDataSet() and also looped manually with excelReader.Read(), but I still got empty results. As soon as I updated the DLL to version 2.1, the issue went away.

Related

Change number format using headers - openpyxl

I have an Excel file in which I want to convert the number formatting from 'General' to 'Date'. I know how to do so for one column when referring to the column letter:
import openpyxl

workbook = openpyxl.load_workbook(r'path\filename.xlsx')  # raw string: '\f' would otherwise be a form feed
worksheet = workbook['Sheet1']
for row in range(2, worksheet.max_row + 1):
    worksheet["D{}".format(row)].number_format = 'yyyy-mm-dd;#'
As you can see, I now use the column letter "D" to point out the column that I want to be formatted differently. Now, I would like to use the header in row 1 called "Start_Date" to refer to this column. I tried a method from the following post to achieve this: select a column by its name - openpyxl. However, that resulted in a KeyError: "Start_Date":
# Create a dictionary of column names
ColNames = {}
Current = 0
for COL in worksheet.iter_cols(1, worksheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1
for row in range(2, worksheet.max_row+1):
    ws["{}{}".format(ColNames['Start_Date'], row)].number_format='yyyy-mm-dd;#'
EDIT
This method results in the following error:
AttributeError: 'tuple' object has no attribute 'number_format'
Additionally, I have more columns whose number formatting needs to be changed. I have a list with the names of those columns:
DateColumns = ['Start_Date', 'End_Date', 'Birthday']
Is there a way that I can use the list DateColumns so that I can save some lines of code?
Thanks in advance.
Please note that I posted a similar question earlier. The following post was suggested as an answer: Python: Simulating CSV.DictReader with OpenPyXL. However, I don't see how the answers in that post can be adjusted to my needs.
You already know which columns need the new number format, and you have conveniently put them into a list, so why not just use that list?
Get the headers in your sheet, check whether each header is in the DateColumns list, and if so, update all the entries in that column from row 2 to max_row with the date format you want:
...
DateColumns = ['Start_Date', 'End_Date', 'Birthday']
for COL in worksheet.iter_cols(min_row=1, max_row=1):
    header = COL[0]
    if header.value in DateColumns:
        for row in range(2, worksheet.max_row + 1):
            worksheet.cell(row, header.column).number_format = 'yyyy-mm-dd;#'
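The AttributeError in the question most likely comes from ColNames storing a 0-based integer index instead of a column letter: for Start_Date in the fourth column, "{}{}".format(3, 2) produces "32", which openpyxl treats as a reference to row 32 and returns as a tuple of cells. The minimal sketch below (plain Python, so it runs without openpyxl) maps each header to its letter instead. column_letter is a hypothetical stand-in for openpyxl.utils.get_column_letter, and the header names are made-up sample data.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel column letter (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def header_letters(headers):
    """Map each header value from row 1 to its column letter."""
    return {name: column_letter(i) for i, name in enumerate(headers, start=1)}


# Hypothetical header row; with openpyxl you would read it from row 1 of the sheet
headers = ["ID", "Name", "Amount", "Start_Date", "End_Date", "Birthday"]
DateColumns = ["Start_Date", "End_Date", "Birthday"]
col_letters = header_letters(headers)

max_row = 4  # stand-in for worksheet.max_row
addresses = [f"{col_letters[name]}{row}"
             for name in DateColumns
             for row in range(2, max_row + 1)]
print(addresses[:3])  # ['D2', 'D3', 'D4']
```

With openpyxl, each address would then be used as worksheet[address].number_format = 'yyyy-mm-dd;#'. Looping over the DateColumns list is what saves the repeated lines the question asks about.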

Read Excel file and assign each column a variable in MATLAB

I have a simple problem reading Excel data that contains strings, long strings, and numbers. I need each column (I have 11 here) to become a separate column-vector variable so that I can plot the columns against each other, or in combinations, in MATLAB.
The problem is reading the file and creating the 11 column vectors: when I assign a variable, the header comes along too.
Code:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(:,1);
B = raw(:,2);
C = raw(:,6);
So I need the variables without the header.
The data file is truncated and given here.
Can anyone help me?
You can use readtable, as ThP suggested. But if you want to use xlsread and want your data without the header, you just need to remove the first row, as in the example below:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(2:end,1);
B = raw(2:end,2);
C = raw(2:end,6);
Note that each array will receive the data from row 2 to the last row.
You can use readtable instead of xlsread.
Using
T = readtable('Data_Link.xlsx')
will result in a table with a variable for each column. For example, T.Year would hold the values from the 'Year' column, T.Title would hold the values from the 'Title' column, and so on.
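For comparison, the same two ideas (drop the header row, then keep one vector per column named after its header) look like this in Python. This is only an analogy with made-up sample data; split_columns is a hypothetical helper. Note that MATLAB's 1-based raw(2:end, k) corresponds to Python's 0-based raw[1:] with column index k - 1.

```python
from typing import Dict, List, Sequence


def split_columns(raw: Sequence[Sequence[object]]) -> Dict[str, List[object]]:
    """Split a header row plus data rows into one list per column, keyed by header."""
    if not raw:
        return {}
    header, data = raw[0], raw[1:]  # raw[1:] plays the role of raw(2:end, :)
    return {str(name): [row[k] for row in data] for k, name in enumerate(header)}


raw = [["Year", "Title", "Score"], [2019, "A", 3.5], [2020, "B", 4.0]]
cols = split_columns(raw)
print(cols["Year"])  # [2019, 2020]
```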

How to get formatted display cell value in excel using closedXML?

I would like to get the displayed value in excel, not the rich text, but the formatted display value.
For example, if the value is "7/1/2015" and the cell has the number format cell.Style.NumberFormat.Format = "d", Excel displays it as 1.
I would like to get the "1" using ClosedXML, but with no success. Below are some of the values I tried:
cell.Value                  // "7/1/2015"
cell.RichText.Text          // "7/1/2015"
cell.GetString()            // "7/1/2015"
cell.GetFormattedString()   // "7/1/2015"
cell.GetValue<string>()     // "7/1/2015"
Does anyone know how to achieve this?
Many thanks!
Have you tried using NumberFormat.Format?
e.g. worksheet.Cell(rowCount, 2).Style.NumberFormat.Format = "mm/dd/yyyy";
Let me know if this is what you're looking for.
After some searching, I found this: https://github.com/ClosedXML/ClosedXML/issues/270
which indicates that ClosedXML's formatted string differs from Excel's and that there won't be a fix.
So I ended up adding my own custom handler for date-time values.
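A custom handler like that can be quite small if only numeric day/month/year codes are needed. The sketch below is in Python for brevity; the same token-by-token approach ports directly to C#. format_excel_date is a hypothetical helper that supports only yyyy, yy, mm, m, dd and d plus literal separators, treated case-insensitively. It does not handle day or month names, times (where m can mean minutes), quoted literals, or multi-section formats.

```python
from datetime import date

# Supported tokens, longest first so "dd" wins over "d", "yyyy" over "yy", etc.
_TOKENS = [
    ("yyyy", lambda d: f"{d.year:04d}"),
    ("yy", lambda d: f"{d.year % 100:02d}"),
    ("mm", lambda d: f"{d.month:02d}"),
    ("m", lambda d: str(d.month)),
    ("dd", lambda d: f"{d.day:02d}"),
    ("d", lambda d: str(d.day)),
]


def format_excel_date(value: date, fmt: str) -> str:
    """Render value with a small subset of Excel date format codes."""
    lowered = fmt.lower()  # match codes case-insensitively
    out = []
    i = 0
    while i < len(fmt):
        for token, render in _TOKENS:
            if lowered.startswith(token, i):
                out.append(render(value))
                i += len(token)
                break
        else:
            out.append(fmt[i])  # literal character such as '/', '-' or ' '
            i += 1
    return "".join(out)


print(format_excel_date(date(2015, 7, 1), "d"))  # 1
```

This reproduces the "1" from the question for the format "d".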
To get the display value of an Excel cell, I used the RichText property below rather than the Cell.Value property (which gives the actual value of the cell without formatting).
using cXl = ClosedXML.Excel;
// "example.xlsx" is a placeholder path
var wb = new cXl.XLWorkbook("example.xlsx");
cXl.IXLWorksheet ws = wb.Worksheet(1);
string col = "A";
int row = 1;
string cellValue = ws.Cell(row, col)
    .RichText
    .ToString();

Read from a specific row onwards from Excel File

I have an Excel file of approximately 7000 rows to read. The file contains a Table of Contents followed by the actual content data in detail.
I would like to skip all the Table of Contents rows and start reading from the actual content. Otherwise, if I need to read the data for "CPU_INFO", the loop finds the search string twice: 1) in the Table of Contents and 2) in the actual content.
So is there any way to specify a start row index for reading the Excel file, thus skipping the whole Table of Contents section?
As taken from the Apache POI documentation on iterating over rows and cells:
In some cases, when iterating, you need full control over how missing or blank rows or cells are treated, and you need to ensure you visit every cell and not just those defined in the file. (The CellIterator will only return the cells defined in the file, which is largely those with values or stylings, but it depends on Excel).
In cases such as these, you should fetch the first and last column information for a row, then call getCell(int, MissingCellPolicy) to fetch the cell. Use a MissingCellPolicy to control how blank or null cells are handled.
If we take the example code from that documentation and tweak it for your requirement to start on row 7000, assuming you do not want to go past row 15,000, we get:
// Decide which rows to process (POI row indices are 0-based)
int rowStart = Math.max(7000, sheet.getFirstRowNum());
int rowEnd = Math.min(15000, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum <= rowEnd; rowNum++) {
    Row r = sheet.getRow(rowNum);
    if (r == null) {
        continue; // the whole row is missing from the file
    }
    // MY_MINIMUM_COLUMN_COUNT is your own constant, as in the POI example
    int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
    for (int cn = 0; cn < lastColumn; cn++) {
        Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
        if (c == null) {
            // The spreadsheet is empty in this cell
        } else {
            // Do something useful with the cell's contents
        }
    }
}
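To see the start-row logic and the missing-cell handling side by side without a real workbook, here is a minimal Python sketch over a hypothetical sparse sheet: a dict of rows holding only the cells "defined in the file", with 0-based indices as in POI. iter_rows and the sample data are illustrative names, not part of any library. Missing cells come back as None, roughly like RETURN_BLANK_AS_NULL (this sketch does not model cells that exist but are blank).

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical sparse sheet: only rows/cells "defined in the file" are stored,
# keyed by 0-based row and column indices, as in Apache POI.
Sheet = Dict[int, Dict[int, object]]


def iter_rows(sheet: Sheet, row_start: int, row_end: int,
              min_columns: int = 0) -> Iterator[Tuple[int, List[Optional[object]]]]:
    """Yield (row_index, cells) for rows row_start..row_end (inclusive),
    clamped to the rows that exist. Missing rows are skipped and missing
    cells are returned as None."""
    if not sheet:
        return
    first_row, last_row = min(sheet), max(sheet)
    for row_num in range(max(row_start, first_row), min(row_end, last_row) + 1):
        row = sheet.get(row_num)
        if row is None:
            continue  # like sheet.getRow(rowNum) returning null
        # Like getLastCellNum(): one past the highest defined column index
        last_cell_num = max(row) + 1 if row else 0
        last_column = max(last_cell_num, min_columns)
        yield row_num, [row.get(cn) for cn in range(last_column)]


sheet: Sheet = {
    0: {0: "Table of Contents"},
    1: {0: "CPU_INFO", 1: 3},        # TOC entry
    5: {0: "CPU_INFO", 2: "Intel"},  # actual content starts at row 5
    6: {1: 42},
}

content_start = 5  # hypothetical index of the first content row
hits = [r for r, cells in iter_rows(sheet, content_start, 15000) if "CPU_INFO" in cells]
print(hits)  # [5]
```

Starting at the content section means "CPU_INFO" is found once; starting at row 0 would also match the Table of Contents entry in row 1.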

Matlab number of rows in excel file

Is there a MATLAB command to get the number of rows written in an Excel file?
First, I fill in the first row, and then I want to add more rows to the Excel file.
So this is my Excel file:
I tried:
e = actxserver ('Excel.Application');
filename = fullfile(pwd,'example2.xlsx');
ewb = e.Workbooks.Open(filename);
esh = ewb.ActiveSheet;
sheetObj = e.Worksheets.get('Item', 'Sheet1');
num_rows = sheetObj.Range('A1').End('xlDown').Row
But num_rows = 1048576, instead of 1.
Please help, thank you!
If the file is empty, or contains data in only one row, then .End('xlDown') will move to the very bottom of the sheet (1048576 is the number of rows in an Excel 2007+ sheet).
Test whether cell A2 is empty first; if it is, there is at most one row of data (1 if A1 holds a value, 0 if it is empty too).
Or work upwards from the bottom of the sheet with xlUp:
num_rows = sheetObj.Cells(sheetObj.Rows.Count, 1).End('xlUp').Row
Note: I'm not sure of the Matlab syntax, so this may need some adjusting
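The behaviour described above can be simulated in a few lines of Python, which makes it easy to see why a sheet with a single filled row reports 1048576. This is a simplified single-column model, not Excel's implementation. end_down and last_used_row are hypothetical helper names, and both None and empty strings count as empty cells.

```python
from typing import List, Optional

EXCEL_MAX_ROWS = 1048576  # rows in an Excel 2007+ worksheet


def _filled(column: List[Optional[object]], row: int) -> bool:
    """True if the 1-based row holds a value; rows past the list are empty."""
    return 1 <= row <= len(column) and column[row - 1] not in (None, "")


def end_down(column: List[Optional[object]], start_row: int = 1) -> int:
    """Simplified Range.End(xlDown) starting from start_row in one column."""
    row = start_row
    if _filled(column, row) and _filled(column, row + 1):
        # Inside a block: stop at the last filled cell of that block
        while _filled(column, row + 1):
            row += 1
        return row
    # Otherwise jump to the next filled cell below, or to the bottom of the sheet
    for candidate in range(row + 1, len(column) + 1):
        if _filled(column, candidate):
            return candidate
    return EXCEL_MAX_ROWS


def last_used_row(column: List[Optional[object]]) -> int:
    """Like Cells(Rows.Count, 1).End(xlUp).Row, but returns 0 for an empty
    column (Excel itself would stop at row 1 in that case)."""
    for row in range(len(column), 0, -1):
        if _filled(column, row):
            return row
    return 0


print(end_down(["Header"]))       # 1048576, as in the question
print(last_used_row(["Header"]))  # 1
```

Scanning upwards from the bottom avoids the single-row special case that trips up xlDown.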
You can use MATLAB's xlsread function to read in the spreadsheet. It returns the following outputs:
[numbers strings misc] = xlsread('myfile.xlsx');
If you check the size of strings or misc, you should get the following:
[rows columns] = size(strings);
Testing this, I got rows = 1, columns = 10 (assuming nothing else was beyond 'A' in the spreadsheet).
