I have an Excel VSTO add-in and need to load the data of an Excel table into a DataTable. I've already implemented it by looping through the cells, but it's a bit slow:
DataTable dataTable = new DataTable(table.Name);
foreach (Excel.ListColumn column in table.ListColumns)
    dataTable.Columns.Add(column.Name);
for (int r = 2; r < table.ListRows.Count + 2; r++)
{
    DataRow row = dataTable.NewRow();
    for (int c = 1; c < table.ListColumns.Count + 1; c++)
        row[c - 1] = ((Excel.Range)table.Range[r, c]).Value;
    dataTable.Rows.Add(row);
}
So my question is: is there a method (or any faster way) in Excel interop to export table data to a DataTable?
Thanks
You can read all values in the DataBodyRange of your table (a ListObject I deduce) at once, into a 1-based 2D object array, like so:
var theValues = table.DataBodyRange.Value;
It's then just a matter of scanning this array.
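As a rough sketch of that scanning step, assuming the columns were already added to the DataTable as in the question: the helper name FillFromValues is hypothetical, and the bounds are read with GetLowerBound/GetUpperBound so the 1-based array is handled without hard-coded offsets. Empty cells arrive as null and are stored as DBNull.Value here.

```csharp
using System;
using System.Data;

public static class ExcelTableImport
{
    // Hypothetical helper: copies a 2D value array (such as the one
    // returned by table.DataBodyRange.Value) into an existing DataTable.
    public static void FillFromValues(DataTable dataTable, object[,] values)
    {
        int firstRow = values.GetLowerBound(0), lastRow = values.GetUpperBound(0);
        int firstCol = values.GetLowerBound(1), lastCol = values.GetUpperBound(1);

        for (int r = firstRow; r <= lastRow; r++)
        {
            DataRow row = dataTable.NewRow();
            for (int c = firstCol; c <= lastCol; c++)
                row[c - firstCol] = values[r, c] ?? DBNull.Value; // empty cell -> DBNull
            dataTable.Rows.Add(row);
        }
    }
}
```

It could be called as ExcelTableImport.FillFromValues(dataTable, (object[,])table.DataBodyRange.Value). Two cases need a check before that cast: DataBodyRange is null when the table has no data rows, and Value returns a single value rather than an array when the range is a single cell.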
Related
I'm trying to find an equivalent to the Excel CountA function for a DataTable.
'This code works for searching through a range of columns in Excel
If xlApp.WorksheetFunction.CountA(WS.Range("A" & i & ":G" & i)) > 0 Then
    DataExists = True
End If
'This is the code I need help with for searching through a DataTable
If DataTbl.Rows(i).Item(0:6).ToString <> "" Then
    DataExists = True
End If
Hoping someone can help with this.
I think you simply need a for-each loop.
internal static int CountForEach(this DataTable? dt)
{
    if (dt == null)
        return 0;
    int count = 0;
    foreach (DataRow row in dt.Rows)
        foreach (object? o in row.ItemArray)
            if (o != DBNull.Value)
                count++;
    return count;
}
Usage:
DataTable dt = GetYourDataTable();
int countValues = dt.CountForEach();
This is also doable with LINQ but I think it would be slower -- I'll run some benchmarks later and update my answer.
EDIT
I added these two LINQ methods:
internal static int CountLinqList(this DataTable? dt)
{
    int count = 0;
    dt?.Rows.Cast<DataRow>().ToList().ForEach(row => count += row.ItemArray.Where(g => g != DBNull.Value).Count());
    return count;
}
internal static int CountLinqParallel(this DataTable? dt)
{
    ConcurrentBag<int> ints = new();
    dt?.AsEnumerable().AsParallel().ForAll(row => ints.Add(row.ItemArray.Where(g => g != DBNull.Value).Count()));
    int count = ints.Sum();
    return count;
}
These are the statistics obtained with BenchmarkDotNet:
I used a pseudo-randomly generated DataTable of around 5.5 million rows and three columns as the test input.
I think these results may change with larger DataTables, but for smaller ones (around 500k rows or fewer) the fastest method will probably be the simple for-each loop.
Methods, fastest first:
1. For-each loop
2. LINQ parallel
3. LINQ list (ToList followed by ForEach)
I'm certainly no LINQ guru, but I'd like to be, so if someone has a better LINQ implementation please let me know.
By the way, I don't think this is a typical LINQ use case.
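The question itself checks a single row across columns 0 to 6 rather than the whole table. A minimal sketch of that per-row variant of the for-each idea follows; the helper name HasData is hypothetical, and treating both DBNull and empty strings as "empty" is an assumption based on the `<> ""` test in the question.

```csharp
using System;
using System.Data;

public static class DataRowChecks
{
    // Hypothetical helper: true when at least one cell of the row in
    // columns firstCol..lastCol (0-based, inclusive) holds a value.
    // Assumption: DBNull and empty strings both count as "empty".
    public static bool HasData(DataRow row, int firstCol, int lastCol)
    {
        for (int c = firstCol; c <= lastCol; c++)
        {
            if (!row.IsNull(c) && row[c].ToString() != "")
                return true;
        }
        return false;
    }
}
```

For the row from the question this would be called as DataRowChecks.HasData(DataTbl.Rows[i], 0, 6), and the same call works from VB with DataTbl.Rows(i).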
I want to populate the value in column B based on the value selected in the drop-down in column A. This works fine with a VLOOKUP formula in every cell of column B. But now I want to hide the formula in the column B cells so that the user cannot alter it (even by mistake).
The formula should still work as expected after it is hidden.
Is there any way to achieve this using Apache POI? Or is there any other way to auto-populate a cell based on a drop-down selection using Apache POI?
Thank you in advance.
Hiding formulas is part of the cell style in Excel. So the simplest answer would be to use CellStyle.setHidden(true).
But that only hides the formula; it does not prevent the user from altering it. That is what sheet protection is for. So you need a combination of both.
The following complete example shows that. Formulas in C2:C4 are hidden and protected.
import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.ss.usermodel.*;

public class CreateExcelDefaultColumnStyleNotLockedAndLockedHideFormulas {

    public static void main(String[] args) throws Exception {

        Workbook workbook = new XSSFWorkbook(); String filePath = "./CreateExcelDefaultColumnStyleNotLockedAndLockedHideFormulas.xlsx";
        //Workbook workbook = new HSSFWorkbook(); String filePath = "./CreateExcelDefaultColumnStyleNotLockedAndLockedHideFormulas.xls";

        // Style for formula cells: locked and with the formula hidden
        CellStyle lockedHideFormulas = workbook.createCellStyle();
        lockedHideFormulas.setLocked(true);
        lockedHideFormulas.setHidden(true);

        // Style for cells the user may still edit
        CellStyle notLocked = workbook.createCellStyle();
        notLocked.setLocked(false);

        Sheet sheet = workbook.createSheet();

        // Header row
        Row row = sheet.createRow(0);
        Cell cell = null;
        for (int c = 0; c < 3; c++) {
            cell = row.createCell(c);
            cell.setCellValue("Col " + (c+1));
        }

        // Data rows: editable values in A and B, hidden formula in C
        for (int r = 1; r < 4; r++) {
            row = sheet.createRow(r);
            for (int c = 0; c < 2; c++) {
                cell = row.createCell(c);
                cell.setCellValue(r * (c+1));
                cell.setCellStyle(notLocked);
            }
            cell = row.createCell(2);
            cell.setCellFormula("A" + (r+1) + "*B" + (r+1));
            cell.setCellStyle(lockedHideFormulas);
        }

        // Cells without an explicit style stay editable
        sheet.setDefaultColumnStyle(0, notLocked);
        sheet.setDefaultColumnStyle(1, notLocked);
        sheet.setDefaultColumnStyle(2, notLocked);

        // Locked and hidden only take effect once the sheet is protected
        sheet.protectSheet("");

        FileOutputStream out = new FileOutputStream(filePath);
        workbook.write(out);
        out.close();
        workbook.close();
    }
}
I have a dataGrid with more than 100 rows in it. I am extracting it to an existing Excel file. I can open the file and add values to the sheet. My problem is, the value becomes null as soon as it gets to the 14th row.
I tried reversing the order of the data in the datagrid just to be sure that it's not the value or data in the dataGrid that is causing the issue but I still get the same result. Only the first 13 rows are extracted to the Excel sheet. The for loop still goes to the rest of the loop but it seems to not get the values.
Here is my code:
var path = @"D:\Reports\Sample.xlsx";
var excel = new Excel.Application {Visible = true};
var wb = excel.Workbooks.Open(path);
var ws = (Excel.Worksheet)wb.Sheets["summary"];
for (var i = 0; i < Grid.Columns.Count; i++)
{
    for (int j = 0; j < Grid.Items.Count; j++)
    {
        // On the 14th row, "b" becomes null and stays null for the rest of the loop
        var b = Grid.Columns[i].GetCellContent(Grid.Items[j]) as TextBlock;
        var myRange = (Excel.Range)ws.Cells[j + 2, i + 1];
        try
        {
            if (b != null) myRange.Value2 = b.Text;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
        Thread.Sleep(100);
    }
}
This is the Excel file created. Values are not extracted any more after the 14th row.
I have a data table like the one below. Rows 2, 3, 4 and 5 are not allowed. How do I validate this scenario using C#? Please help. The number of columns is not fixed.
A B C
A B C
A B
A C
B C
A B B
B B C
A C B
C# with LINQ
This function removes duplicate rows and any row that contains a null value. If null is not the value you are looking for, just replace "DBNull" with the one you want.
public static DataTable FilterDataTable(DataTable table)
{
    // Erase duplicates
    DataView dv = new DataView(table);
    table = dv.ToTable(true, table.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToArray());
    // Get Null values
    List<DataRow> toErase = new List<DataRow>();
    foreach (DataRow item in table.Rows)
        for (int i = 0; i < item.ItemArray.Length; i++)
        {
            if (item.ItemArray[i].GetType().Name == "DBNull")
            { toErase.Add(item); break; }
        }
    //Erase Null Values
    foreach (DataRow item in toErase)
        table.Rows.Remove(item);
    return table;
}
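The function above filters the table, while the question asks for validation. A minimal sketch of a validating variant, which reports offending rows instead of removing them, might look like this. The name FindInvalidRows is hypothetical, and it assumes the same rule as the filter: a row is invalid when it contains a DBNull value or exactly repeats an earlier row. Row equality is delegated to DataRowComparer.Default, which compares rows by value.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableValidation
{
    // Hypothetical helper: returns the 1-based numbers of rows that
    // contain a DBNull value or exactly repeat an earlier row.
    public static List<int> FindInvalidRows(DataTable table)
    {
        var invalid = new List<int>();
        var seen = new HashSet<DataRow>(DataRowComparer.Default);

        for (int r = 0; r < table.Rows.Count; r++)
        {
            DataRow row = table.Rows[r];

            bool hasNull = false;
            for (int c = 0; c < table.Columns.Count; c++)
            {
                if (row.IsNull(c)) { hasNull = true; break; }
            }

            // HashSet.Add returns false when an equal row was already seen.
            bool isDuplicate = !seen.Add(row);

            if (hasNull || isDuplicate)
                invalid.Add(r + 1);
        }
        return invalid;
    }
}
```

Reading the sample in the question as eight data rows without a header (an assumption, since the layout is ambiguous), this should flag rows 2 to 5: row 2 repeats row 1, and rows 3 to 5 each have a missing value. The number of columns is taken from the table, so it does not need to be fixed.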
I am trying to delete all the hidden columns from an Excel sheet using Microsoft Office Interop.
The problem is that when I iterate through the columns, none of them has the Hidden property set.
I don't know what I am doing wrong here.
Any help would be appreciated.
Excel.Worksheet wsCurrent = (Excel.Worksheet)wsEnumerator.Current;
int columnCount = wsCurrent.UsedRange.Columns.Count;
for (int c = 1; c <= columnCount; c++)
{
    if (wsCurrent.UsedRange.get_Range((Excel.Range)wsCurrent.UsedRange.Cells[1, c], (Excel.Range)wsCurrent.UsedRange.Cells[wsCurrent.Rows.Count, c]).EntireColumn.Hidden)
        Console.WriteLine("Column Hidden");
}
By the way, your code uses a variable "r" that is never declared.
Use the following code to iterate over the cells and determine whether a cell is hidden:
// Cells narrower or shorter than this are treated as hidden
const double SIZE = 0.5;
Excel.Range usedRange = sheet.UsedRange;
int cols = usedRange.Columns.Count;
int rows = usedRange.Rows.Count;
for (int iCol = 1; iCol <= cols; iCol++)
{
    for (int jRow = 1; jRow <= rows; jRow++)
    {
        Excel.Range cellRng = (Excel.Range)usedRange.Cells[jRow, iCol];
        if (Convert.ToDouble(cellRng.ColumnWidth) <= SIZE ||
            Convert.ToDouble(cellRng.RowHeight) <= SIZE)
        {
            /*do your stuff here*/
        }
    }
}
Something like this. The SIZE constant is needed because a cell may not be formally hidden but still be so narrow or short that it is visually hidden. The code should be optimized, but the concept is clear.