Looping through fields in TextFieldParser in C#

I am using TextFieldParser to parse a CSV file with 10 columns and 100 rows.
I am interested only in the first and third columns of each row.
How do I stop once the third column has been processed and move on to the next row?
Currently it goes through all 10 columns:
using (TextFieldParser parser = new TextFieldParser(@"c:\20140513_134709.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
parser.ReadLine();
int regSeqNo = 0;
bool isNumerical = false;
string mailDate = string.Empty;
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
isNumerical = int.TryParse(fields[0].ToString(), out regSeqNo);
mailDate = fields[2].ToString();
continue;
}
}
}

You can use the following code to get the required fields:
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
isNumerical = int.TryParse(fields[0].ToString(), out regSeqNo);
mailDate = fields[2].ToString();
}
The inner loop is not required :)
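For anyone who wants to try this without a file on disk, here is a minimal, self-contained sketch of the same idea. It assumes the first line is a header (as in the question) and that rows with fewer than three fields should simply be skipped. `CsvColumnPicker` and `ReadFirstAndThird` are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvColumnPicker
{
    // Returns (first column as int, third column) for every data row.
    // The first line is treated as a header; rows with fewer than three
    // fields or a non-numeric first field are skipped (an assumption).
    public static List<KeyValuePair<int, string>> ReadFirstAndThird(TextReader reader)
    {
        var result = new List<KeyValuePair<int, string>>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.ReadLine(); // skip the header line, as in the question

            while (!parser.EndOfData)
            {
                // One call returns the whole row; index into it directly.
                string[] fields = parser.ReadFields();
                if (fields == null || fields.Length < 3)
                {
                    continue;
                }

                int regSeqNo;
                if (int.TryParse(fields[0], out regSeqNo))
                {
                    result.Add(new KeyValuePair<int, string>(regSeqNo, fields[2]));
                }
            }
        }
        return result;
    }
}
```

To read from a file instead, pass `new StreamReader(path)`; the parser disposes the reader when it is done.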

Related

Datagrid gives null values using for loop in WPF

I have a dataGrid with more than 100 rows in it. I am extracting it to an existing Excel file. I can open the file and add values to the sheet. My problem is, the value becomes null as soon as it gets to the 14th row.
I tried reversing the order of the data in the datagrid just to be sure that it's not the value or data in the dataGrid that is causing the issue but I still get the same result. Only the first 13 rows are extracted to the Excel sheet. The for loop still goes to the rest of the loop but it seems to not get the values.
Here is my code:
var path = @"D:\Reports\Sample.xlsx";
var excel = new Excel.Application {Visible = true};
var wb = excel.Workbooks.Open(path);
var ws = (Excel.Worksheet)wb.Sheets["summary"];
for (var i = 0; i < Grid.Columns.Count; i++)
{
for (int j = 0; j < Grid.Items.Count; j++)
{
var b = Grid.Columns[i].GetCellContent(Grid.Items[j]) as TextBlock; // From the 14th row on, "b" is null for the rest of the loop
var myRange = (Range)ws.Cells[j + 2, i + 1];
try
{
if (b != null) myRange.Value2 = b.Text;
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Thread.Sleep(100);
}
}
This is the Excel file created. Values are not extracted any more after the 14th row.

Writing a set of random numbers to an Excel sheet by iterating rows: I can generate the numbers but cannot store them in Excel

I want to generate random numbers and write them to the first column of an Excel sheet, up to 15 rows. The numbers must not repeat.
I tried the methods below, but it is not working: I do not understand how to iterate over the random numbers and the Excel cells together.
public long Random_Number() {
Random rand = new Random();
long drand = (long)(rand.nextDouble()*1000000000L);
long correct = 0;
String numberString = Long.toString(drand);
if (numberString.length() == 8) {
System.out.println(drand+ "It's not a 9 digit");
}
else if (numberString.length() == 9) {
correct = drand;
System.out.println(correct);
}
return correct;
}
public void writeData_Int_SSN( int cellNum) {
try {
File src = new File("filename.xls");
Cell cell = null;
FileInputStream fis = new FileInputStream(src);
HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sh1 = wb.getSheetAt(0);
long gid = Random_Number();
String GroupID = Long.toString(gid);
System.out.println(GroupID);
int num = Integer.parseInt(GroupID);
for (int i = 1; i < 12; i++) {
System.out.println("Entering into excel sheet");
cell = sh1.getRow(i).getCell(cellNum);
System.out.println("Iterating cells");
if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
System.out.println("We are entering numeric data");
int str1 = Integer.parseInt(cell.getStringCellValue());
System.out.println(str1);
cell.setCellValue(num);
}
}
FileOutputStream fout = new FileOutputStream(new File("filename.xls"));
wb.write(fout);
fout.close();
}
catch (Exception e) {
System.out.println(e.getMessage());
}
}

Apache POI not getting all rows and columns from a spreadsheet

I am creating a test framework using Selenium, TestNG, and Apache POI. So far I have written code that reads each row from my spreadsheet and feeds it into the test. However, when I add a new row, it is simply ignored. Any help is greatly appreciated.
Code for getting inputs:
@DataProvider(name="ExcelData")
public Object[][] passData() {
ExcelDataConfig config = new ExcelDataConfig("C:\\Users\\dindo\\Documents\\tests\\d2c-lv-int-01_DATA.xlsx");
int rows = config.getRowCount(0);
int frows = rows-2;
Object[][] data = new Object[frows][11];
for(int i=2;i<rows;i++)
{
data[i-2][0]=config.getStrData(0, i, 0);
data[i-2][1]=config.getIntData(0, i, 1);
data[i-2][2]=config.getIntData(0, i, 2);
data[i-2][3]=config.getStrData(0, i, 3);
data[i-2][4]=config.getStrData(0, i, 4);
data[i-2][5]=config.getStrData(0, i, 5);
data[i-2][6]=config.getIntData(0, i, 6);
data[i-2][7]=config.getIntData(0, i, 7);
data[i-2][8]=config.getIntData(0, i, 8);
data[i-2][9]=config.getStrData(0, i, 9);
data[i-2][10]=config.getStrData(0, i, 10);
}
return data;
}
Code for Excel sheet:
public ExcelDataConfig(String excelPath) {
try {
File src = new File(excelPath);
FileInputStream fis = new FileInputStream(src);
wb = new XSSFWorkbook(fis);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
public String getStrData(int sheetNumber,int row, int col) {
Details = wb.getSheetAt(sheetNumber);
String strdata = Details.getRow(row).getCell(col).getStringCellValue();
return strdata;
}
public String getIntData(int sheetNumber,int row, int col) throws NullPointerException{
Details = wb.getSheetAt(sheetNumber);
double doubledata = Details.getRow(row).getCell(col).getNumericCellValue();
int intdata = (int) doubledata;
String strdata = Integer.toString(intdata);
return strdata;
}
public int getRowCount(int sheetIndex)
{
int row = wb.getSheetAt(sheetIndex).getLastRowNum();
return row;
}
Also, is it possible to merge getIntData and getStrData into one function that automatically detects whether the cell holds a string or an integer?
Many thanks in advance.
After some extensive keyboard thrashing I finally worked out the kink. Apache POI counts the first row in an xlsx file as 0, so getLastRowNum() returns the zero-based index of the last row and the array ended up one row too short. Stupidly enough, I found this out by accidentally saving my sheet with an extra letter in the row below the last. 😂

Foreach loop per item fill document

I have got a list and I want to go through each item condition and insert it into a document. The issue is that I don’t know how to find an item and go through each condition till the item changes then start again.
Item     Condition   Total
Bag      New         3
Bag      Old         5
Jacket   New         2
Racket   New         1
Racket   old         3
Racket   unknown     8
This is what I do:
foreach (DataGridViewRow row in tracker.dataGridView1.Rows)
{
if (row.Cells[0].Value != null)
{
string pdfTemplate = @"C:\ document.pdf";
string newFile = @"c:\" + row.Cells[0].Value.ToString() + ".pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(newFile, FileMode.Create));
AcroFields pdfFormFields = pdfStamper.AcroFields;
******************************************************************************
// set values for fields
if (row.Cells[1].Value.ToString() == "New")
{
pdfFormFields.SetField("Condition", row.Cells[2].Value.ToString());
}
else if (row.Cells[1].Value.ToString() == "Old")
{
pdfFormFields.SetField("Condition2", row.Cells[2].Value.ToString());
}
***************************************************************************************
pdfStamper.FormFlattening = true;
pdfStamper.Close();
}
else
break;
}
I want the code between the stars to keep filling the same document until row.Cells[0].Value changes, and then start again with a new document.
Just store the value of row.Cells[0].Value in a variable and compare it on each iteration of the foreach loop.
You also have to treat the first row separately, since there is no previous value to compare against yet.
Example code:
string previousValue = null;
string currentValue = null;
bool firstRow = true;
foreach (DataGridViewRow row in tracker.dataGridView1.Rows)
{
// Null-conditional access: Value is null for empty rows
currentValue = row.Cells[0].Value?.ToString();
if (currentValue != null)
{
string pdfTemplate = @"C:\ document.pdf";
string newFile = @"c:\" + currentValue + ".pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(newFile, FileMode.Create));
AcroFields pdfFormFields = pdfStamper.AcroFields;
// Compare previous with current value
if (!firstRow && previousValue != currentValue)
{
// Difference, start new document
....
}
else
{
// Same Value
}
// When row is handled, update previous with current value, and set firstRow to false
previousValue = currentValue;
firstRow = false;
...
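The same idea can be tried out without iTextSharp or a DataGridView. This is a minimal sketch, assuming each row is a plain string array of item, condition, and total, and that rows for the same item are adjacent (as in the table in the question). `ConsecutiveGrouping` and `GroupByItem` are hypothetical names, not part of the original code.

```csharp
using System.Collections.Generic;

public static class ConsecutiveGrouping
{
    // Splits rows into runs that share the same item name (row[0]),
    // preserving the original order. A new run starts whenever the
    // item name differs from the previous row's item name.
    public static List<List<string[]>> GroupByItem(IEnumerable<string[]> rows)
    {
        var groups = new List<List<string[]>>();
        string previousItem = null;
        List<string[]> current = null;

        foreach (string[] row in rows)
        {
            string item = row[0];
            if (current == null || item != previousItem)
            {
                // First row, or the item changed: start a new group.
                current = new List<string[]>();
                groups.Add(current);
            }
            current.Add(row);
            previousItem = item;
        }
        return groups;
    }
}
```

Each inner list would then correspond to one output document: open the stamper once per group, set one field per row in the group, and close it before moving on.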

reading Excel Open XML is ignoring blank cells

I am using the accepted solution here to convert an excel sheet into a datatable. This works fine if I have "perfect" data but if I have a blank cell in the middle of my data it seems to put the wrong data in each column.
I think this is because in the below code:
row.Descendants<Cell>().Count()
is number of populated cells (not all columns) AND:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
seems to find the next populated cell (not necessarily what is in that index) so if the first column is empty and i call ElementAt(0), it returns the value in the second column.
Here is the full parsing code.
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
{
Console.Write(tempRow[i].ToString());
}
}
This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file.
Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately.
I made an example excel document with a string in cell reference A1 and C1. I then opened up the excel document in the Open XML Productivity Tool and here is the XML that was stored:
<x:row r="1" spans="1:3"
xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:c r="A1" t="s">
<x:v>0</x:v>
</x:c>
<x:c r="C1" t="s">
<x:v>1</x:v>
</x:c>
</x:row>
Here you will see that the data corresponds to the first row and that only two cells worth of data are saved for that row. The data saved corresponds to A1 and C1 and that no cells with null values are saved.
To get the functionality that you need, you can traverse the cells as you are doing above, but you will need to check which cell reference each Cell carries and determine whether any cells have been skipped. To do that, you will need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
/// <summary>
/// Given just the column name (no row index), it will return the zero based column index.
/// Column names are read as bijective base-26 numbers (A = 0, Z = 25, AA = 26, AB = 27).
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful; otherwise null</returns>
public static int? GetColumnIndexFromName(string columnName)
{
if (string.IsNullOrEmpty(columnName))
{
return null;
}
int number = 0;
foreach (char letter in columnName.ToUpperInvariant())
{
// Anything other than A-Z means this is not a valid column name
if (letter < 'A' || letter > 'Z')
{
return null;
}
// Shift the digits seen so far one place left and add the new one (A = 1 ... Z = 26)
number = number * 26 + (letter - 'A' + 1);
}
// Convert the 1-based column number to a zero-based index
return number - 1;
}
Then you can iterate over the cells and compare each cell's column index with the running columnIndex. While columnIndex is lower, add blank data to tempRow; otherwise just read the value contained in the cell. (Note: I did not test the code below, but the general idea should help):
DataRow tempRow = dt.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Gets the column index of the cell with data
int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
// Fill every skipped column with blank data
while (columnIndex < cellColumnIndex)
{
tempRow[columnIndex] = string.Empty;
columnIndex++;
}
tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);
if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
{
Console.Write(tempRow[columnIndex].ToString());
}
columnIndex++;
}
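Since the snippet above is untested, the gap-filling idea can also be checked in isolation. The following is a rough sketch that uses plain (cell reference, value) pairs instead of Open XML Cell objects, and assumes the stored cells arrive in column order, as they do in the XML shown earlier. `SparseRow`, `ColumnIndex`, and `Expand` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRow
{
    // Converts the letters of a cell reference to a zero-based column
    // index, e.g. "A1" -> 0, "C1" -> 2, "AA10" -> 26.
    public static int ColumnIndex(string cellReference)
    {
        int number = 0;
        foreach (char c in cellReference.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
            {
                break; // reached the row number
            }
            number = number * 26 + (c - 'A' + 1);
        }
        if (number == 0)
        {
            throw new ArgumentException("No column letters in: " + cellReference);
        }
        return number - 1;
    }

    // Expands the stored cells into a dense list, inserting an empty
    // string for every column that has no element in the file.
    public static List<string> Expand(IEnumerable<KeyValuePair<string, string>> storedCells)
    {
        var dense = new List<string>();
        foreach (KeyValuePair<string, string> cell in storedCells)
        {
            int target = ColumnIndex(cell.Key);
            while (dense.Count < target)
            {
                dense.Add(string.Empty);
            }
            dense.Add(cell.Value);
        }
        return dense;
    }
}
```

Feeding it the two stored cells from the example row (A1 and C1) yields three entries, with an empty string standing in for the missing B1.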
Here's a slightly modified version of Waylon's answer which also relied on other answers. It encapsulates his method in a class.
I changed
IEnumerator<Cell> GetEnumerator()
to
IEnumerable<Cell> GetRowCells(Row row)
Here's the class. You don't need to instantiate it; it just serves as a utility class:
public class SpreedsheetHelper
{
///<summary>returns an empty cell when a blank cell is encountered
///</summary>
public static IEnumerable<Cell> GetRowCells(Row row)
{
int currentCount = 0;
foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
{
string columnName = GetColumnName(cell.CellReference);
int currentColumnIndex = ConvertColumnNameToNumber(columnName);
for (; currentCount < currentColumnIndex; currentCount++)
{
yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
}
yield return cell;
currentCount++;
}
}
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
var match = regex.Match(cellReference);
return match.Value;
}
/// <summary>
/// Given just the column name (no row index),
/// it will return the zero based column index.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful</returns>
/// <exception cref="ArgumentException">thrown if the given string
/// contains characters other than uppercase letters</exception>
public static int ConvertColumnNameToNumber(string columnName)
{
var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
if (!alpha.IsMatch(columnName)) throw new ArgumentException();
char[] colLetters = columnName.ToCharArray();
Array.Reverse(colLetters);
int convertedValue = 0;
for (int i = 0; i < colLetters.Length; i++)
{
char letter = colLetters[i];
int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
convertedValue += current * (int)Math.Pow(26, i);
}
return convertedValue;
}
}
Now you're able to get all rows' cells in this way:
// skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
foreach (Cell cell in cells)
{
// skip part that reads the text according to the cell-type
}
}
It will contain all cells even if they are empty.
Here's an implementation of IEnumerable that should do what you want, compiled and unit tested.
///<summary>returns an empty cell when a blank cell is encountered
///</summary>
public IEnumerator<Cell> GetEnumerator()
{
int currentCount = 0;
// row is a class level variable representing the current
// DocumentFormat.OpenXml.Spreadsheet.Row
foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
{
string columnName = GetColumnName(cell.CellReference);
int currentColumnIndex = ConvertColumnNameToNumber(columnName);
for ( ; currentCount < currentColumnIndex; currentCount++)
{
yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
}
yield return cell;
currentCount++;
}
}
Here are the functions it relies on:
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
/// <summary>
/// Given just the column name (no row index),
/// it will return the zero based column index.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful</returns>
/// <exception cref="ArgumentException">thrown if the given string
/// contains characters other than uppercase letters</exception>
public static int ConvertColumnNameToNumber(string columnName)
{
Regex alpha = new Regex("^[A-Z]+$");
if (!alpha.IsMatch(columnName)) throw new ArgumentException();
char[] colLetters = columnName.ToCharArray();
Array.Reverse(colLetters);
int convertedValue = 0;
for (int i = 0; i < colLetters.Length; i++)
{
char letter = colLetters[i];
int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
convertedValue += current * (int)Math.Pow(26, i);
}
return convertedValue;
}
Throw it in a class and give it a try.
See my implementation:
Row[] rows = worksheet.GetFirstChild<SheetData>()
.Elements<Row>()
.ToArray();
string[] columnNames = rows.First()
.Elements<Cell>()
.Select(cell => GetCellValue(cell, document))
.ToArray();
HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());
if (columnNames.Count() != HeaderLetters.Count())
{
throw new ArgumentException("HeaderLetters");
}
IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);
//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.
HeaderLetters are collected using this class:
private static class ExcelHeaderHelper
{
public static string[] GetHeaderLetters(uint max)
{
var result = new List<string>();
int i = 0;
var columnPrefix = new Queue<string>();
string prefix = null;
int prevRoundNo = 0;
uint maxPrefix = max / 26;
while (i < max)
{
int roundNo = i / 26;
if (prevRoundNo < roundNo)
{
prefix = columnPrefix.Dequeue();
prevRoundNo = roundNo;
}
string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
if (i <= maxPrefix)
{
columnPrefix.Enqueue(item);
}
result.Add(item);
i++;
}
return result.ToArray();
}
}
And the helper methods are:
private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
{
var result = new List<List<string>>();
foreach (var row in rows)
{
List<string> cellValues = new List<string>();
var actualCells = row.Elements<Cell>().ToArray();
int j = 0;
for (int i = 0; i < columnCount; i++)
{
if (actualCells.Count() <= j || !actualCells[j].CellReference.ToString().StartsWith(HeaderLetters[i]))
{
cellValues.Add(null);
}
else
{
cellValues.Add(GetCellValue(actualCells[j], document));
j++;
}
}
result.Add(cellValues);
}
return result;
}
private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
bool sstIndexedcell = GetCellType(cell);
return sstIndexedcell
? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
: cell.InnerText;
}
private static bool GetCellType(Cell cell)
{
return cell.DataType != null && cell.DataType == CellValues.SharedString;
}
private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}
The solution deals with shared cell items (SST indexed cells).
All good examples. Here is the one I am using since I need to keep track of all rows, cells, values, and titles for correlation and analysis.
The method ReadSpreadsheet opens an xlsx file and goes through each worksheet, row, and column. Since the values are stored in a referenced string table, I also explicitly use that per worksheet. There are other classes used: DSFunction and StaticVariables. The latter holds frequently used parameter values, such as the referenced 'quotdouble' ( quotdouble = "\u0022"; ) and 'crlf' (crlf = "\u000D" + "\u000A"; ).
The relevant DSFunction method GetIntColIndexForLetter is included below. It returns an integer value for the column index corresponding to letter names such as (A,B, AA, ADE, etc.). This is used along with the parameter 'ncellcolref' to determine if any columns have been skipped and to enter empty string values for each one that is missing.
I also do some cleaning of the values before storing temporarily in a List object (using Replace method).
Subsequently, I use the hash table (Dictionary) of column names to extract values across different worksheets, correlate them, create normalized values, and then create an object used in our product which is then stored as an XML file. None of this is shown but is why this approach is used.
public static class DSFunction {
/// <summary>
/// Creates an integer value for a column letter name starting at 1 for 'a'
/// </summary>
/// <param name="lettstr">Column name as letters</param>
/// <returns>int value</returns>
public static int GetIntColIndexForLetter(string lettstr) {
string txt = "", txt1="";
int n1, result = 0, nbeg=-1, nitem=0;
try {
nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
txt = lettstr;
if (txt != "") txt = txt.ToLower().Trim();
while (txt != "") {
if (txt.Length > 1) {
txt1 = txt.Substring(0, 1);
txt = txt.Substring(1);
}
else {
txt1 = txt;
txt = "";
}
if (!DSFunction.IsNumberString(txt1, "real")) {
nitem++;
n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
result = result * 26 + n1; // base-26 accumulation, e.g. "aa" -> 27
}
else {
break;
}
}
}
catch (Exception ex) {
txt = ex.Message;
}
return result;
}
}
public static class Extractor {
public static string ReadSpreadsheet(string fileUri) {
string msg = "", txt = "", txt1 = "";
int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
Boolean haveheader = true;
Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
List<string> colvalues = new List<string>();
try {
if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
nrow = 0;
foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
ncell = 0;
ncellcolref = 0;
nrow++;
colvalues.Clear();
foreach (Cell sscell in ssrow.Elements<Cell>()) {
ncell++;
n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
for (i = 0; i < (n1 - ncellcolref - 1); i++) {
if (nrow == 1 && haveheader) {
txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
if (!hashcolnames.TryGetValue(txt1, out n2)) {
hashcolnames.Add(txt1, ncell - 1);
}
}
else {
colvalues.Add("");
}
}
ncellcolref = n1;
if (sscell.DataType != null) {
if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
}
else if (sscell.DataType.Value == CellValues.String) {
txt = sscell.InnerText;
}
else txt = sscell.InnerText.ToString();
}
else txt = sscell.InnerText;
if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
if (nrow == 1 && haveheader) {
txt1 = txt1.Replace(" ", "");
if (txt1 == "table/viewname") txt1 = "tablename";
else if (txt1 == "schemaownername") txt1 = "schemaowner";
else if (txt1 == "subjectareaname") txt1 = "subjectarea";
else if (txt1.StartsWith("column")) {
txt1 = txt1.Substring("column".Length);
}
if (!hashcolnames.TryGetValue(txt1, out n1)) {
hashcolnames.Add(txt1, ncell - 1);
}
}
else {
txt = txt.Replace(((char)8220).ToString(), "'"); //special "
txt = txt.Replace(((char)8221).ToString(), "'"); //special "
txt = txt.Replace(StaticVariables.quotdouble, "'");
txt = txt.Replace(StaticVariables.crlf, " ");
txt = txt.Replace(" ", " ");
txt = txt.Replace("<", "");
txt = txt.Replace(">", "");
colvalues.Add(txt);
}
}
}
}
}
}
catch (Exception ex) {
msg = "notok:" + ex.Message;
}
return msg;
}
}
The letter code is a base 26 encoding so this should work to convert it into an offset.
// Converts letter code (i.e. AA) to an offset
public int offset( string code)
{
var offset = 0;
var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
for( var i = 0; i < byte_array.Length; i++ )
{
offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
}
return offset - 1;
}
You can use this function to extract a cell from a row passing the header index:
public static Cell GetCellFromRow(Row r ,int headerIdx) {
string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
IEnumerable<Cell> cells = r.Elements<Cell>().Where(x=> x.CellReference == cellname);
if (cells.Count() > 0)
{
return cells.First();
}
else {
return null;
}
}
public static string GetNthColumnName(int n)
{
string name = "";
while (n > 0)
{
n--;
name = (char)('A' + n % 26) + name;
n /= 26;
}
return name;
}
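Off-by-one mistakes are easy to make in these conversions, so a round-trip check is a cheap safeguard. Below is a small sketch pairing the 1-based number-to-name loop above with its inverse; `ColumnNames`, `ToName`, and `ToNumber` are hypothetical names, and only uppercase input is accepted.

```csharp
using System;

public static class ColumnNames
{
    // 1-based column number -> letters (1 -> "A", 26 -> "Z", 27 -> "AA").
    public static string ToName(int n)
    {
        if (n < 1)
        {
            throw new ArgumentOutOfRangeException("n");
        }
        string name = "";
        while (n > 0)
        {
            n--; // shift to 0..25 so that 26 maps to 'Z' rather than "A@"
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }

    // Letters -> 1-based column number ("A" -> 1, "AA" -> 27).
    public static int ToNumber(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            throw new ArgumentException("Column name is empty.");
        }
        int number = 0;
        foreach (char c in name)
        {
            if (c < 'A' || c > 'Z')
            {
                throw new ArgumentException("Unexpected character in column name: " + c);
            }
            number = number * 26 + (c - 'A' + 1);
        }
        return number;
    }
}
```

Looping `n` over the column numbers of interest and checking that `ToNumber(ToName(n)) == n` exercises the one-, two-, and three-letter cases in one go.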
Okay, I'm not exactly an expert on this, but the other answers do seem like overkill to me, so here's my solution:
// Loop through each row in the spreadsheet, skipping the header row
foreach (var row in sheetData.Elements<Row>().Skip(1))
{
var i = 0;
string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };
List<String> cellsList = new List<string>();
foreach (var cell in row.Elements<Cell>().ToArray())
{
while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
{//accounts for multiple consecutive blank cells
cellsList.Add("");
i++;
}
cellsList.Add(cell.CellValue.Text);
i++;
}
string[] cells = cellsList.ToArray();
foreach(var cell in cellsList)
{
//display contents of cell, depending on the datatype you may need to call each of the cells manually
}
}
Hope someone finds this useful!
With apologies for posting yet another answer to this question, here's the code I used.
I was having problems with OpenXML not working properly if a worksheet had a blank row at the top. It would sometimes just return a DataTable with 0 rows and 0 columns in it. The code below copes with this, and all other worksheets.
Here's how you would call my code. Just pass in a filename and the name of the Worksheet to read in:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code itself:
public class OpenXMLHelper
{
// A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
// of the worksheets.
//
// We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
// OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
// stable method of reading in the data.
//
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each column in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable.
// We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < numOfRows; rowInx++)
{
for (colInx = 0; colInx < numOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // GetColumnIndexByName is 0-based (A=0, B=1, C=2), so add 1 to get a count: F4 gives 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch (Exception ex)
{
// Keep the original exception as InnerException so the root cause is not lost
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions, ex);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
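To make the address arithmetic above concrete, here is a minimal standalone sketch that does the same "G13" -> [6, 12] conversion without Regex or the OpenXML SDK. The class and method names (CellAddressDemo, ToZeroBased) are made up for illustration and are not part of the answer's code; it also assumes a plain address with no "$" signs.

```csharp
using System;

public static class CellAddressDemo
{
    // Hypothetical helper: splits an address such as "G13" into 0-based (column, row).
    public static (int Column, int Row) ToZeroBased(string cellAddress)
    {
        int column = 0;
        int pos = 0;
        // Leading letters form a base-26 column name with no zero digit: A=1 ... Z=26, AA=27.
        while (pos < cellAddress.Length && char.IsLetter(cellAddress[pos]))
        {
            column = column * 26 + (char.ToUpperInvariant(cellAddress[pos]) - 'A' + 1);
            pos++;
        }
        // The remaining digits are the 1-based row number.
        int row = int.Parse(cellAddress.Substring(pos));
        return (column - 1, row - 1);
    }

    public static void Main()
    {
        var (col, row) = ToZeroBased("G13");
        Console.WriteLine($"G13 -> [{col}, {row}]");   // prints "G13 -> [6, 12]"
        var (col2, row2) = ToZeroBased("AA1");
        Console.WriteLine($"AA1 -> [{col2}, {row2}]"); // prints "AA1 -> [26, 0]"
    }
}
```

Walking the string once keeps the column and row parsing in one place, which is handy if you only need array offsets and not the intermediate column name.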
Using ClosedXML.Excel Instead of OpenXML:
public DataTable ImportTable(DataTable dt, string FileName)
{
Statics.currentProgressValue = 0;
Statics.maxProgressValue = 100;
Statics.cancelProgress = false;
try
{
bool fileExist = File.Exists(FileName);
if (fileExist)
{
using (XLWorkbook workBook = new XLWorkbook(FileName))
{
IXLWorksheet workSheet = workBook.Worksheet(1);
var rowCount = workSheet.RangeUsed().RowCount();
if (rowCount > 0)
{
var colCount = workSheet.Row(1).CellsUsed().Count();
if (dt.Columns.Count < colCount)
throw new Exception($"Expects at least {dt.Columns.Count} columns.");
//Loop through the Worksheet rows.
Statics.maxProgressValue = rowCount;
for (int i = 1; i < rowCount; i++)
{
Statics.currentProgressValue += 1;
dt.Rows.Add();
for (int j = 2; j < dt.Columns.Count; j++)
{
var cell = (workSheet.Rows().ElementAt(i).Cell(j));
if (!string.IsNullOrEmpty(cell.Value.ToString()))
dt.Rows[i - 1][j] = cell.Value.ToString().Trim();
else
dt.Rows[i - 1][j] = "";
}
if (Statics.cancelProgress == true)
break;
}
}
return dt;
}
}
}
catch (Exception ex)
{
Statics.cancelProgress = true;
throw new Exception("Error importing data." +
Environment.NewLine + ex.Message, ex);
}
return dt;
}
I couldn't resist optimizing the subroutines from Amurra's answer to remove the need for Regex.
The first function isn't strictly needed, since the second one accepts either a cell reference (C3) or a column name (C), but it is still a nice helper. The indices are one-based (only because our implementation used one-based rows to match what you see in Excel).
/// <summary>
/// Given a cell name, return the cell column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
/// <exception cref="ArgumentOutOfRangeException">cellReference</exception>
public static string GetColumnName(string cellReference)
{
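// Advance from L to R until a digit is found, then return everything before it.
// Bounded by the string length so short inputs such as "C" cannot index past the end.
for (int lastCharPos = 0; lastCharPos < cellReference.Length; lastCharPos++)
if (Char.IsNumber(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
// No digit found: the input is not a cell reference such as "B2"
throw new ArgumentOutOfRangeException("cellReference");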
}
/// <summary>
/// Return one-based column index given a cell name or column name
/// </summary>
/// <param name="columnNameOrCellReference">Column name or cell reference (e.g. A, AB3, or AB44)</param>
/// <returns>One-based column index; 0 if the input contains no letters</returns>
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
Added yet another implementation, this time where the number of columns is known in advance:
/// <summary>
/// Gets a list of cells that are padded with empty cells where necessary.
/// </summary>
/// <param name="numberOfColumns">The number of columns expected.</param>
/// <param name="cells">The cells.</param>
/// <returns>List of padded cells</returns>
private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
{
// Only perform the padding operation if the existing cell count is less than required
if (cells.Count < numberOfColumns)
{
IList<Cell> padded = new List<Cell>();
int cellIndex = 0;
for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
{
if (cellIndex < cells.Count)
{
// Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());
// Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;
// Add padding cells where current cell index is less than required
while (indexOfReference > paddedIndex)
{
padded.Add(new Cell());
paddedIndex++;
}
padded.Add(cells[cellIndex++]);
}
else
{
// Add padding cells once we are past the existing cells
padded.Add(new Cell());
}
}
return padded;
}
else
{
return cells;
}
}
Call using:
IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList());
Where 38 is the required number of columns.
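If you want to see the padding idea in isolation, here is a minimal sketch that needs no OpenXML SDK. It uses (reference, value) string pairs as a stand-in for Cell objects, so PaddingDemo, PadRow and ColumnIndex are hypothetical names and not part of the answer above. It places each value by its column letters instead of inserting padding cells one by one, which is a different route to the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddingDemo
{
    // 0-based column index from a reference such as "C7" (only the letters are used).
    static int ColumnIndex(string cellReference) =>
        cellReference.Where(char.IsLetter)
                     .Aggregate(0, (col, ch) => 26 * col + (char.ToUpperInvariant(ch) - 'A' + 1)) - 1;

    // Hypothetical stand-in for GetPaddedCells: every slot starts out empty and
    // each cell that is present overwrites the slot named by its reference.
    public static List<string> PadRow(int numberOfColumns, IEnumerable<(string Reference, string Value)> cells)
    {
        var padded = Enumerable.Repeat(string.Empty, numberOfColumns).ToList();
        foreach (var (reference, value) in cells)
        {
            int index = ColumnIndex(reference);
            if (index < numberOfColumns) // ignore anything beyond the expected width
                padded[index] = value;
        }
        return padded;
    }

    public static void Main()
    {
        // Row 2 has no cell for column B and nothing after column D.
        var sparse = new[] { ("A2", "id-1"), ("C2", "42"), ("D2", "x") };
        List<string> row = PadRow(5, sparse);
        Console.WriteLine(string.Join("|", row)); // prints "id-1||42|x|"
    }
}
```

Pre-filling the row and writing by index handles gaps in the middle and at the end of the row with the same code path.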
To read blank cells, I use a counter named "CN" that is declared outside the cell loop and incremented after each cell is read. For every cell, I compare the cell's column index with CN. If the cell's index is ahead of CN, the reader has skipped one or more blank cells, so I fill those columns with the value I want (an empty string) until CN catches up. That is the trick I used to keep blank cells aligned with their columns. Here is the code:
public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
{
/*Creating a table with 20 columns*/
var dt = CreateProviderRvenueSharingTable();
try
{
/*using stream so that if excel file is in another process then it can read without error*/
using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
{
var workbookPart = spreadsheetDocument.WorkbookPart;
var workbook = workbookPart.Workbook;
/*get only visible (non-hidden) tabs*/
var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
/*Ignore empty rows; sheets with no more than one non-empty row are skipped below*/
List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
.Where(r => r.InnerText != string.Empty).ToList();
if (rows.Count > 1)
{
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
int i = 0;
int BTR = 0;/*Break the reader while empty rows are found*/
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
/*ignoring first row with headers and check if data is there after header*/
if (i < 2)
{
i++;
continue;
}
reader.ReadFirstChild();
DataRow row = dt.NewRow();
int CN = 0;
if (reader.ElementType == typeof(Cell))
{
do
{
Cell c = (Cell)reader.LoadCurrentElement();
/*the reader skips blank cells, so without this check the values end up in the wrong DataTable columns*/
if (CN != 0)
{
int cellColumnIndex =
ExcelHelper.GetColumnIndexFromName(
ExcelHelper.GetColumnName(c.CellReference));
if (cellColumnIndex < 20 && CN < cellColumnIndex - 1)
{
do
{
row[CN] = string.Empty;
CN++;
} while (CN < cellColumnIndex - 1);
}
}
/*stopping execution if first cell does not have any value which means empty row*/
if (CN == 0 && c.DataType == null && c.CellValue == null)
{
BTR++;
break;
}
string cellValue = GetCellValue(c, workbookPart);
row[CN] = cellValue;
CN++;
/*if any text exists after T column (index 20) then skip the reader*/
if (CN == 20)
{
break;
}
} while (reader.ReadNextSibling());
}
/*reader skipping blank cells so fill the array upto 19 index*/
while (CN != 0 && CN < 20)
{
row[CN] = string.Empty;
CN++;
}
if (CN == 20)
{
dt.Rows.Add(row);
}
}
/*escaping empty rows below data filled rows after checking 5 times */
if (BTR > 5)
break;
}
reader.Close();
}
}
}
}
}
catch (Exception)
{
throw; /*rethrow without resetting the stack trace*/
}
return dt;
}
private static string GetCellValue(Cell c, WorkbookPart workbookPart)
{
string cellValue = string.Empty;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi =
workbookPart.SharedStringTablePart.SharedStringTable
.Elements<SharedStringItem>()
.ElementAt(int.Parse(c.CellValue.InnerText));
if (ssi.Text != null)
{
cellValue = ssi.Text.Text;
}
}
else
{
if (c.CellValue != null)
{
cellValue = c.CellValue.InnerText;
}
}
return cellValue;
}
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
public static string GetColumnName(string cellReference)
{
/* Advance from L to R until a digit is found, then return everything before it*/
/* Bounded by the string length so short inputs cannot index past the end*/
for (int lastCharPos = 0; lastCharPos < cellReference.Length; lastCharPos++)
if (Char.IsNumber(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
throw new ArgumentOutOfRangeException("cellReference");
}
What this code does:
Reads blank cells.
Skips the empty rows that follow the data once reading is complete.
Reads the sheet from the first row in ascending order.
Still reads the file with OpenXML if the Excel file is in use by another process.
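The last point comes from opening the stream with FileShare.ReadWrite. Here is a minimal sketch of just that part, using only System.IO. The second handle inside the same process merely simulates "another process", and SharedReadDemo and ReadWhileOpenElsewhere are made-up names. Note the assumption: this only succeeds if the other holder opened the file with a share mode that permits readers.

```csharp
using System;
using System.IO;

public static class SharedReadDemo
{
    public static string ReadWhileOpenElsewhere(string path)
    {
        // Simulate another holder that keeps the file open with read/write access.
        using (var writer = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
        // FileShare.ReadWrite on our side says we tolerate the other handle's write access.
        using (var reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var text = new StreamReader(reader))
        {
            return text.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "hello");
            Console.WriteLine(ReadWhileOpenElsewhere(path)); // prints "hello"
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With the default FileShare.Read on the reading side, the second open would be refused because the first handle already has write access.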
Here is my solution. I found that the answers above didn't work well when the missing fields were at the end of a row.
Assuming the first row in the Excel sheet has ALL the columns (via headers), then grab the number of columns expected per row (row == 1). Then loop through the data rows (row > 1). The key to processing the missing cells is in method getRowCells, where the known number of column cells is passed in as well as the current row to process.
int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();
IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);
List<List<string>> docData = new List<List<string>>();
foreach (Row row in rows)
{
List<Cell> cells = getRowCells(columnCount, row);
List<string> rowData = new List<string>();
foreach (Cell cell in cells)
{
rowData.Add(getCellValue(workbookPart, cell));
}
docData.Add(rowData);
}
Method getRowCells currently only supports a sheet (row) with at most 26 columns, because it compares just the first letter of each cell reference. A loop based on the known column count is used to find missing columns (cells). When one is found, a new Cell is inserted into the cells collection with a default value of "" instead of null. The modified Cell collection is then returned.
private static List<Cell> getRowCells(int columnCount, Row row)
{
const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (columnCount > COLUMN_LETTERS.Length)
{
throw new ArgumentException(string.Format("Invalid columnCount ({0}). Cannot be greater than {1}",
columnCount, COLUMN_LETTERS.Length));
}
List<Cell> cells = row.Descendants<Cell>().ToList();
for (int i = 0; i < columnCount; i++)
{
if (i < cells.Count)
{
string cellColumnReference = cells.ElementAt(i).CellReference.ToString();
if (cellColumnReference[0] != COLUMN_LETTERS[i])
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
else
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
return cells;
}
private static string getCellValue(WorkbookPart workbookPart, Cell cell)
{
SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
string value = (cell.CellValue != null) ? cell.CellValue.InnerXml : string.Empty;
if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
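One possible way around the 26-column limit, as a sketch only: build the expected column name for each index (0 -> "A", 26 -> "AA") and compare it with the letters of the cell reference instead of comparing a single character. ColumnNameDemo and ColumnNameFromIndex are hypothetical names and do not appear in the code above.

```csharp
using System;
using System.Text;

public static class ColumnNameDemo
{
    // Hypothetical helper: 0-based index -> Excel column name (0 -> "A", 25 -> "Z", 26 -> "AA").
    public static string ColumnNameFromIndex(int zeroBasedIndex)
    {
        if (zeroBasedIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(zeroBasedIndex));
        var name = new StringBuilder();
        int n = zeroBasedIndex + 1; // work 1-based: the letter scheme has no zero digit
        while (n > 0)
        {
            int remainder = (n - 1) % 26;            // 0..25 maps to 'A'..'Z'
            name.Insert(0, (char)('A' + remainder)); // letters are produced right to left
            n = (n - 1) / 26;
        }
        return name.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ColumnNameFromIndex(0));   // prints "A"
        Console.WriteLine(ColumnNameFromIndex(25));  // prints "Z"
        Console.WriteLine(ColumnNameFromIndex(26));  // prints "AA"
        Console.WriteLine(ColumnNameFromIndex(701)); // prints "ZZ"
    }
}
```

Inside getRowCells, the check on cellColumnReference[0] could then be replaced by comparing the letter part of the reference with ColumnNameFromIndex(i), and the COLUMN_LETTERS guard would no longer be needed.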
It ran successfully for me with this code:
string filePath = "test.xlsx"; //your file path
//Open the Excel file using ClosedXML.
using (XLWorkbook workBook = new XLWorkbook(filePath))
{
//Read the first Sheet from Excel file.
IXLWorksheet workSheet = workBook.Worksheet(1);
//Create a new DataTable.
DataTable dt = new DataTable();
//Loop through the Worksheet rows.
bool firstRow = true;
foreach (IXLRow row in workSheet.Rows())
{
//Use the first row to add columns to DataTable.
if (firstRow)
{
foreach (IXLCell cell in row.Cells())
{
dt.Columns.Add(cell.Value.ToString());
}
firstRow = false;
}
else
{
//Add rows to DataTable.
dt.Rows.Add();
int i = 0;
//for (IXLCell cell in row.Cells())
for (int j = 1; j <= dt.Columns.Count; j++)
{
if (string.IsNullOrEmpty(row.Cell(j).Value.ToString()))
dt.Rows[dt.Rows.Count - 1][i] = "";
else
dt.Rows[dt.Rows.Count - 1][i] =
row.Cell(j).Value.ToString();
i++;
}
}
}
}
