I am currently trying to create a small program where the user enters a string in a text area and clicks a button; the program then counts the frequency of each character in the string and shows the result in another text area.
E.g. Step 1: User enters: aaabbbbbbcccdd
Step 2: User clicks the button
Step 3: a 3
b 6
c 3
d 2
This is what I've done so far:
public partial class Form1 : Form
{
Dictionary<string, int> dic = new Dictionary<string, int>();
string s = "";
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
s = textBox1.Text;
int count = 0;
for (int i = 0; i < s.Length; i++ )
{
textBox2.Text = Convert.ToString(s[i]);
if (dic.Equals(s[i]))
{
count++;
}
else
{
dic.Add(Convert.ToString(s[i]), count++);
}
}
}
}
}
Any ideas on how I can continue? So far the program throws a runtime error whenever the same character occurs more than once.
Thank You
var lettersAndCounts = s.GroupBy(c=>c).Select(group => new {
Letter= group.Key,
Count = group.Count()
});
Instead of dic.Equals, use dic.ContainsKey. However, I would use this little LINQ query:
Dictionary<string, int> dict = textBox1.Text
.GroupBy(c => c)
.ToDictionary(g => g.Key.ToString(), g => g.Count());
You are comparing the entire dictionary to a single character, which doesn't tell you whether the dictionary contains a key for that character. As the dictionary is never equal to the character, your code always tries to add a new item, even if one with the same key already exists, and that is the cause of the runtime error (Add throws an ArgumentException for a duplicate key).
Use the ContainsKey method to check if the string exists as a key in the dictionary.
Instead of using a variable count, you would want to increase the numbers in the dictionary, and initialise new items with a count of one:
string key = s[i].ToString();
textBox2.Text = key;
if (dic.ContainsKey(key)) {
dic[key]++;
} else {
dic.Add(key, 1);
}
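Putting the pieces together, a minimal sketch of the whole counting step might look like the following. The names CharCounter, CountCharacters and FormatCounts are made up for illustration, and keying the dictionary on char instead of string is a simplification of mine, not something taken from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharCounter
{
    // Counts how often each character occurs in the input.
    public static Dictionary<char, int> CountCharacters(string input)
    {
        // A fresh dictionary per call, so counts from an earlier click do not carry over.
        var counts = new Dictionary<char, int>();
        foreach (char c in input)
        {
            int current;
            // TryGetValue leaves current at 0 when the key is missing.
            counts.TryGetValue(c, out current);
            counts[c] = current + 1;
        }
        return counts;
    }

    // Builds one "character count" line per distinct character.
    public static string FormatCounts(Dictionary<char, int> counts)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<char, int> pair in counts)
        {
            sb.AppendLine(pair.Key + " " + pair.Value);
        }
        return sb.ToString();
    }
}
```

In the click handler this would presumably be used as textBox2.Text = CharCounter.FormatCounts(CharCounter.CountCharacters(textBox1.Text)); (textBox2 needs to be multiline to show several lines). Note that a Dictionary does not promise any particular enumeration order, so sort the keys first if the order of the output lines matters.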
I'm going to suggest a different and somewhat simpler approach for doing this. Assuming you are using English strings, you can create an array with capacity = 26. Then depending on the character you encounter you would increment the appropriate index in the array. For example, if the character is 'a' increment count at index 0, if the character is 'b' increment the count at index 1, etc...
Your implementation will look something like this:
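int[] count = new int[26]; // all elements start at 0
for (int i = 0; i < s.Length; i++)
{
    char c = Char.ToLower(s[i]);
    if (c >= 'a' && c <= 'z') // ignore anything that is not an English letter
    {
        count[c - 'a']++;
    }
}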
When this finishes you will have the number of 'a's in count[0] and the number of 'z's in count[25].
I recently came across this problem:
you have to find an integer in a sorted two-dimensional array. The array is sorted within each row, but not within columns. I have solved the problem but still think there may be a better approach, so I have come here to discuss it with all of you. Your suggestions and improvements will help me grow as a coder. Here is the code:
int searchInteger = Int32.Parse(Console.ReadLine());
int cnt = 0;
for (int i = 0; i < x; i++)
{
if (intarry[i, 0] <= searchInteger && intarry[i,y-1] >= searchInteger)
{
if (intarry[i, 0] == searchInteger || intarry[i, y - 1] == searchInteger)
Console.WriteLine("string present {0} times" , ++cnt);
else
{
int[] array = new int[y];
int y1 = 0;
for (int k = 0; k < y; k++)
array[k] = intarry[i, y1++];
bool result;
if (result = binarySearch(array, searchInteger) == true)
{
Console.WriteLine("string present inside {0} times", ++ cnt);
Console.ReadLine();
}
}
}
}
Here searchInteger is the integer we have to find in the array, and binarySearch is a method that returns a boolean indicating whether the value is present in a one-dimensional array (that single row).
Please help: is this optimal, or is there a better solution?
Thanks
Provided you have declared the array intarry, x and y as follows:
int[,] intarry =
{
{0,7,2},
{3,4,5},
{6,7,8}
};
var y = intarry.GetUpperBound(0)+1;
var x = intarry.GetUpperBound(1)+1;
// intarry.Dump();
You can keep it as simple as:
int searchInteger = Int32.Parse(Console.ReadLine());
var cnt=0;
for(var r=0; r<y; r++)
{
for(var c=0; c<x; c++)
{
if (intarry[r, c].Equals(searchInteger))
{
cnt++;
Console.WriteLine(
"string present at position [{0},{1}]" , r, c);
} // if
} // for
} // for
Console.WriteLine("string present {0} times" , cnt);
This example assumes that you have no information about whether the array is sorted (if you don't know that it is sorted, you have to go through every element and can't use binary search). Based on this example you can improve performance if you know more about how the data in the array is structured:
If the rows are sorted ascending, you can replace the inner for loop with a binary search.
If the entire array is sorted ascending and the data does not repeat, e.g.
int[,] intarry = {{0,1,2}, {3,4,5}, {6,7,8}};
then you can exit the loop as soon as the item is found. The easiest way to do this is to create
a function and add a return statement to the inner for loop.
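For the row-sorted case, a sketch of the binary-search refinement could look like this. It assumes every row is sorted ascending; RowSortedSearch, RowContains and CountRowsContaining are illustrative names and not part of the code above.

```csharp
using System;

public static class RowSortedSearch
{
    // Binary search within row r; only valid if that row is sorted ascending.
    public static bool RowContains(int[,] grid, int r, int target)
    {
        int lo = 0;
        int hi = grid.GetLength(1) - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int value = grid[r, mid];
            if (value == target) return true;
            if (value < target) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    // Counts the rows that contain target at least once.
    public static int CountRowsContaining(int[,] grid, int target)
    {
        int count = 0;
        for (int r = 0; r < grid.GetLength(0); r++)
        {
            if (RowContains(grid, r, target)) count++;
        }
        return count;
    }
}
```

This searches the 2D array in place, so there is no need to copy each row into a temporary one-dimensional array first. It counts rows, not occurrences, so a value repeated inside one row is counted once.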
I am using the accepted solution here to convert an Excel sheet into a DataTable. This works fine if I have "perfect" data, but if there is a blank cell in the middle of my data, the values end up in the wrong columns.
I think this is because in the below code:
row.Descendants<Cell>().Count()
is the number of populated cells (not the number of columns), and:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
seems to find the next populated cell (not necessarily the one at that index), so if the first column is empty and I call ElementAt(0), it returns the value in the second column.
Here is the full parsing code.
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
{
Console.Write(tempRow[i].ToString());
}
}
This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file.
Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately.
I made an example Excel document with a string in cells A1 and C1. I then opened the document in the Open XML Productivity Tool and here is the XML that was stored:
<x:row r="1" spans="1:3"
xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:c r="A1" t="s">
<x:v>0</x:v>
</x:c>
<x:c r="C1" t="s">
<x:v>1</x:v>
</x:c>
</x:row>
Here you will see that the data corresponds to the first row and that only two cells' worth of data are saved for that row. The saved data corresponds to A1 and C1; no cells with null values are saved.
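To see this without the SDK, here is a small sketch that reads the same kind of row XML with System.Xml.Linq and lists only the cells that are actually stored. RowXmlDemo and DescribeCells are hypothetical names, and the shared string list passed in is a stand-in for the workbook's shared string table, which is not shown in the XML above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RowXmlDemo
{
    static readonly XNamespace X =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns "reference=text" for every cell element present in the row XML.
    public static List<string> DescribeCells(string rowXml, IList<string> sharedStrings)
    {
        XElement row = XElement.Parse(rowXml);
        var result = new List<string>();
        foreach (XElement c in row.Elements(X + "c"))
        {
            string reference = (string)c.Attribute("r");
            string raw = (string)c.Element(X + "v") ?? "";
            // t="s" marks the value as an index into the shared string table.
            bool isShared = (string)c.Attribute("t") == "s";
            string text = isShared ? sharedStrings[int.Parse(raw)] : raw;
            result.Add(reference + "=" + text);
        }
        return result;
    }
}
```

For the row above this yields two entries, one for A1 and one for C1. Nothing is produced for B1, which is exactly why indexing the cells by position gives the wrong column.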
To get the functionality that you need, you can traverse the cells as you are doing above, but you will need to check which column each cell's reference points to and determine whether any cells have been skipped. To do that you will need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
    // Create a regular expression to match the column name portion of the cell name.
    Regex regex = new Regex("[A-Za-z]+");
    Match match = regex.Match(cellReference);
    return match.Value;
}
/// <summary>
/// Given just the column name (no row index), it will return the zero based column index.
/// The name is read as a base-26 number with digits A=1 to Z=26, so names of any length (A, AB, XFD) work.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful; otherwise null</returns>
public static int? GetColumnIndexFromName(string columnName)
{
    if (string.IsNullOrEmpty(columnName))
    {
        return null;
    }
    int index = 0;
    foreach (char letter in columnName.ToUpperInvariant())
    {
        if (letter < 'A' || letter > 'Z')
        {
            return null; // not a valid column name
        }
        // Shift the letters read so far one place left, then add this letter.
        index = index * 26 + (letter - 'A' + 1);
    }
    return index - 1;
}
Then you can iterate over the cells and compare each cell's column index, taken from its cell reference, with the running columnIndex. While columnIndex is lower, add blank data to your tempRow; otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help):
DataRow tempRow = dt.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Gets the column index of the cell with data
int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
if (columnIndex < cellColumnIndex)
{
do
{
tempRow[columnIndex] = string.Empty; // insert blank data here
columnIndex++;
}
while(columnIndex < cellColumnIndex);
}
tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);
if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
{
Console.Write(tempRow[columnIndex].ToString());
}
columnIndex++;
}
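The gap-filling idea can also be tried without the Open XML SDK. The sketch below works on plain (reference, value) pairs instead of Cell objects; GapFiller, ColumnIndex and PadRow are names I made up for the example, and it assumes the cells arrive in column order, as they do in the row XML.

```csharp
using System;
using System.Collections.Generic;

public static class GapFiller
{
    // Zero-based column index from a cell reference such as "C1" or "AB12".
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            char up = char.ToUpperInvariant(ch);
            if (up < 'A' || up > 'Z') break; // reached the row number
            index = index * 26 + (up - 'A' + 1);
        }
        return index - 1;
    }

    // cells: (reference, value) pairs in column order; blank cells are simply absent.
    public static List<string> PadRow(IEnumerable<KeyValuePair<string, string>> cells)
    {
        var padded = new List<string>();
        foreach (KeyValuePair<string, string> cell in cells)
        {
            int target = ColumnIndex(cell.Key);
            // Fill every column that was skipped before this cell.
            while (padded.Count < target) padded.Add("");
            padded.Add(cell.Value);
        }
        return padded;
    }
}
```

With cells A1 and C1 as input, the result has three entries and the middle one is the empty string. Trailing blank columns are not padded here, because the row itself does not say how many columns the table has.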
Here's a slightly modified version of Waylon's answer which also relied on other answers. It encapsulates his method in a class.
I changed
IEnumerator<Cell> GetEnumerator()
to
IEnumerable<Cell> GetRowCells(Row row)
Here's the class. You don't need to instantiate it; it just serves as a utility class:
public class SpreedsheetHelper
{
///<summary>returns an empty cell when a blank cell is encountered
///</summary>
public static IEnumerable<Cell> GetRowCells(Row row)
{
int currentCount = 0;
foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
{
string columnName = GetColumnName(cell.CellReference);
int currentColumnIndex = ConvertColumnNameToNumber(columnName);
for (; currentCount < currentColumnIndex; currentCount++)
{
yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
}
yield return cell;
currentCount++;
}
}
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
var match = regex.Match(cellReference);
return match.Value;
}
/// <summary>
/// Given just the column name (no row index),
/// it will return the zero based column index.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful</returns>
/// <exception cref="ArgumentException">thrown if the given string
/// contains characters other than uppercase letters</exception>
public static int ConvertColumnNameToNumber(string columnName)
{
var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
if (!alpha.IsMatch(columnName)) throw new ArgumentException();
char[] colLetters = columnName.ToCharArray();
Array.Reverse(colLetters);
int convertedValue = 0;
for (int i = 0; i < colLetters.Length; i++)
{
char letter = colLetters[i];
int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
convertedValue += current * (int)Math.Pow(26, i);
}
return convertedValue;
}
}
Now you're able to get all rows' cells in this way:
// skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
foreach (Cell cell in cells)
{
// skip part that reads the text according to the cell-type
}
}
It will contain all cells even if they are empty.
Here's an implementation of IEnumerable that should do what you want, compiled and unit tested.
///<summary>returns an empty cell when a blank cell is encountered
///</summary>
public IEnumerator<Cell> GetEnumerator()
{
int currentCount = 0;
// row is a class level variable representing the current
// DocumentFormat.OpenXml.Spreadsheet.Row
foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
{
string columnName = GetColumnName(cell.CellReference);
int currentColumnIndex = ConvertColumnNameToNumber(columnName);
for ( ; currentCount < currentColumnIndex; currentCount++)
{
yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
}
yield return cell;
currentCount++;
}
}
Here are the functions it relies on:
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
/// <summary>
/// Given just the column name (no row index),
/// it will return the zero based column index.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful</returns>
/// <exception cref="ArgumentException">thrown if the given string
/// contains characters other than uppercase letters</exception>
public static int ConvertColumnNameToNumber(string columnName)
{
Regex alpha = new Regex("^[A-Z]+$");
if (!alpha.IsMatch(columnName)) throw new ArgumentException();
char[] colLetters = columnName.ToCharArray();
Array.Reverse(colLetters);
int convertedValue = 0;
for (int i = 0; i < colLetters.Length; i++)
{
char letter = colLetters[i];
int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
convertedValue += current * (int)Math.Pow(26, i);
}
return convertedValue;
}
Throw it in a class and give it a try.
See my implementation:
Row[] rows = worksheet.GetFirstChild<SheetData>()
.Elements<Row>()
.ToArray();
string[] columnNames = rows.First()
.Elements<Cell>()
.Select(cell => GetCellValue(cell, document))
.ToArray();
HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());
if (columnNames.Count() != HeaderLetters.Count())
{
throw new ArgumentException("HeaderLetters");
}
IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);
//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.
HeaderLetters are collected using this class:
private static class ExcelHeaderHelper
{
public static string[] GetHeaderLetters(uint max)
{
var result = new List<string>();
int i = 0;
var columnPrefix = new Queue<string>();
string prefix = null;
int prevRoundNo = 0;
uint maxPrefix = max / 26;
while (i < max)
{
int roundNo = i / 26;
if (prevRoundNo < roundNo)
{
prefix = columnPrefix.Dequeue();
prevRoundNo = roundNo;
}
string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
if (i <= maxPrefix)
{
columnPrefix.Enqueue(item);
}
result.Add(item);
i++;
}
return result.ToArray();
}
}
And the helper methods are:
private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
{
var result = new List<List<string>>();
foreach (var row in rows)
{
List<string> cellValues = new List<string>();
var actualCells = row.Elements<Cell>().ToArray();
int j = 0;
for (int i = 0; i < columnCount; i++)
{
if (actualCells.Count() <= j || !actualCells[j].CellReference.ToString().StartsWith(HeaderLetters[i]))
{
cellValues.Add(null);
}
else
{
cellValues.Add(GetCellValue(actualCells[j], document));
j++;
}
}
result.Add(cellValues);
}
return result;
}
private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
bool sstIndexedcell = GetCellType(cell);
return sstIndexedcell
? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
: cell.InnerText;
}
private static bool GetCellType(Cell cell)
{
return cell.DataType != null && cell.DataType == CellValues.SharedString;
}
private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}
The solution deals with shared cell items (SST indexed cells).
All good examples. Here is the one I am using since I need to keep track of all rows, cells, values, and titles for correlation and analysis.
The method ReadSpreadsheet opens an xlsx file and goes through each worksheet, row, and column. Since the values are stored in a referenced string table, I also explicitly use that per worksheet. There are other classes used: DSFunction and StaticVariables. The latter holds frequently used parameter values, such as the referenced 'quotdouble' ( quotdouble = "\u0022"; ) and 'crlf' (crlf = "\u000D" + "\u000A"; ).
The relevant DSFunction method GetIntColIndexForLetter is included below. It returns an integer value for the column index corresponding to letter names such as (A,B, AA, ADE, etc.). This is used along with the parameter 'ncellcolref' to determine if any columns have been skipped and to enter empty string values for each one that is missing.
I also do some cleaning of the values before storing temporarily in a List object (using Replace method).
Subsequently, I use the hash table (Dictionary) of column names to extract values across different worksheets, correlate them, create normalized values, and then create an object used in our product which is then stored as an XML file. None of this is shown but is why this approach is used.
public static class DSFunction {
/// <summary>
/// Creates an integer value for a column letter name starting at 1 for 'a'
/// </summary>
/// <param name="lettstr">Column name as letters</param>
/// <returns>int value</returns>
public static int GetIntColIndexForLetter(string lettstr) {
string txt = "", txt1="";
int n1, result = 0, nbeg=-1, nitem=0;
try {
nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
txt = lettstr;
if (txt != "") txt = txt.ToLower().Trim();
while (txt != "") {
if (txt.Length > 1) {
txt1 = txt.Substring(0, 1);
txt = txt.Substring(1);
}
else {
txt1 = txt;
txt = "";
}
if (!DSFunction.IsNumberString(txt1, "real")) {
nitem++;
n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
result = result * 26 + n1; // base 26: shift earlier letters one place, so "aa" gives 27
}
else {
break;
}
}
}
catch (Exception ex) {
txt = ex.Message;
}
return result;
}
}
public static class Extractor {
public static string ReadSpreadsheet(string fileUri) {
string msg = "", txt = "", txt1 = "";
int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
Boolean haveheader = true;
Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
List<string> colvalues = new List<string>();
try {
if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
nrow = 0;
foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
ncell = 0;
ncellcolref = 0;
nrow++;
colvalues.Clear();
foreach (Cell sscell in ssrow.Elements<Cell>()) {
ncell++;
n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
for (i = 0; i < (n1 - ncellcolref - 1); i++) {
if (nrow == 1 && haveheader) {
txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
if (!hashcolnames.TryGetValue(txt1, out n2)) {
hashcolnames.Add(txt1, ncell - 1);
}
}
else {
colvalues.Add("");
}
}
ncellcolref = n1;
if (sscell.DataType != null) {
if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
}
else if (sscell.DataType.Value == CellValues.String) {
txt = sscell.InnerText;
}
else txt = sscell.InnerText.ToString();
}
else txt = sscell.InnerText;
if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
if (nrow == 1 && haveheader) {
txt1 = txt1.Replace(" ", "");
if (txt1 == "table/viewname") txt1 = "tablename";
else if (txt1 == "schemaownername") txt1 = "schemaowner";
else if (txt1 == "subjectareaname") txt1 = "subjectarea";
else if (txt1.StartsWith("column")) {
txt1 = txt1.Substring("column".Length);
}
if (!hashcolnames.TryGetValue(txt1, out n1)) {
hashcolnames.Add(txt1, ncell - 1);
}
}
else {
txt = txt.Replace(((char)8220).ToString(), "'"); //special "
txt = txt.Replace(((char)8221).ToString(), "'"); //special "
txt = txt.Replace(StaticVariables.quotdouble, "'");
txt = txt.Replace(StaticVariables.crlf, " ");
txt = txt.Replace(" ", " ");
txt = txt.Replace("<", "");
txt = txt.Replace(">", "");
colvalues.Add(txt);
}
}
}
}
}
}
catch (Exception ex) {
msg = "notok:" + ex.Message;
}
return msg;
}
}
The letter code is a base 26 encoding so this should work to convert it into an offset.
// Converts letter code (i.e. AA) to an offset
public int offset( string code)
{
var offset = 0;
var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
for( var i = 0; i < byte_array.Length; i++ )
{
offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
}
return offset - 1;
}
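As a cross-check of the base-26 idea, here is a sketch that does the conversion in both directions with integer arithmetic only (no Math.Pow). ColumnCodec, ToOffset and ToCode are hypothetical names, and it assumes uppercase column letters without a row number.

```csharp
using System;

public static class ColumnCodec
{
    // "A" -> 0, "Z" -> 25, "AA" -> 26; expects uppercase letters only.
    public static int ToOffset(string code)
    {
        if (string.IsNullOrEmpty(code))
            throw new ArgumentException("Expected at least one letter", "code");
        int value = 0;
        foreach (char letter in code)
        {
            if (letter < 'A' || letter > 'Z')
                throw new ArgumentException("Expected only A-Z", "code");
            value = value * 26 + (letter - 'A' + 1);
        }
        return value - 1;
    }

    // Inverse of ToOffset: 0 -> "A", 26 -> "AA".
    public static string ToCode(int offset)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException("offset");
        string code = "";
        int n = offset + 1; // switch to the one-based value
        while (n > 0)
        {
            n--; // the letters have no zero digit, so shift before dividing
            code = (char)('A' + n % 26) + code;
            n /= 26;
        }
        return code;
    }
}
```

The one subtlety is that the letters run from 1 to 26 rather than 0 to 25, which is why one direction adds 1 per letter and the other decrements before each division.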
You can use this function to extract a cell from a row passing the header index:
public static Cell GetCellFromRow(Row r ,int headerIdx) {
string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
IEnumerable<Cell> cells = r.Elements<Cell>().Where(x=> x.CellReference == cellname);
if (cells.Count() > 0)
{
return cells.First();
}
else {
return null;
}
}
public static string GetNthColumnName(int n)
{
string name = "";
while (n > 0)
{
n--;
name = (char)('A' + n % 26) + name;
n /= 26;
}
return name;
}
Okay, I'm not exactly an expert on this, but the other answers do seem like overkill to me, so here's my solution:
// Loop through each row in the spreadsheet, skipping the header row
foreach (var row in sheetData.Elements<Row>().Skip(1))
{
var i = 0;
string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };
List<String> cellsList = new List<string>();
foreach (var cell in row.Elements<Cell>().ToArray())
{
while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
{//accounts for multiple consecutive blank cells
cellsList.Add("");
i++;
}
cellsList.Add(cell.CellValue.Text);
i++;
}
string[] cells = cellsList.ToArray();
foreach(var cell in cellsList)
{
//display contents of cell, depending on the datatype you may need to call each of the cells manually
}
}
Hope someone finds this useful!
With apologies for posting yet another answer to this question, here's the code I used.
I was having problems with OpenXML not working properly if a worksheet had a blank row at the top. It would sometimes just return a DataTable with 0 rows and 0 columns in it. The code below copes with this, and all other worksheets.
Here's how you would call my code. Just pass in a filename and the name of the Worksheet to read in:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code itself:
public class OpenXMLHelper
{
// A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
// of the worksheets.
//
// We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
// OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
// stable method of reading in the data.
//
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each column in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable.
// We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < numOfRows; rowInx++)
{
for (colInx = 0; colInx < numOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
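One thing to watch in CalculateDataTableSize is that it throws unless the dimension string contains exactly two cell addresses. I believe a worksheet with a single used cell can report a dimension without a colon (for example "A1"), but treat that as an assumption and check your own files. A sketch of a more tolerant parser, with the made-up names DimensionParser and GetSize, could look like this:

```csharp
using System;

public static class DimensionParser
{
    // Column and row count needed to hold everything up to the bottom-right
    // cell of a dimension reference such as "B2:F4" or "A1".
    public static void GetSize(string dimension, out int columns, out int rows)
    {
        string[] parts = dimension.Split(':');
        // Without a colon, the single cell is its own bottom-right corner.
        string bottomRight = parts[parts.Length - 1];

        int col = 0;
        int pos = 0;
        while (pos < bottomRight.Length && bottomRight[pos] >= 'A' && bottomRight[pos] <= 'Z')
        {
            col = col * 26 + (bottomRight[pos] - 'A' + 1);
            pos++;
        }
        if (pos == 0 || pos == bottomRight.Length)
            throw new FormatException("Not a cell reference: " + bottomRight);

        columns = col;                                 // one-based column number
        rows = int.Parse(bottomRight.Substring(pos));  // one-based row number
    }
}
```

Like the original, this deliberately ignores the top-left address and sizes the table from the bottom-right one.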
Using ClosedXML.Excel Instead of OpenXML:
public DataTable ImportTable(DataTable dt, string FileName)
{
Statics.currentProgressValue = 0;
Statics.maxProgressValue = 100;
Statics.cancelProgress = false;
try
{
bool fileExist = File.Exists(FileName);
if (fileExist)
{
using (XLWorkbook workBook = new XLWorkbook(FileName))
{
IXLWorksheet workSheet = workBook.Worksheet(1);
var rowCount = workSheet.RangeUsed().RowCount();
if (rowCount > 0)
{
var colCount = workSheet.Row(1).CellsUsed().Count();
if (dt.Columns.Count < colCount)
throw new Exception($"Expects at least {dt.Columns.Count} columns.");
//Loop through the Worksheet rows.
Statics.maxProgressValue = rowCount;
for (int i = 1; i < rowCount; i++)
{
Statics.currentProgressValue += 1;
dt.Rows.Add();
for (int j = 2; j < dt.Columns.Count; j++)
{
var cell = (workSheet.Rows().ElementAt(i).Cell(j));
if (!string.IsNullOrEmpty(cell.Value.ToString()))
dt.Rows[i - 1][j] = cell.Value.ToString().Trim();
else
dt.Rows[i - 1][j] = "";
}
if (Statics.cancelProgress == true)
break;
}
}
return dt;
}
}
}
catch (Exception ex)
{
Statics.cancelProgress = true;
throw new Exception("Error importing data." +
Environment.NewLine + ex.Message);
}
return dt;
}
I can't resist optimizing the subroutines from Amurra's answer to remove the need for regexes.
The first function isn't actually needed, since the second one will accept either a cell reference (C3) or a column name (C), but it is still a nice helper function. The indices are also one-based (only because our implementation used one-based row indices to match visually with Excel).
/// <summary>
/// Given a cell name, return the cell column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
/// <exception cref="ArgumentOutOfRangeException">cellReference</exception>
public static string GetColumnName(string cellReference)
{
// Advance from L to R until a number, then return 0 through previous position
//
for (int lastCharPos = 0; lastCharPos <= 3 && lastCharPos < cellReference.Length; lastCharPos++)
if (Char.IsNumber(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
throw new ArgumentOutOfRangeException("cellReference");
}
/// <summary>
/// Return one-based column index given a cell name or column name
/// </summary>
/// <param name="columnNameOrCellReference">Column Name (ie. A, AB3, or AB44)</param>
/// <returns>One based index; 0 if the input contains no letters</returns>
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
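To make the arithmetic concrete, here is a minimal, self-contained sketch of the same right-to-left conversion with a few sample inputs. The class name ColumnIndexDemo is hypothetical, and the ToUpperInvariant call is my addition (an assumption that lower-case references should be accepted); the rest mirrors the function above.

```csharp
using System;

public static class ColumnIndexDemo
{
    // Reads the letters right to left, treating them as bijective base-26 digits (A=1 ... Z=26).
    // Digits and other non-letter characters are ignored, so "AB44" and "AB" give the same result.
    public static int GetColumnIndexFromName(string columnNameOrCellReference)
    {
        int columnIndex = 0;
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)
        {
            char c = columnNameOrCellReference[pos];
            if (char.IsLetter(c))
            {
                // ToUpperInvariant is an addition here so that "c3" behaves like "C3".
                columnIndex += factor * (char.ToUpperInvariant(c) - 'A' + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }

    public static void Main()
    {
        Console.WriteLine(GetColumnIndexFromName("A"));    // 1
        Console.WriteLine(GetColumnIndexFromName("Z"));    // 26
        Console.WriteLine(GetColumnIndexFromName("AA"));   // 27  (1 + 26*1)
        Console.WriteLine(GetColumnIndexFromName("AB44")); // 28  (2 + 26*1)
    }
}
```

Since the result is one-based, subtract 1 before using it as a DataTable or array index.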
Added yet another implementation, this time where the number of columns is known in advance:
/// <summary>
/// Gets a list of cells, padded with empty cells where necessary.
/// </summary>
/// <param name="numberOfColumns">The number of columns expected.</param>
/// <param name="cells">The cells.</param>
/// <returns>List of padded cells</returns>
private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
{
// Only perform the padding operation if the existing cell count is less than required
if (cells.Count < numberOfColumns)
{
IList<Cell> padded = new List<Cell>();
int cellIndex = 0;
for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
{
if (cellIndex < cells.Count)
{
// Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());
// Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;
// Add padding cells where current cell index is less than required
while (indexOfReference > paddedIndex)
{
padded.Add(new Cell());
paddedIndex++;
}
padded.Add(cells[cellIndex++]);
}
else
{
// Add padding cells once we are past the existing cells
padded.Add(new Cell());
}
}
return padded;
}
else
{
return cells;
}
}
Call using:
IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList());
Where 38 is the required number of columns.
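The padding idea can also be tried without the Open XML SDK. The following is only a sketch under stated assumptions: each cell is represented by a hypothetical (Reference, Value) tuple instead of a real Cell, and instead of walking the list and inserting blanks it pre-fills a fixed-width array and drops each value into its column slot. That is a swapped-in variation, not the method above, but it produces the same padded row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddingDemo
{
    // Zero-based column index from a reference such as "C5" or "AB12"; -1 if there are no letters.
    public static int ColumnIndex(string reference) =>
        reference.Where(char.IsLetter)
                 .Aggregate(0, (column, letter) => 26 * column + (char.ToUpperInvariant(letter) - 'A' + 1)) - 1;

    // Returns exactly numberOfColumns values; columns with no cell stay as empty strings.
    // The (Reference, Value) tuple is a hypothetical stand-in for an Open XML Cell.
    public static string[] PadRow(int numberOfColumns, IEnumerable<(string Reference, string Value)> cells)
    {
        string[] padded = Enumerable.Repeat(string.Empty, numberOfColumns).ToArray();
        foreach (var (reference, value) in cells)
        {
            int index = ColumnIndex(reference);
            if (index >= 0 && index < numberOfColumns)
            {
                padded[index] = value;
            }
        }
        return padded;
    }

    public static void Main()
    {
        // Row 1 has values only in columns A and C; B, D and E are missing.
        var sparse = new[] { ("A1", "x"), ("C1", "y") };
        Console.WriteLine(string.Join("|", PadRow(5, sparse))); // x||y||
    }
}
```

Cells whose column falls outside the expected width are silently ignored here; whether that or throwing is the right choice depends on the data.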
To read blank cells, I use a counter variable named "CN". It is reset for each row, outside the loop that reads the cells, and incremented after each cell is read. Inside the loop I compare CN with the column index of the current cell; if CN is behind, the reader has skipped blank cells, so I fill the skipped columns with the value I want (an empty string here) until CN catches up. This is the trick I used to get blank cells into their respective columns. Here is the code:
public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
{
/*Creating a table with 20 columns*/
var dt = CreateProviderRvenueSharingTable();
try
{
/*open through a stream with FileShare.ReadWrite so the Excel file can be read without error even if another process has it open*/
using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
{
var workbookPart = spreadsheetDocument.WorkbookPart;
var workbook = workbookPart.Workbook;
/*get only the visible (non-hidden) tabs*/
var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
/*Remove empty sheets*/
List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
.Where(r => r.InnerText != string.Empty).ToList();
if (rows.Count > 1)
{
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
int i = 0;
int BTR = 0;/*Break the reader while empty rows are found*/
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
/*ignoring first row with headers and check if data is there after header*/
if (i < 2)
{
i++;
continue;
}
reader.ReadFirstChild();
DataRow row = dt.NewRow();
int CN = 0;
if (reader.ElementType == typeof(Cell))
{
do
{
Cell c = (Cell)reader.LoadCurrentElement();
/*the reader skips blank cells, so without this check values would land in the wrong DataTable columns relative to the header*/
if (CN != 0)
{
int cellColumnIndex =
ExcelHelper.GetColumnIndexFromName(
ExcelHelper.GetColumnName(c.CellReference));
if (cellColumnIndex < 20 && CN < cellColumnIndex - 1)
{
do
{
row[CN] = string.Empty;
CN++;
} while (CN < cellColumnIndex - 1);
}
}
/*stopping execution if first cell does not have any value which means empty row*/
if (CN == 0 && c.DataType == null && c.CellValue == null)
{
BTR++;
break;
}
string cellValue = GetCellValue(c, workbookPart);
row[CN] = cellValue;
CN++;
/*if any text exists after T column (index 20) then skip the reader*/
if (CN == 20)
{
break;
}
} while (reader.ReadNextSibling());
}
/*the reader skips trailing blank cells, so fill the remaining columns up to index 19*/
while (CN != 0 && CN < 20)
{
row[CN] = string.Empty;
CN++;
}
if (CN == 20)
{
dt.Rows.Add(row);
}
}
/*escaping empty rows below data filled rows after checking 5 times */
if (BTR > 5)
break;
}
reader.Close();
}
}
}
}
}
catch (Exception)
{
/*rethrow without resetting the stack trace*/
throw;
}
return dt;
}
private static string GetCellValue(Cell c, WorkbookPart workbookPart)
{
string cellValue = string.Empty;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi =
workbookPart.SharedStringTablePart.SharedStringTable
.Elements<SharedStringItem>()
.ElementAt(int.Parse(c.CellValue.InnerText));
if (ssi.Text != null)
{
cellValue = ssi.Text.Text;
}
}
else
{
if (c.CellValue != null)
{
cellValue = c.CellValue.InnerText;
}
}
return cellValue;
}
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
public static string GetColumnName(string cellReference)
{
/* Advance from L to R until a digit is found, then return everything before it.
   The loop is bounded by the string length so short inputs cannot index past the end. */
for (int lastCharPos = 0; lastCharPos < cellReference.Length; lastCharPos++)
if (Char.IsDigit(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
throw new ArgumentOutOfRangeException("cellReference");
}
What this code does:
It reads blank cells.
It skips the empty rows that follow the data once reading is complete.
It reads the sheet from the first row in ascending order.
It still reads the file with OpenXML if the Excel file is in use by another process.
Here is my solution. I found the above didn't seem to work well when the missing fields were at the end of a row.
Assuming the first row in the Excel sheet has ALL the columns (via headers), grab the number of columns expected per row from it (row == 1), then loop through the data rows (row > 1). The key to processing the missing cells is the method getRowCells, which takes the known number of columns and the current row to process.
int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();
IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);
List<List<string>> docData = new List<List<string>>();
foreach (Row row in rows)
{
List<Cell> cells = getRowCells(columnCount, row);
List<string> rowData = new List<string>();
foreach (Cell cell in cells)
{
rowData.Add(getCellValue(workbookPart, cell));
}
docData.Add(rowData);
}
Method getRowCells currently only supports a sheet (row) with at most 26 columns. A loop over the known column count is used to find missing columns (cells). When one is found, a new Cell is inserted into the cells collection with a value of "" instead of null. The modified cell collection is then returned.
private static List<Cell> getRowCells(int columnCount, Row row)
{
const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (columnCount > COLUMN_LETTERS.Length)
{
throw new ArgumentException(string.Format("Invalid columnCount ({0}). Cannot be greater than {1}",
columnCount, COLUMN_LETTERS.Length));
}
List<Cell> cells = row.Descendants<Cell>().ToList();
for (int i = 0; i < columnCount; i++)
{
if (i < cells.Count)
{
string cellColumnReference = cells.ElementAt(i).CellReference.ToString();
if (cellColumnReference[0] != COLUMN_LETTERS[i])
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
else
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
return cells;
}
private static string getCellValue(WorkbookPart workbookPart, Cell cell)
{
SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
string value = (cell.CellValue != null) ? cell.CellValue.InnerXml : string.Empty;
if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
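One possible way around the 26-column limit in getRowCells is to compute the expected column name from the index instead of looking it up in COLUMN_LETTERS. The helpers below are a hypothetical sketch (the names ColumnNameFromIndex and ColumnLetters are mine, not part of the answer or of the Open XML SDK) and work on plain strings so they can be run on their own.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ColumnNameDemo
{
    // Zero-based index to Excel column name: 0 -> "A", 25 -> "Z", 26 -> "AA", ...
    public static string ColumnNameFromIndex(int index)
    {
        var name = new StringBuilder();
        // Work in one-based terms; each step peels off the least significant letter.
        for (int n = index + 1; n > 0; n = (n - 1) / 26)
        {
            name.Insert(0, (char)('A' + (n - 1) % 26));
        }
        return name.ToString();
    }

    // Leading letters of a cell reference: "AB12" -> "AB".
    public static string ColumnLetters(string cellReference) =>
        new string(cellReference.TakeWhile(char.IsLetter).ToArray());

    public static void Main()
    {
        Console.WriteLine(ColumnNameFromIndex(0));   // A
        Console.WriteLine(ColumnNameFromIndex(25));  // Z
        Console.WriteLine(ColumnNameFromIndex(26));  // AA
        Console.WriteLine(ColumnLetters("AB12"));    // AB
    }
}
```

With these, the comparison inside the loop could become ColumnLetters(cellColumnReference) != ColumnNameFromIndex(i), and the columnCount guard could be dropped. I have not tested that change against a real workbook.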
It runs successfully with this code:
string filePath = "test.xlsx"; //your file path
//Open the Excel file using ClosedXML.
using (XLWorkbook workBook = new XLWorkbook(filePath))
{
//Read the first Sheet from Excel file.
IXLWorksheet workSheet = workBook.Worksheet(1);
//Create a new DataTable.
DataTable dt = new DataTable();
//Loop through the Worksheet rows.
bool firstRow = true;
foreach (IXLRow row in workSheet.Rows())
{
//Use the first row to add columns to DataTable.
if (firstRow)
{
foreach (IXLCell cell in row.Cells())
{
dt.Columns.Add(cell.Value.ToString());
}
firstRow = false;
}
else
{
//Add rows to DataTable.
dt.Rows.Add();
int i = 0;
//Not "foreach (IXLCell cell in row.Cells())": that skips empty cells, so index by column number instead.
for (int j = 1; j <= dt.Columns.Count; j++)
{
if (string.IsNullOrEmpty(row.Cell(j).Value.ToString()))
dt.Rows[dt.Rows.Count - 1][i] = "";
else
dt.Rows[dt.Rows.Count - 1][i] =
row.Cell(j).Value.ToString();
i++;
}
}
}
}