"At least one object must implement IComparable" exception from LINQ query results - excel

I have not used LINQ very extensively, but I'm trying to read data from a large Excel spreadsheet (14K+ rows) that requires me to make queries from multiple worksheets and even requery the original spreadsheet to filter specific data. Because OleDb queries of Excel can take a relatively long time (500+ms per query for a file on my local machine), I'm doing a couple of these queries at the front of my method, starting a loop through a "base" DataTable, then trying to use LINQ to filter down the data within that loop to put the appropriate data into a more structured DataSet. Here is some code to help explain (VB.NET):
Dim Connection As System.Data.OleDb.OleDbConnection
Dim Command As System.Data.OleDb.OleDbDataAdapter
Dim EXCEL_SHEET_DATA_1 As New DataTable
Dim EXCEL_SHEET_DATA_2 As New DataTable
Dim EXCEL_SHEET_DATA_3 As New DataTable
Dim TapeFile As New FileInfo("C:\TempFolder\tapefile.xls")
Connection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0; Data Source='" & TapeFile.FullName & "'; Extended Properties=Excel 8.0;")
Command = New System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [SHEET1$] ORDER BY [USER_ID] ASC, [MEMBER_NUMBER] ASC;", Connection)
Command.Fill(EXCEL_SHEET_DATA_1)
Command.Dispose()
Command = New System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [SHEET2$] ORDER BY [USER_ID] ASC, [MEMBER_NUMBER] ASC;", Connection)
Command.Fill(EXCEL_SHEET_DATA_2)
Command.Dispose()
Command = New System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [SHEET3$] ORDER BY [USER_ID] ASC, [MEMBER_NUMBER] ASC;", Connection)
Command.Fill(EXCEL_SHEET_DATA_3)
Command.Dispose()
For Each Row As DataRow In EXCEL_SHEET_DATA_1.Rows
Dim MemberNumber As String = Row("MEMBER_NUMBER").ToString.Trim
Dim UserNumber As String = Row("USER_ID").ToString.Trim
' -- CODE FOR INITIAL PROCESSING OF SHEET1 DATA - NO ERRORS --
Dim CoMemberQuery As IEnumerable(Of DataRow) = From cm In EXCEL_SHEET_DATA_2 Where cm("MEMBER_NUMBER") = MemberNumber And cm("USER_ID") = UserNumber
For Each CoMemberRow As DataRow In CoMemberQuery
' -- CODE FOR PROCESSING OF SHEET2 DATA - NO ERRORS --
Next CoMemberRow
Dim VehicleQuery As IEnumerable(Of DataRow) = From veh In EXCEL_SHEET_DATA_1 Where veh("MEMBER_NUMBER") = MemberNumber And veh("USER_ID") = UserNumber Order By veh("VIN") Ascending
' *******************************************************
' -->> HERE IS WHERE I *SOMETIMES* GET THE EXCEPTION <<--
' *******************************************************
For Each VehicleRow As DataRow In VehicleQuery
' -- CODE FOR SECONDARY PROCESSING OF SHEET1 DATA - NO ERRORS --
Next VehicleRow
Next Row
I don't get the exception every time. The only thing I've noticed as possibly having something to do with it is that for the specific MemberNumber and UserNumber combination that causes the first exception, the first row in the result set would most likely contain a NULL value for the VIN field.
I'm sure the problem has to do with my LINQ query syntax, but I am simply too inexperienced in this regard to know why it's failing. Any assistance would be greatly appreciated. If you require any additional information regarding the code or implementation, let me know and I'll try to add it to the question.
Thank you for your time.

Your VehicleQuery has the following phrase: Order By veh("VIN") Ascending.
So as soon as VehicleQuery gets evaluated (by starting the For loop), LINQ will evaluate all of the items in that query, and then perform a sorting operation, which involves comparing the veh("VIN") values with each other and putting them in order.
When comparing any two items in your query, it tries to see if either value knows how to compare itself with values of the other type (that is, whether it implements the IComparable interface). If neither can, then it doesn't know which one should go first.
My guess is that veh("VIN") is (sometimes) yielding objects that don't know how to compare themselves with other values returned by this expression (a NULL cell, for instance, comes back as DBNull.Value, which does not implement IComparable). Depending on the kind of data you're using, and how you want it to be compared, you might consider doing some kind of cast or conversion, or simply calling ToString() on the value, to make sure it's comparable: Order By veh("VIN").ToString() Ascending
(Please pardon any syntax errors, as I'm a C# developer.)
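As a rough analogy only (this is Python, not the VB.NET LINQ API), the same failure mode appears whenever a sort has to compare a missing value with a string. The sketch below assumes hypothetical row dictionaries with a "VIN" key; it shows the raw sort failing and a sort on a string-converted key succeeding, which is the idea behind calling ToString() in the Order By clause.

```python
def sort_rows_by_vin(rows):
    """Sort row dicts by VIN, treating a missing (None) VIN as an empty string."""
    return sorted(rows, key=lambda row: "" if row["VIN"] is None else str(row["VIN"]))


# Hypothetical sample data: one row has no VIN, like a NULL cell in the sheet.
rows = [{"VIN": "2B"}, {"VIN": None}, {"VIN": "1A"}]

try:
    # Comparing None with a string is not defined, so the sort raises.
    sorted(rows, key=lambda row: row["VIN"])
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# Converting every key to a string first makes all keys mutually comparable.
ordered_vins = [row["VIN"] for row in sort_rows_by_vin(rows)]
```

Whether an empty VIN should sort first or last is a design choice; here it sorts first because the empty string compares lower than any other string.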

What sets limits on maximum Collection size/count in VBA?

I am using a nested Collection to store validation data from Excel, of the form: coll1(case)(subtype)(item).
My code is essentially looping through a flat input list and bin sorting the contents into collections - the top level collection is a collection of validation data for all data sources (cases), the second tier is a type collection (within each case, there are different possible types/classes of data) and the final tier is the valid list of tags/labels for things of that particular class/type.
In the code below, inp is read in from a vertical stack of Excel cells, but is essentially a list of (unique) Strings of the form "\validation_data_class\case\type\label" - hence the Split() into the label() array to then parse.
Public tag_data As Collection
Private Sub load_tags(inp As Variant)
Dim i As Long, label() As String
Dim case_name As String, type_name As String, tag_name As String
Dim tmp_coll As Collection, tmp_coll2 As Collection
Set tag_data = New Collection
For i = LBound(inp) To UBound(inp) ' Check this works if only one entry in the list - may need IsArray() check
label = Split(inp(i, 1), "\")
Select Case label(1)
Case "tag"
' Extract the case name from the label and get its number, so we can store data in the right element of tag_data()
case_name = label(2): If Not KeyExists(tag_data, case_name) Then Set tmp_coll = New Collection: tag_data.Add tmp_coll, case_name
' Extract the type name from the label and store it, if needed
type_name = label(3): Set tmp_coll = tag_data(case_name)
If Not KeyExists(tmp_coll, type_name) Then Set tmp_coll2 = New Collection: tmp_coll.Add tmp_coll2, type_name
' Extract the actual tag and store it in the list (assumes we have ensured no duplicates already)
tag_name = label(4): Set tmp_coll = tag_data(case_name)(type_name)
Debug.Assert i < 719
tmp_coll.Add tag_name, tag_name
Case "prop"
' Still to implement
End Select
Next i
End Sub
Function KeyExists(coll As Collection, key As String) As Boolean
On Error GoTo ErrHandler
IsObject (coll.Item(key))
KeyExists = True
Exit Function
ErrHandler:
' Do nothing
End Function
The problem I am having is that it gets as far as my Debug.Assert line and then silently fails on that 719th addition to the lowest-level Collection. Weirdly it will run if I don't use keys for the lowest-level Collection, which then allows me to add the final 2 items (in this particular case, I need 721 items in that Collection, but it could be more or less in other scenarios).
I will take the workaround of not using Keys for that large Collection if I need to, but it makes my actual validation against this set of lists that bit harder later on, because I cannot just use the KeyExists method, but will have to write a slower function to crawl the un-labelled Collection looking for a match.
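For comparison, here is a minimal Python sketch of the same three-level bin sort (case, then type, then tag), using nested dictionaries and sets instead of VBA Collections. The function names load_tags and tag_exists are illustrative and not part of the VBA code above; the sketch does not explain the VBA failure, it only shows the intended data structure and the keyed lookup that the validation step relies on.

```python
from collections import defaultdict


def load_tags(paths):
    """Bin-sort strings of the form \\class\\case\\type\\label into nested containers."""
    tag_data = defaultdict(lambda: defaultdict(set))
    for path in paths:
        label = path.split("\\")  # label[0] is empty because the path starts with "\"
        if len(label) >= 5 and label[1] == "tag":
            case_name, type_name, tag_name = label[2], label[3], label[4]
            tag_data[case_name][type_name].add(tag_name)
        # "prop" entries are skipped here, as in the original (still to implement)
    return tag_data


def tag_exists(tag_data, case_name, type_name, tag_name):
    """Keyed membership test, the counterpart of KeyExists on the lowest level."""
    return tag_name in tag_data.get(case_name, {}).get(type_name, set())
```

Note that Python keys are case-sensitive, so "Label" and "label" are stored as two distinct tags in this sketch.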

Is there any way I can ask queries to Notes Database

I am working on fetching meetings given two dates: e.g. fetch all the meetings that are in the current month.
Suppose that I have around 45 meetings in the specified period. My web service is taking a lot of time.
This is how I'm doing it right now:
I fetch all the documents in the calendar view.
Check all the documents for the start Date and end date.
If any of the meetings fall in the specified period i am constructing an array and i am returning that array.
Is this correct?
This way is correct, but very inefficient. It is better to use the NotesDatabase class and create a query to use with the Search method.
Here is an example in LotusScript (as you did not specify a language):
Dim ses as New NotesSession
Dim db as NotesDatabase
Dim dc as NotesDocumentCollection
Dim strQuery as String
Set db = ses.CurrentDatabase
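' Keep the formula on one line: a "_" inside the braces would become part of the string.
' The outer parentheses make the form test apply to both date conditions.
strQuery = {Form = "Appointment" & ((StartDate >= [01.01.2014] & StartDate < [01.02.2014]) | (EndDate >= [01.01.2014] & EndDate < [01.02.2014]))}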
Set dc = db.Search( strQuery , Nothing, 0 )
'- Cycle through this collection...
Of course, you need to build strQuery dynamically from today's date, but this will perform much better than your version.
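Building the query string from today's date might look like the following Python sketch. It only produces the formula text; the helper names month_bounds and build_query are hypothetical, and the dd.mm.yyyy literal format simply mirrors the example above (the format Notes expects depends on the regional settings, so treat it as an assumption).

```python
from datetime import date


def month_bounds(today):
    """Return the first day of today's month and the first day of the next month."""
    start = today.replace(day=1)
    if start.month == 12:
        next_start = start.replace(year=start.year + 1, month=1)
    else:
        next_start = start.replace(month=start.month + 1)
    return start, next_start


def build_query(today):
    """Build the selection formula text for meetings touching today's month."""
    start, next_start = month_bounds(today)
    s = start.strftime("%d.%m.%Y")       # assumed date literal format
    e = next_start.strftime("%d.%m.%Y")
    return (
        'Form = "Appointment" & '
        f"((StartDate >= [{s}] & StartDate < [{e}]) | "
        f"(EndDate >= [{s}] & EndDate < [{e}]))"
    )
```

Using a half-open range (from the first of the month up to, but not including, the first of the next month) avoids having to know how many days the month has.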
It is correct, but not very performant when you have a lot of documents. Basically, you create a view whose first column is the meeting (start) date, sorted. In LotusScript you can access the view, set the "cursor" to the first meeting that matches the starting date, and then step through the view until you reach a date after the end date.
Read about the view's GetDocumentByKey method. Further reading here: http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=%2Fcom.ibm.designer.domino.main.doc%2FH_LOCATING_DOCUMENTS_WITHIN_A_VIEW_OR_FOLDER.html
Hmm ... thinking a little further, what happens if you have a start date but no matching meeting? In that case, refer to the FTSearch() method.
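The sorted-view idea can be sketched in plain Python, independent of the Notes API: keep the meetings sorted by start date, jump to the first entry on or after the range start with a binary search, then walk forward until the end date is passed. The meetings_in_range function and the (start_date, title) tuples are illustrative assumptions. Because the search looks for the first entry on or after the start date rather than an exact key match, it also covers the case where no meeting falls exactly on the start date.

```python
from bisect import bisect_left
from datetime import date


def meetings_in_range(sorted_meetings, start, end):
    """Return meetings with start <= start_date < end.

    sorted_meetings is a list of (start_date, title) tuples sorted by start_date.
    """
    # (start,) sorts before any (start, title), so this finds the first
    # meeting whose start_date is on or after the range start.
    index = bisect_left(sorted_meetings, (start,))
    result = []
    # Step through the sorted list until we pass the end of the range.
    while index < len(sorted_meetings) and sorted_meetings[index][0] < end:
        result.append(sorted_meetings[index])
        index += 1
    return result
```

The cost is one binary search plus one step per matching meeting, instead of a pass over every document in the calendar.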
If you are using Notes / Domino 9.0 or later, you should use the built-in calendar classes. These are available either from LotusScript or Java. Here's an example using Java. Given a database object and a date range, it prints all the entries in the range:
private static void printRange(Database database, DateTime start, DateTime end) throws NotesException {
// Get the calendar object from the database
NotesCalendar calendar = database.getParent().getCalendar(database);
if ( calendar != null ) {
// Get a list of calendar entries
Vector<NotesCalendarEntry> entries = calendar.getEntries(start, end);
if ( entries != null ) {
// For each entry ...
Iterator<NotesCalendarEntry> iterator = entries.iterator();
while (iterator.hasNext()) {
NotesCalendarEntry entry = iterator.next();
// Read the iCalendar representation
String icalendar = entry.read();
// Get the Notes UNID
Document doc = entry.getAsDocument();
String unid = doc.getUniversalID();
// Print UNID and iCalendar to console
System.out.println("Entry UNID: " + unid);
System.out.println(icalendar);
}
}
}
}
The NotesCalendar and NotesCalendarEntry interfaces are in the lotus.domino package. If you are using LotusScript, there are classes of the same name and with the same methods.
A couple of warnings about the above code:
It doesn't handle the nuances of repeating entries. You can have multiple instances of a repeating entry in the same time range. In Notes these entries might have the same UNID. You should test your code with repeating entries to make sure you understand the nuances.
The example doesn't recycle the Document, NotesCalendarEntry and NotesCalendar objects as it should. I skipped this for simplicity, but if you are using the Notes Java classes, you definitely need to use recycle() correctly. It will save headaches down the road.

SQL Server CE and Datetime Select issue (C#)

I have a select that uses datetime values, from two DateTimePickers, in the WHERE clause. The SELECT runs fine and populates a datagrid with no problem, but bizarrely the datetime part of the SELECT is completely ignored, and the whole thing returns a recordset as if only "WHERE x_account_id = " + subaccountID were employed:
SqlCeConnection conn = new SqlCeConnection(Lib.ConnectionString);
string sql = "SELECT x_scaleid, x_weight, x_timestamp FROM x WHERE x_account_id = " + subaccountID
+ " AND (x_timestamp BETWEEN @start_date AND @end_date) ORDER BY x_timestamp DESC";
SqlCeCommand cmd = new SqlCeCommand(sql, conn);
cmd.Parameters.Add("@start_date", SqlDbType.DateTime, 8).Value = dFromFilter.Value.Date;
cmd.Parameters.Add("@end_date", SqlDbType.DateTime, 8).Value = dToFilter.Value.Date;
SqlCeDataAdapter da = new SqlCeDataAdapter();
da.SelectCommand = cmd;
I haven't been able to find anyone with the same issue online, so I'm kind of stuck. Maybe I'm better off trying to convert all the datetimes to ints and store them that way - I've always hated working with datetime types.
Before anyone asks, I've tried various versions of the clause, including the use of '<' and '>', as well as different CONVERT variations.
I 'solved' the issue, and by 'solved' I mean put in a hack.
I created a new column in the DB, type int, in the x table. As I populate x_timestamp with the current time, derived from the application, I populate this new column with a number constructed out of the date, in the format of 'yyyymmdd'. So when doing any filtering, using dates, I can simply do a numerical comparison using the two dates picked (again converted to similar int formats in the application). Works a treat.
It's not an ideal solution by any stretch, as it duplicates data in the DB, but deadlines are deadlines and if no one else can suggest a better one here, this one may be of help to anyone else with a similar problem in the future.
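The integer-date workaround can be sketched in Python as follows. This is only an illustration of the technique, not the C# application code: date_key and filter_by_date are hypothetical names, and the rows are assumed to be dictionaries carrying an x_timestamp value.

```python
from datetime import date, datetime


def date_key(value):
    """Pack a date or datetime into an int of the form yyyymmdd."""
    return value.year * 10000 + value.month * 100 + value.day


def filter_by_date(rows, start, end):
    """Keep rows whose timestamp falls on a day between start and end, inclusive."""
    low, high = date_key(start), date_key(end)
    return [row for row in rows if low <= date_key(row["x_timestamp"]) <= high]
```

Because the digits run from most significant (year) to least significant (day), comparing the integers orders them the same way as comparing the dates, and the time of day is dropped, so both boundary days are included in full.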

Classic ASP - When to close recordset

I would like to know, which of the following examples is the best for closing a recordset object in my situation?
1)
This one closes the object inside the loop but opens a new object when it moves next. If there were 1000 records, this opens an object 1000 times and closes it 1000 times. This is what I would normally do:
SQL = " ... "
Set rs1 = conn.Execute(SQL)
While NOT rs1.EOF
SQL = " ... "
Set rs2 = conn.Execute(SQL)
If NOT rs2.EOF Then
Response.Write ( ... )
End If
rs2.Close : set rs2 = Nothing
rs1.MoveNext
Wend
rs1.Close : Set rs1 = Nothing
2)
This example is what I want to know about. Does deferring the object closure (rs2.Close) until after the loop has finished improve or reduce performance? If there were 1000 records, this would open 1000 objects but only close one once:
SQL = " ... "
Set rs1 = conn.Execute(SQL)
While NOT rs1.EOF
SQL = " ... "
Set rs2 = conn.Execute(SQL)
If NOT rs2.EOF Then
Response.Write ( ... )
End If
rs1.MoveNext
Wend
rs1.Close : Set rs1 = Nothing
rs2.Close : set rs2 = Nothing
I hope I've explained myself well enough and it's not too stupid.
UPDATE
To those who think my query can be modified to avoid the N+1 issues (2nd query), here it is:
This is for an online photo library. I have two tables; "photoSearch" and "photos". The first, "photoSearch", has just a few columns and contains all searchable data for the photos, such as "photoID", "headline", "caption", "people", "dateCaptured" and "keywords". It has a multi-column full-text index on (headline, caption, people, keywords). The second table, "photos", contains all of the photos data; heights, widths, copyrights, caption, ID's, dates and much more. Both have 500K+ rows and the headline and caption fields sometimes return 2000+ characters.
This is approximately how the query looks now:
(Things to note: I cannot use joins with full-text searching, hence keywords being stored in one column in a 'de-normalized' table. Also, this is kind of pseudo code, as my app code is elsewhere, but it's close.)
SQL = "SELECT photoID FROM photoSearch " & _
"WHERE MATCH (headline, caption, people, keywords) " & _
"AGAINST ('" & booleanSearchStr & "' IN BOOLEAN MODE) " & _
"AND dateCaptured BETWEEN '" & fromDate & "' AND '" & toDate & "' LIMIT 0,50;"
Set rs1 = conn.Execute(SQL)
While NOT rs1.EOF
SQL = "SELECT photoID, setID, eventID, locationID, headline, caption, instructions, dateCaptured, dateUploaded, status, uploaderID, thumbH, thumbW, previewH, previewW, + more FROM photos LEFT JOIN events AS e USING (eventID) LEFT JOIN location AS l USING (locationID) WHERE photoID = "&rs1.Fields("photoID")&";"
Set rs2 = conn.Execute(SQL)
If NOT rs2.EOF Then
Response.Write ( .. photo data .. )
End If
rs2.Close
rs1.MoveNext
Wend
rs1.Close
When tested, having the full-text index on its own table, "photoSearch", instead of the large table, "photos", seemed to improve speed somewhat. I didn't add the "photoSearch" table; it was already there - this is not my app. If I try joining the two tables to lose the second query, I lose my indexing altogether, resulting in very long times - so I can't use joins with full-text. This just seemed to be the quickest method. If it wasn't for the full-text and joining problems, I would have combined both of these queries already.
Here is the thing. First, get your photo IDs and make MySQL treat that result as if it were an actual table holding only the photo IDs, then write your actual statement against it. There is no need for any extra recordsets.
It helps to build this starting from the inner query. Here is the sample code with explanations:
Step 1: Create the photo ID lookup and name it. This will be our photo ID lookup table, so alias it as "PhotoIds":
(SELECT photoID FROM photoSearch
WHERE MATCH (headline, caption, people, keywords)
AGAINST ('"&booleanSearchStr&"' IN BOOLEAN MODE)
AND dateCaptured BETWEEN '"&fromDate&"' AND '"&toDate&"' LIMIT 0,50) AS PhotoIds
Step 2: Now that we have the photo IDs, get the photo information for them. We insert the above statement just before the WHERE clause, the same way as we would with a real table. Note that our "fake" table must be enclosed in parentheses.
SQL = "SELECT p.photoID, p.setID, p.eventID, p.locationID, p.headline, p.caption, + more " & _
"FROM photos AS p " & _
"LEFT JOIN events AS e USING (eventID) " & _
"LEFT JOIN location AS l USING (locationID), " & _
"(SELECT photoID FROM photoSearch WHERE MATCH (headline, caption, people, keywords) " & _
"AGAINST ('" & booleanSearchStr & "' IN BOOLEAN MODE) AND dateCaptured BETWEEN " & _
"'" & fromDate & "' AND '" & toDate & "' LIMIT 0,50) AS PhotoIds " & _
"WHERE p.photoID = PhotoIds.photoID;"
Note: I just wrote this code here and have never tested it. There may be some spelling errors or similar. Please let me know if you have trouble.
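To illustrate the derived-table idea in a form that can actually be run, here is a small Python sketch using the standard sqlite3 module with an in-memory database. It is an analogy under stated assumptions, not the MySQL setup from the question: the two tables are cut down to a couple of columns, the sample rows are made up, and a LIKE filter stands in for the MATCH ... AGAINST full-text search. It compares the loop of per-row lookups with a single statement that joins against the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photoSearch (photoID INTEGER PRIMARY KEY, keywords TEXT);
CREATE TABLE photos (photoID INTEGER PRIMARY KEY, headline TEXT);
INSERT INTO photoSearch VALUES (1, 'beach sunset'), (2, 'city night'), (3, 'beach party');
INSERT INTO photos VALUES (1, 'Sunset'), (2, 'Skyline'), (3, 'Party');
""")


def n_plus_one(term):
    """One search query, then one extra query per matching photo."""
    ids = conn.execute(
        "SELECT photoID FROM photoSearch WHERE keywords LIKE ? "
        "ORDER BY photoID LIMIT 50",
        (f"%{term}%",),
    ).fetchall()
    rows = []
    for (photo_id,) in ids:
        row = conn.execute(
            "SELECT photoID, headline FROM photos WHERE photoID = ?", (photo_id,)
        ).fetchone()
        if row is not None:
            rows.append(row)
    return rows


def single_query(term):
    """The same result in one statement, using the search as a derived table."""
    return conn.execute(
        "SELECT p.photoID, p.headline FROM photos AS p, "
        "(SELECT photoID FROM photoSearch WHERE keywords LIKE ? "
        " ORDER BY photoID LIMIT 50) AS PhotoIds "
        "WHERE p.photoID = PhotoIds.photoID ORDER BY p.photoID",
        (f"%{term}%",),
    ).fetchall()
```

Both functions return the same rows; the second sends one statement to the database instead of one per photo. Whether MySQL keeps using the full-text index inside the derived table is something to confirm on the real data with EXPLAIN.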
Now, on to your primary question.
There is nothing to close after running a statement with the Execute method when it returns no recordset data, such as "INSERT", "DELETE" or "UPDATE" (that is the purpose of Execute in the first place). If you never opened a recordset object, why try to close something that was never opened? Instead, you can use Set Rs = Nothing to dereference the object and free up some system resources (this has nothing to do with MySQL itself). If you are running "SELECT" queries (queries that return data), you must open a recordset object (ADODB.Recordset), and if you opened it, you need to close it as soon as it has finished its job.
The most important thing is to close the main connection to the MySQL server after each page load. So consider putting your connection-closing code (not the recordset close) in an include file and inserting it at the end of every page that connects to the database. Long story short: you must use Close() if you used Open().
If you show us your SQL statements, maybe we can show you how to combine them into a single SQL statement so you only have to do one loop; otherwise, double looping like this really takes a toll on the server's performance. But before I learned stored procedures and joins, I would probably have done it like this:
Set Conn = Server.CreateObject("Adodb.Connection")
Conn.Open "ConnectionString"
Set oRS = Server.CreateObject("Adodb.Recordset")
oRS.Open "SQL STATEMENT", Conn
Set oRS2 = Server.CreateObject("Adodb.Recordset")
oRS2.ActiveConnection = Conn
Do Until oRS.EOF
oRS2.Open "SQL STATEMENT"
If oRS2.EOF Then ...
oRS2.Close
oRS.Movenext
Loop
oRS.Close
Set oRS = Nothing
Set oRS2 = Nothing
Set Conn = Nothing
I tried putting this in a comment because it doesn't directly answer your original question, but it got too long.. :)
You could try using a sub-query instead of a join, nesting the outer query inside the second one. " ... where photoID in(select photoID from photoSearch ... )". Not sure if it would get better results, but it may be worth trying. That being said, the use of the full-text search does change how the queries would be optimized, so it may take more work to figure out what the appropriate indexes are (need to be). Depending on your existing performance, it may not be worth the effort.
Do you know for sure that this existing code/query is the current bottleneck? Sometimes we spend time optimizing things that we think are the bottleneck when that may not be the case... :)
One additional thought - you may want to consider some caching logic to reduce the amount of redundant queries you may be making - either at the page level or at the level of this method. The search parameters could be concatenated together to form the key for storing the data in a cache of some sort. Of course you would need to handle appropriate cache invalidation/expiry logic. I've seen systems speed up 100x with very simple caching logic added to bottlenecks like this.
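A very small version of such a cache could look like the Python sketch below. It is a minimal illustration, not a drop-in for Classic ASP: the SearchCache class, its method names, and the time-based expiry policy are all assumptions made for the example. The key is formed by concatenating the search parameters, as suggested above.

```python
import time


class SearchCache:
    """Tiny in-memory cache keyed by the concatenated search parameters."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._entries = {}            # key -> (expiry_time, value)

    def get_or_compute(self, params, compute):
        """Return the cached result for params, calling compute() only on a miss."""
        key = "|".join(str(p) for p in params)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: skip the database entirely
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A caller would pass the search string and the date range as params and a function that runs the real query as compute. Joining with "|" assumes the parameters themselves never contain that character; a tuple key would avoid that caveat.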
It's simple: check whether the State of your recordset is 1 or 0, which means open or closed,
like this:
If RS.State = 1 Then RS.Close
The connection to the database (CN) will still be up, and you can reopen the RS (recordset) again with any values.

subsonic collection

I've written this code to generate a collection. I've tried to filter the collection using SubSonic's Where, but it's not working. The where clause will change based on user input, so I cannot add it to the SqlQuery; the DataTable will also be filled with different data from the collection based on the user input. How can I achieve this? I also want the collection to remain unchanged so that I can filter it further with another where clause. Also, I've selected only two columns, but all columns are showing up. Please help.
Dim sq As SB.SqlQuery = New SB.Select("product.prodcode as 'Product Code'").From(DB.Product.Schema)
Dim wh As SB.Where = New SB.Where()
Dim prod As DB.ProductCollection = sq.ExecuteAsCollection(Of DB.ProductCollection)()
wh.ColumnName = DB.Product.ServiceColumn.PropertyName
wh.Comparison = SubSonic.Comparison.NotEquals
wh.ParameterValue = System.Decimal.One
Dim tab As DataTable = prod.Where(wh).Filter().ToDataTable()
Me.GridControl1.DataSource = tab
What you're doing doesn't make much sense - the where needs to go onto the query, then hit the DB - that's the way it should work. If you want to filter after the fact you can use Linq's Where(), which will filter the list for you.
