Why not use a relational database for searching on every column? - search

Similar to this question, but not the same thing....
When shouldn't you use a relational database?
If I have let's say 20 columns of data and it's not going to change in size, the only way to quickly search on all of them, is to index every column. This takes up a lot of space, and causes inserts and updates to take a long time.
But if the alternative is to use some kind of text-indexing-and-searching engine that does basically the same thing with a more proprietary format, why not use a relational database?
If your text search index has to be modified every time you add or update any of the 20 data items, how is this any different from updating the equivalent index in a database?

Related

How better to design MongoDB for multi time serias data

I have a time series data, for example myData, and display it to my ui, it could be Day, Week, Month, Year.
How it's better to store in MongoDB, should I create separate Collections for this like:
myDataDay
myDataWeek
...
or it's better to store it in one Collections with Day, Week, Month, Year keys?
How could it impact the performance?
You will need to answer following questions:
Number and type of paralel queries you send to the database?
Are there other fields that data will be searched on?
Are the 90% of all queries in the range of last year/month/date/hour or other?
If you split the data between many collections logic on app side will become more complex , from the other hand if you keep everything in same collection at some point in time your database will become bigger and more difficult to mantain...
You may take a look to the special collection types dedicated to time series data , but in general it very depends on the amount of data and and distribution you expect ...

How to know the insertion/update history of one Cassandra table?

There is one table (or called column family) in Cassandra. I want to know how many records of this table were inserted or updated since a given timestamp. How to do it?
Your best option is to try writetime(column_name). That way you will get the write times of particular columns. You won't get, however, write times of already deleted columns. It's far from what you want, but that's the only possibility.

Cassandra sets or composite columns

I am storing account information in Cassandra. Each account has lists of data associated with it. For example, an account may have a list of friends and a list of liked books. Queries on accounts will always want all friends or all liked books or all of both. No filtering or searching is needed on either. The list of friends and books can grow and shrink.
Is it better to use a set column type or composite columns for this scenario?
I would suggest you not to use sets if
You are concerned about disk space(as each value is allocated a cell in disk + data space for metadata of each cell which is 15 bytes if am not wrong. Now that consumes a lot if your data is a growing one).
Not going to grow a lot of data in that particular row as each time ,the cells are to be fetched from different sstable .
In these kind of cases, the more preferred option would be a json array. You shall store it as a text and back the data from that.
Set (or any other collections ) use case was brought in for a completely different perspective. If you are having a particular value inside the list or a value has to be updated frequently inside the same collection, you shall make use of the collections .
My take on your query will be this.
Store all account specific info in a json object of friends that has a value as list of books .
Sets are good for smaller collections of data, if you expect your friends / liked books lists to grow constantly and get large (there isn't a golden number here) it would be better to go with composite columns as that model scales out better than collections and allows for straight up querying compared to requiring secondary indexes on collections.

What is the best way to filter a large list in Excel?

I have a table in Excel that I want to filter. It will have a maximum of 1 million rows and 80 columns. All the calculations etc are done programatically in arrays to cut dwn processing time. However, I want to also filter the results to display only certain results based on one column value, followed by a top 5% based on another filter value.
When I first did the sheet, it was limited to 65000 results so there were no problems with the size of the data set. I just invoked the worksheet filter functions from code and did it that way. Can I do it that way with a larger data set or is there a way to filter an array the way you d a dataset on a sheet?
Thanks
As already mentioned by everyone, excel 2007 will take you to a million rows, but its slower than the excel 2003 that I presume you're using at the moment so filtering using it wouldn't be advisable.
Along with mysql, ms access is also an option.
You really should put that data in an Access table and use Excel's Database Query to do the job. Since it can also filter retrieved data based on a cell value, it's a great combination.
Storing the data in a database brings you another interesting option (depending on what you want to do): to query your database using PowerPivot.
Although using a relational DB would be preferable in many ways, if you don't have any formulas then filtering your data (1 million rows by 80 columns) using Excel will be reasonably fast (< 1 or 2 seconds depending on what sort of filtering you want to do, which will probably be faster than an un-indexed DB table) assuming that you have enough RAM. If you do have any formulas then you will probably need to be in Manual calculation mode to avoid the filtering process triggering multiple recalculations.

How can I delete records from a table that have certain criteria

Rookie question I know.
I have a table with about 10 fields, one of the fields is a category field. I need this field to exist because of the multiple types of categories. However, one category in this field is wrong and is duplicating results.
So can I delete all records in the table that have "Type320" in the CatDescription field, and how? I want to keep eveerything else as it is in this table; just need to get rid of the records that have that that in that one field
Thanks very much!
EDIT: Thanks for the answer, I did not know how to do this so this is very helpful
However, this is more complicated than I thought. The raw data that I am supplied carries these duplicate records (only duplicate in certain circumstances but they are easy to isolate). This raw data is given to me on a monthly basis in several spreadsheet forms.
It all relates to these ID numbers, and has like 10 fields (xls columns). As I said before one of these is the Category Description field (sorry, this is not a lookup) In certain places this records automatically duplicates itself on output because in the database this comes from, it has to have this sub category for one particular "type"
So....every time there is a duplication, every single bit of information in all fields are exactly the same, with the exception of this CatDescription (one is Type320, and the duplicated record type is "Type321"). However, there are some instances where Type321 is valid on it's own (in which case there is no matching data row with a Type320 catdescription). By matching I mean all data in all fields of a particular record.
A very clear absolute of this is if all fields (data within) of a record with Type320 CatDescription, matches all fields (data within) a record with Type321 CatDescription, then I can delete that record containing Type321 CatDescription. This is true because this is the only situation where this duplication occurs, normally not all of this should match.
This allows all unique records with Type320 and Type321 data (that does not match exactly) to stay; just a it should. This makes sense to me (and hopefully you too :/) but can it be done, and how?
thanks because this is way over my head. I would rather know how to do it in access, but an xls solution is equally as appreciate. heck i would do it in ppt if it would get the job done! :)
I would try with one of these two querys:
DELETE FROM table WHERE CatDescription LIKE '%Type320%';
DELETE FROM table WHERE CatDescription LIKE '*Type320*';
That because the Access database engine could be using * (ANSI-89 Query Mode e.g. DAO) instead of % (ANSI-92 Query Mode e.g. OLE DB/ADO) for the wildcards.
Alternatively, this regardless of ANSI Query Mode:
DELETE FROM table WHERE CatDescription ALIKE '%Type320%';
Note the Access database engine's ALIKE keyword is not officially supported.
Does the CatDescription field look to another table? Is it a a query of those tables that creates what you call duplicate results?
If so, be careful about blaming the table that has CatDescription. Check the look-up table to see if Type320 is found there in duplicate.
If you don't have the problem isolated correctly, then you're likely to delete good records while not fixing the problem.

Resources