Pros and cons of using an Excel file as a database [closed]

I'm looking for detailed answers to the question: what are the pros and cons of using an Excel file as a database?

One of the pros seems to be that users are familiar with Excel and can work with the tables without needing to know about databases. There are however many reasons not to use Excel as a database.
- Even though you can do some validation in Excel, it is no match for a proper database system.
- When importing data from an Excel file into, for instance, a SQL database, you often run into problems because the value types are misinterpreted.
- Date columns are especially prone to being misread on import.
- Strings like 000234 will most likely be read as numbers and end up as 234 (a minimal import sketch addressing this follows the list).
- As stated before, sharing the database is very limited.
- One of my main concerns with using Excel as a database is that it is a single file that can easily be copied to various locations, so you may end up with several versions of it containing different data.
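To make the value-type and leading-zero points concrete, here is a minimal sketch of how an import can preserve such values by forcing explicit types; it assumes pandas is available, and the file, sheet and column names are purely illustrative:

```python
# Minimal sketch: read an Excel sheet with explicit string types so codes such
# as "000234" keep their leading zeros, and parse dates deliberately.
# File, sheet and column names are illustrative.
import pandas as pd

df = pd.read_excel(
    "customers.xlsx",
    sheet_name="Sheet1",
    dtype={"customer_code": str},   # prevents 000234 from becoming 234
)
# Parse the date column explicitly instead of trusting the importer's guess.
df["signup_date"] = pd.to_datetime(df["signup_date"], format="%Y-%m-%d", errors="coerce")
print(df.dtypes)
```

The point is not the specific library but the principle: when moving data out of Excel, declare the types yourself rather than letting the importer guess.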

Cons: size/performance, sharing
Pro: none
P.S. If VBA is an issue, why not Access?

I wouldn't really suggest that Excel is, or can properly act like, a database - it lacks the features, data protection and security to act as one.
If the reason to use it is ease of use and end-user familiarity, it is quite easy to connect Excel as a front end to a database - using it as a reading and writing surface while taking advantage of the speed and stability of a 'true' database (a minimal sketch follows the Pros/Cons list below).
Pros:
Very familiar
VBA makes it easy to create fairly simple, user-friendly sheets
Lots of functions to manipulate data
Cons:
Slow and VERY clunky with large data sets
Hard to validate imported data
Prone to crashing with large data sets
Lacks the ability to use intelligent queries or views
Many more...
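To illustrate the front-end idea mentioned above, here is a minimal sketch that treats Excel purely as a reading and writing surface while the data lives in a real database; pandas and SQLite are assumed for illustration, and all file, table and column names are made up:

```python
# Minimal sketch: Excel as the user-facing surface, SQLite as the actual database.
# File, table and column names are illustrative.
import sqlite3
import pandas as pd

conn = sqlite3.connect("orders.db")

# Load user-maintained rows from the workbook into the database...
incoming = pd.read_excel("orders_entry.xlsx", dtype={"order_code": str})
incoming.to_sql("orders", conn, if_exists="append", index=False)

# ...and publish an aggregated report back out to Excel for the users.
report = pd.read_sql_query(
    "SELECT order_code, SUM(amount) AS total FROM orders GROUP BY order_code",
    conn,
)
report.to_excel("orders_report.xlsx", index=False)
conn.close()
```

The same pattern works with VBA/ADO or Power Query as the glue; the key design choice is that the single source of truth is the database, not the workbook.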

Related

Using Databricks to convert Excel to a standard format [closed]

I'm trying to implement a process using Data Factory and Databricks to ingest data into a Data Lake and convert it all to a standard format, i.e. Parquet. So we'll have a raw data tier and a clean/standardized data tier.
When the source system is a DB or delimited files it's (relatively) easy, but in some cases we will have Excel sources. I've been testing the conversion process with com.crealytics.spark.excel, which is OK because we can infer the schema, BUT it's not able to iterate through multiple sheets OR get the list of sheet names to enable me to iterate through each one and convert it into a single file.
I need this to be as dynamic as possible so that we can ingest almost any file regardless of its type or schema.
Does anyone know of any alternative methods of doing this? I'm open to moving away from Databricks if necessary, such as Azure Batch with a custom C# script.
Thanks in advance!
Since you are aiming to store the data in Azure Data Lake, another approach may be to use Azure Data Lake Analytics with a custom Excel extractor. U-SQL can then convert it into Parquet. See here for a sample Excel extractor.
How much variability do you expect with the Excel sheets?
The main problem here is that it is hard to be completely schema-agnostic, especially if you have many columns. To handle schema variability, you could change the extractor to output the columns either as key/value pairs or - if the number of columns and the row size are reasonable - as a SqlMap (or a few, for different target types). You would probably have to pivot the data into a columnar format before creating the Parquet file, which would require either a second script to generate the pivoting script or a custom outputter (instead of the built-in Parquet outputter).
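If staying on Databricks (or anywhere Python runs) is acceptable, another way around the spark-excel limitation is to list the sheet names on the driver and convert each sheet in a loop. This is only a sketch, assuming pandas with openpyxl and pyarrow are available and using illustrative paths:

```python
# Minimal sketch: enumerate all sheets in a workbook and write each one to Parquet.
# Assumes pandas, openpyxl and pyarrow are installed; paths are illustrative.
import re
import pandas as pd

source_path = "/dbfs/raw/sales_workbook.xlsx"    # hypothetical input file
target_dir = "/dbfs/standardized/sales"          # hypothetical output folder

xls = pd.ExcelFile(source_path, engine="openpyxl")
for sheet_name in xls.sheet_names:               # iterate every sheet by name
    # Read raw values as text so the schema stays generic at the raw tier.
    df = pd.read_excel(xls, sheet_name=sheet_name, dtype=str)
    safe_name = re.sub(r"[^A-Za-z0-9_]+", "_", sheet_name)
    df.to_parquet(f"{target_dir}/{safe_name}.parquet", index=False)
```

This keeps the raw tier schema-agnostic (everything as strings) and defers typing to the standardization step, at the cost of doing the Excel parsing single-node rather than in Spark.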

Storing big text data in database [closed]

I am trying to build a blogging site (sort of). Users can write big blogs (or texts) and also customise them with fonts, sizes, colours of text, etc. (kind of like posts on Stack Overflow, and a little more). I am looking to use MongoDB or Couchbase for the database part. Now I am confused about a few things.
Where should I store the blogs or posts - in the database or in text files? If in the database, how will I store the fonts, sizes and colours (a user can have different fonts and sizes for different parts of a post)? The posts can sometimes be very big, so is it advisable to store such large texts in a database? The easier option seems to be storing them as text files, but I am worried about the site's performance, as loading text files can be slow. Just for knowledge's sake, how does Google store Google Docs files?
Should I use some other database that is better suited to handling the kind of things I mentioned?
Full-text search of posts is not a feature I am looking into right now, but I might add it later, so please take that into consideration as well.
Please help me.
Honestly, MongoDB has been the best database for our NodeJS projects. It used to have a 4 MB maximum BSON document size; this was later increased to 8 MB and now to 16 MB in the latest versions. That is actually a fair amount of text. According to my calculation you should be able to store 2097152 characters in a 16 MB object (though that includes the overhead).
Be aware that you can also split text across separate BSON documents very easily using GridFS.
I saw you were entertaining the idea of using flat files. While this may be easy and fast, you will have a hard time indexing the text for later use. MongoDB can index all your text, so implementing search will be a fairly easy feature to add.
MongoDB is pretty fast and I have no doubt it will be the fastest database solution for you. Development in NodeJS + MongoDB has taken months off projects for my firm compared to SQL-based databases. I have also seen some pretty impressive performance reviews for it; keep in mind that those reviews were from last year, and I have since seen even more impressive ones, but they were what I could find easily today.
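To make the document-versus-file trade-off concrete, here is a minimal sketch using pymongo (the driver, collection and field names are assumptions for illustration): a post is stored as a single document carrying its formatted HTML body, with GridFS as the fallback for bodies that could approach the 16 MB limit.

```python
# Minimal sketch with pymongo: store a post as one document, fall back to GridFS
# for very large bodies. Database, collection and field names are illustrative.
from datetime import datetime, timezone

import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["blog"]
fs = gridfs.GridFS(db)

body_html = "<p style='font-family:Georgia;color:#333355'>My first post...</p>"

if len(body_html.encode("utf-8")) < 15 * 1024 * 1024:   # stay safely under 16 MB
    db.posts.insert_one({
        "title": "My first post",
        "body_html": body_html,          # fonts, sizes and colours live in the markup
        "author": "alice",
        "created_at": datetime.now(timezone.utc),
    })
else:
    file_id = fs.put(body_html.encode("utf-8"), filename="my-first-post.html")
    db.posts.insert_one({"title": "My first post", "body_file_id": file_id})
```

Storing the (sanitised) markup together with the post answers the formatting question as well: the fonts, sizes and colours are simply part of the stored text.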

Security concerns of using mongodb [closed]

I come from a MySQL background, and I am aware of the typical security concerns when using MySQL.
Now I am using MongoDB (Java driver).
What are the security concerns, and what are possible ways of avoiding security problems?
Specifically these areas:
1) Do I need to do anything for each GET/POST request?
2) I store cookies from my application on the client side and read them later (currently the only information I store is the user's location, no sensitive information). Is there anything I should be careful about?
3) I have text boxes, text areas in my forms which users submit. Do I need to check for anything before saving data in mongo?
Can anybody provide any instances of security problems with existing applications in production?
It is in fact possible to perform injections with Mongo. My experience with it is in Ruby, but consider the following:
# Request: /foo?id=1234
id = query_param["id"]            # => "1234"
collection.find({ _id: id })      # => collection.find({ _id: 1234 })
Seems innocuous enough, right? Depending on your HTTP library, though, you may end up parsing certain query strings as data structures:
# Request: /foo?id[$gt]=0
id = query_param["id"]            # => { "$gt" => 0 } after nested-parameter parsing
collection.find({ _id: id })      # => collection.find({ _id: { "$gt" => 0 } }), which matches every document
This is likely less of a danger in strongly typed languages, but it's still a concern to watch out for.
The typical remedy here is to ensure that you always cast your inbound parameter data to the type you expect it to be, and fail hard when the types mismatch (see the sketch below). This applies to cookie data as well as any other data from untrusted sources; aggressive casting will prevent a clever user from modifying your query by passing in operator hashes instead of a value.
The MongoDB documentation similarly says:
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e $) is a reserved character used to represent operators (i.e. $inc.) Thus, you should ensure that your application’s users cannot inject operators into their inputs.
You might also get some value out of this answer.
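To make the casting advice concrete, here is a minimal sketch in Python with pymongo (used here purely for illustration; the question uses the Java driver, where the same idea applies when you build your query documents):

```python
# Minimal sketch: coerce untrusted input to the expected type before querying,
# so an attacker cannot smuggle an operator document like {"$gt": 0} into the filter.
from bson import ObjectId
from bson.errors import InvalidId
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["app"]["items"]

def find_item(raw_id):
    if not isinstance(raw_id, str):          # reject parsed structures outright
        raise ValueError("id must be a string")
    try:
        item_id = ObjectId(raw_id)           # fails hard on anything that is not a valid id
    except InvalidId:
        raise ValueError("malformed id")
    return collection.find_one({"_id": item_id})
```

The same pattern applies to cookies and form fields: decide the expected type first, cast, and reject anything that does not fit.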
Regarding programming:
When you come from a MySQL background, you are surely thinking about SQL injection and wondering whether there is something like that for MongoDB.
If you make the same mistake of generating commands as strings and then sending them to the database using db.command(String), you will have the same security problems. But no MongoDB tutorial I have ever read even mentions this method.
If you follow the usually taught practice of building DBObjects and passing them to the appropriate methods like collection.find and collection.update, it's the same as using parameterized queries in MySQL and thus protects you from most injection attempts.
Regarding configuration:
You need, of course, to make sure that the database itself is configured properly to not allow unauthorized access. Note that the out-of-the-box configuration of MongoDB is usually not safe, because it allows unauthenticated access from anywhere. Either enable authentication, or make sure that your network firewalls are configured to allow access to the MongoDB port only from within your network. But this is a topic for dba.stackexchange.com.
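As a minimal illustration of that last point, a locked-down mongod configuration (modern YAML config format; the values are assumptions for illustration) might look like this:

```yaml
# Minimal sketch of a mongod.conf: require authentication and bind only to an
# interface your application servers can reach. Values are illustrative.
net:
  port: 27017
  bindIp: 127.0.0.1        # or an internal-only address behind your firewall
security:
  authorization: enabled   # require users and roles instead of open access
```

You would then create an admin user and application users with the minimum roles they need.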

Which is the better approach to manipulating record-based data, Arrays or Ranges? [closed]

I am currently building a macro to allow me to import an Excel spreadsheet and format the data into a report-like structure.
Thus far I've been able to accomplish this manually using formulas to aggregate the data, but I would like to automate many of these steps.
So, I feel like I have a decision to make.
Since I am importing another Excel file into my Macro-enabled Workbook, do I work with the data by referencing ranges, or do I dump the contents of the file into a Variant Array/Collection/Dict?
I imagine that other people would want to use this "report builder" eventually, so I'm trying to make the conversion as seamless as possible.
Unless you're handling large amounts of data and speed is of vital importance, I would strongly recommend using Ranges as the main object. The main advantages are:
Very flexible in addressing ranges, e.g. Range("A1:B10"), Range(TopCell, BottomCell), Range("NamedRange")
.Resize(Rows, Cols) and .Offset(Rows, Cols) for flexible referencing
You can basically modify all properties of the range/cells, i.e. not only the .Value, but also .Formula, .Font, .Border, ...
Plenty of methods allow advanced handling (e.g. .SpecialCells to find formulas/values/..., .Rows.Visible to hide, etc.)
Loop over all cells in a range with a simple For Each c in Range("YourRange").Cells loop
If you want to modify anything apart from the value itself, you need to use Ranges. Only if you want to do some complex calculation on a larger dataset can you improve performance by reading the content of your ranges (or whatever data source) into an array, processing it in memory, and then writing the result back into a range in one go. This approach will significantly increase calculation speed, but it is only applicable in a few situations and recommendable only if performance really matters.
Regarding Collections/Dictionaries: don't compare them to ranges - they are useful vehicles for certain tasks, especially when you want to loop over a set of members. But Ranges themselves are a kind of Collection, i.e. you can loop over their elements (you just can't delete or add members).
If you want to compare the use of Collections and Dictionary class, here's my quick summary:
Collections are built into VBA, so there is no need for referencing. They are my main choice for simple aggregations that I want to either loop over or address. The advantage of Dictionaries is that they offer additional functionality, mainly an .Exists method - and a .Keys collection, allowing you to also loop over the keys of the hash. Thus, I reference the Dictionary class (in the Microsoft Scripting Runtime) whenever I need those features, but stick to Collections otherwise.
There are lots of report building tools out there. Making your own seems like wasted effort unless I don't understand the problem.

Ideas for full text search MongoDB & node.js [closed]

I am developing a search engine for my website and I want to add the following features to it:
Full text search
Did you mean feature
Data store in MongoDB
I want to make a RESTful backend. I will add data to MongoDB manually and it will be indexed (which should I prefer: MongoDB's own indexing, or an external search indexing library like Lucene?). I also want to use Node.js. This is what I found from my research. Any ideas about the architecture would be appreciated.
Thanks in advance
I'm using Node.js / MongoDB / Elasticsearch (based on Lucene). It's an excellent combination. The flow is smooth as well, since all three packages (can) deal with JSON as their native format, so there is no need to transform DTOs, etc.
Have a look:
http://www.elasticsearch.org/
I personally use Sphinx and MongoDb, it is a great pair and I have no problems with it.
I back MongoDB onto a MySQL instance, which Sphinx then indexes quickly. You should never need to actively search on _id (I have no idea who is going to know the _id of one of your objects to search for), so you can just stash it in MySQL as a string field and it will work just fine.
When I pull the results back out of Sphinx, all I do is convert the value to a new MongoId (in PHP) - or in your case an ObjectId - and then simply query on that object id for the rest of the data. It couldn't be simpler: no problems, no hassle, no nothing. And I can offload reindexing the delta indexes to my MySQL instance, keeping my MongoDB instance doing what it needs to do: serving up tasty data for the user.
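If you want to try MongoDB's built-in option before bringing in Elasticsearch or Sphinx, here is a minimal sketch of a text index and a $text query, shown with pymongo for brevity (the collection and field names are assumptions; the same index works from Node.js):

```python
# Minimal sketch: MongoDB's built-in text index and $text search via pymongo.
# Collection and field names are illustrative.
from pymongo import MongoClient, TEXT

posts = MongoClient("mongodb://localhost:27017")["site"]["posts"]

# One text index per collection; cover the fields you want searchable.
posts.create_index([("title", TEXT), ("body", TEXT)], name="post_text")

# Search and sort by relevance score.
results = posts.find(
    {"$text": {"$search": "full text search"}},
    {"score": {"$meta": "textScore"}},
).sort([("score", {"$meta": "textScore"})])

for doc in results:
    print(doc["title"], doc["score"])
```

This gives you basic full-text search, but not a 'did you mean' feature; for fuzzy suggestions the Elasticsearch route above is the more natural fit.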
