Python 3: Storing data without loading it into memory

I am currently building a Flask app that will have more data than I think I can load into memory. I have searched in many places and found that SQL seems to be a good solution; sadly, I cannot use SQL for this project due to some of its limitations.
My project consists of many entries of the form
database[lookupkey1][lookupkey2]...and more lookup keys
My current plan is to override __getitem__, __setitem__ and __delitem__ and replace them with calls to the database. Is there any kind of database that can store large amounts of maps/dictionaries like
{"spam":{"bar":["foo","test"],"foo":["bar"]}}
I am also currently using JSON to save the data, so it would be appreciated if the database had an easy way to migrate my current data.
Sorry that I'm not very good at writing Stack Overflow questions.

Most document-oriented DBs like MongoDB will let you save data as nested dict/list-like objects and query them by their keys and indexes.
P.S. Accessing such a DB through Python's dict accessors is a bad idea, as it would issue a separate DB query for each lookup step, which is highly inefficient and may lead to performance problems. Look into an ORM/ODM for whichever DB you choose; most of them let you access document-oriented data in a way similar to accessing dicts and lists.
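For example, here is a minimal sketch with pymongo (assuming a local MongoDB server; the database and collection names are placeholders), where a nested key is queried with dot notation in a single call instead of walking the dict one level, and one query, at a time:

    # A minimal sketch assuming a local MongoDB server and pymongo installed.
    # The database/collection names ("mydb", "entries") are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    entries = client["mydb"]["entries"]

    # Store a nested dict much like the JSON you already have.
    entries.insert_one({"spam": {"bar": ["foo", "test"], "foo": ["bar"]}})

    # Query by a nested key using dot notation instead of walking
    # database["spam"]["bar"] one step (and one query) at a time.
    doc = entries.find_one({"spam.bar": "test"})
    print(doc["spam"]["bar"])  # ['foo', 'test']

    # An index on the nested path avoids a full collection scan.
    entries.create_index("spam.bar")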

Related

How to back up an SQLAlchemy database?

I am trying to back up a database through SQLAlchemy and save it as a file. I tried using the extension Flask-AlchemyDumps, but it appears to no longer be supported.
I must be missing something obvious, as this is surely something a lot of developers want to do. Does anyone know how I should be backing up the database?
Thanks in advance
J Kirkman
SQLAlchemy is an ORM which sits between your code and the database. It's useful if you want to interact with specific rows and relationships without having to keep track of lots of ids and joins.
What you're looking for is a way to dump the entire contents of your DB to disk, presumably so you can restore it later/elsewhere. This is a bulk action, which is your first clue that an ORM may not be a suitable tool. (ORMs tend to be fast enough for small to medium operations, but slow and not ideal for actions which affect tens of thousands of rows at once.) And indeed, this isn't usually something you'd use an ORM for; it's a feature of your DB, presumably Postgres or MySQL. If you happen to be using Heroku, you can use their command line tool to do this.
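For PostgreSQL, for instance, a backup usually means shelling out to pg_dump rather than going through SQLAlchemy at all. A rough sketch (assuming pg_dump is on the PATH; the database name and file paths are placeholders):

    # A rough sketch, not Flask-AlchemyDumps: shell out to pg_dump for the backup.
    # Assumes PostgreSQL and pg_dump on the PATH; "mydb" is a placeholder.
    import subprocess
    from datetime import datetime

    def backup_postgres(dbname="mydb"):
        outfile = f"backup-{datetime.now():%Y%m%d-%H%M%S}.dump"
        # -Fc writes PostgreSQL's compressed custom format, restorable with pg_restore.
        subprocess.run(["pg_dump", "-Fc", "-f", outfile, dbname], check=True)
        return outfile

    # Restore later with: pg_restore -d mydb <backup file>

On Heroku the equivalent is a CLI command such as heroku pg:backups:capture.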

SQL views in Yesod/persistent

In http://www.yesodweb.com/book/persistent there is no mention of SQL views.
Even in imperative languages, I have been very fond of immutable database schema design, i.e. only INSERTs and SELECTs; UPDATEs and DELETEs are not used.
This has the advantage of preserving all history, at the expense of making the current 'state' a relatively expensive pure function of the history in the DB.
e.g. there is not a 'user' table, just 'user_created', 'user_password_updated' and 'user_deleted' tables, which are unified in a 'user' SQL VIEW, showing the current state of users.
How should I work with VIEWs in Persistent? Should I use Persistent at all - is it (ironically for Haskell) too tightly focused on a mutable DB for my use-case?
It's been a long time since the question was asked, but I thought it was worth responding, because seven years later the answer has not changed and I really like your idea of keeping the old versions of tables around and reading them with views! One drawback of this approach is that using Template Haskell in persistent will slow down compile times a lot. I once had a database of about 50 tables defined with persistent's Template Haskell, and it took over half an hour to compile whenever it changed.
Yesod's persistent does not support SQL views, and I doubt it ever will, because it intends to be database agnostic. Currently it looks like it supports CouchDB, MongoDB, MySQL, PostgreSQL, Redis and SQLite. Not all of these databases support SQL-style views, so it would be hard to abstract over all of them.
Where persistent excels is in providing an easy way to create a set of Haskell types that serialize to and from different databases. It provides type class instances and functions to do single-table queries, and these work really well. If you want to do join queries on an SQL database that you are interfacing with through persistent, you can use esqueleto, a type-safe EDSL for SQL join queries.
As far as handling SQL views in Haskell goes, I have not come across any tool yet. You can either use rawQuery, which will work but be harder to maintain, or you can build your own tool around one of the Haskell DB interfaces like postgresql-simple, which is what persistent does. In fact, you can even start with the persistent source code for whatever database you are using and build an SQL view EDSL as you need it. In a closed-source project I helped build a custom PostgreSQL interface based on some of persistent's ideas and types, but without any Template Haskell, because the compile times were too slow.

Querying with Redis?

I've been learning Node.js, so I decided to make a simple ad network, but I can't seem to decide on a database to use. I've been messing around with Redis, but I can't seem to find a way to query the database by specific criteria; instead I can only get the value of a key, or of a list or set stored at a key.
Am I missing something, or should I be using a more robust database like MongoDB?
I would recommend reading this tutorial about Redis in order to understand its concepts and data types. I also had trouble understanding why there is no querying support similar to other (No)SQL databases until I read a few articles and tried to test and compare Redis with other solutions. Maybe it isn't the right database for your use case: it is very fast and supports advanced data structures, but it lacks querying, which is crucial for you. If you are looking for a database which allows you to query your data, then you should try MongoDB or maybe Riak.
Redis is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
If you can (i.e. it is easy to implement), you should use these primitives (strings, hashes, lists, sets and sorted sets). The main advantage of Redis is that it is lightning fast, but it is a rather primitive key-value store (Redis is a little bit more advanced than that). This also means that it cannot be queried like, for example, SQL; see the sketch below for how lookups are usually emulated.
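To illustrate, here is a small sketch with redis-py (assuming a local Redis server; the key naming scheme is made up): "querying" is emulated by maintaining your own secondary indexes as sets and intersecting them, since Redis will not filter by field for you.

    # A small sketch assuming redis-py and a local Redis server.
    # Key names (ad:*, ads:by_country:*, ads:by_category:*) are made-up conventions.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Each ad lives in a hash...
    r.hset("ad:1", mapping={"title": "Shoes", "country": "US", "category": "fashion"})
    r.hset("ad:2", mapping={"title": "Laptops", "country": "US", "category": "tech"})

    # ...and you maintain secondary indexes yourself, as sets of ad ids.
    r.sadd("ads:by_country:US", "ad:1", "ad:2")
    r.sadd("ads:by_category:tech", "ad:2")

    # "SELECT * FROM ads WHERE country='US' AND category='tech'" becomes a set
    # intersection followed by hash lookups.
    for key in r.sinter("ads:by_country:US", "ads:by_category:tech"):
        print(r.hgetall(key))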
It would probably be easier to use a more advanced store, like for example MongoDB, which is a document-oriented database. The trade-off you make in this case is PERFORMANCE, but I believe you should only worry about that if it becomes a problem, which it probably will not, because MongoDB is also pretty fast and has the advantage that it can be queried. I think it would be advisable to have proper indexes for your queries (read > write) to make it fast.
I think the main answer comes down to the data structure. Check this article about NoSQL data modelling; for me it was very helpful: NoSQL Data Modelling.
A second good article about data modelling, which makes a comparison between SQL and NoSQL, is the following: The Relational Model Anti-Pattern.

A better option than storing the DB model in a txt file in a PHP shared hosting environment

This is more of a conceptual question rather than a programming question.
I have developed a system where I use a DB layer which is responsible for generating queries as well as running them.
To avoid creating queries which can't run, I keep a simplified database model of every table, with all the respective columns of the persistence layer. For each record I provide the name of a table, and for each column in that table I provide its name, type and length. This way I can catch naming problems as well as invalid inputs.
The model has no knowledge of data stored in the tables.
The model is stored in a txt file in the filesystem of the server. I am concerned about the security of that solution, as typing in the URL of the db_model txt file would expose the entire persistence data model of the application.
How can I do a better job with this?
I'm thinking about a few options.
Encrypt the txt file, then for each session decrypt it and store it in a session variable, since I need the model on every page load, often several times per page load.
Move it up in the filesystem hierarchy, above the web server's root, and read it through an FTP connection. That would look bad when packaging the system as a product, though, so I don't think the option is viable.
Are any of these options a good idea, or should I do something completely different?
best regards
Rythmic
Simple answer:
Don't keep track of it yourself. Your RDBMS (which one are you using, by the way?) has an internal mechanism for keeping track of this. It also has its own mechanisms for ensuring that the queries you pass to it are acceptable. That's why we pay it the big bucks - let it do its job the way it's trained to.
Relying on the RDBMS is definitely an option to consider strongly. Another option is to query the DB itself if you feel the need to validate input: either store the data from your text file in the DB itself and read it back, or, even better, read the DB schema directly from your DB system, which will guarantee that the version you're checking input against exactly matches your actual schema.
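To illustrate the second suggestion, the model can be read straight from information_schema instead of a text file. Here is a sketch in Python (assuming MySQL and PyMySQL; credentials and names are placeholders), though the same SELECT works just as well from PHP via PDO or mysqli:

    # A sketch of reading the schema from information_schema instead of a txt file.
    # Assumes MySQL and PyMySQL; host/user/password/database are placeholders.
    import pymysql

    conn = pymysql.connect(host="localhost", user="app", password="secret", database="mydb")

    schema = {}
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT table_name, column_name, data_type, character_maximum_length
            FROM information_schema.columns
            WHERE table_schema = %s
            """,
            ("mydb",),
        )
        for table, column, dtype, length in cur.fetchall():
            # Same shape as the txt-file model: table -> column -> (type, length)
            schema.setdefault(table, {})[column] = (dtype, length)

    print(schema.get("users"))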

Why are document DBs (like MongoDB and CouchDB) better for large amounts of data?

I am a complete newbie to the world of document DBs.
So... why are these DBs better than an RDBMS (like MySQL or PostgreSQL) for very large amounts of data?
Document databases implement indexing that is designed for exactly this kind of data; that is what they are built for. A normal relational database is not designed for saving "documents": you have to work hard to search over your document data, because each document can be in a different format, and that is a lot of work. If you choose a document DB, you get all of this out of the box, because the database is built specifically for documents and already implements the functions needed to handle them.
You want to distribute your data over multiple machines when you have a lot of data. That means that joins become really slow because joining between data on different machines means a lot of data communication between those machines.
You can store data in a mongodb/couchdb document in a hierarchical way so there is less need for joins.
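For instance, here is a hedged pymongo sketch (local MongoDB assumed; the collection and field names are made up): an order and its line items live in one document, so fetching the order is a single lookup on one shard instead of a join across machines.

    # A sketch (pymongo, local MongoDB assumed; collection and fields are made up)
    # of embedding related data in one document instead of joining tables.
    from pymongo import MongoClient

    orders = MongoClient()["shop"]["orders"]

    # The customer and line items are embedded, so there is nothing to join.
    orders.insert_one({
        "_id": 1001,
        "customer": {"name": "Ada", "country": "UK"},
        "items": [
            {"sku": "A-1", "qty": 2, "price": 9.99},
            {"sku": "B-7", "qty": 1, "price": 24.50},
        ],
    })

    # One round trip fetches the whole order, even when data is sharded.
    print(orders.find_one({"_id": 1001}))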
But it depends on your use case(s). I think that relational databases do a better job when it comes to reporting.
MongoDB and CouchDB don't support transactions. Do you or your customers need transactions?
What do you want to do? Analyze a lot of data (business intelligence/reporting), or handle a lot of small modifications per second ("HVSP", High Volume Simple Processing)?
