Good place to start learning data warehousing? [closed] - resources

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am interested in learning more about data warehousing. I see terms like "dimension", "snowflake schema" and "star schema" thrown about. Where would one start in learning about this stuff? Are there good books or Internet resources?
ETL is in this space too, right?

Wikipedia's resources on Data Warehousing are good.
Read any of Ralph Kimball's books, such as "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling".
Yes, ETL is in this space.
You may also be interested in column-oriented databases.
Vertica has a blog with a few posts arguing that column stores are often a better fit for what data warehouses are used for. For example, "Reflections on the Kimball Data Warehouse "Bible": Time for a New Testament?" and "The Truth About MPP & Data Warehousing".
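To make the "dimension" and "star schema" terms from the question concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_date, dim_product, fact_sales) and the sample rows are invented for illustration, not taken from Kimball's book. A central fact table holds the measurements and points at small dimension tables that describe them; a snowflake schema would further normalize those dimension tables into sub-tables.

```python
import sqlite3

# In-memory database; all table names and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table sits at the center of the "star" and references each dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, 2024, 1), (20240201, 2024, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20240101, 1, 3, 30.0),
                  (20240101, 2, 1, 12.0),
                  (20240201, 1, 2, 20.0)])


def sales_by_category(conn):
    """Aggregate the fact table, grouped by an attribute of one dimension."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

Typical warehouse queries have this shape: join the fact table to one or more dimensions, then group by dimension attributes.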

You can try this: Data Warehousing Tutorials.


Compact key value store in Rust [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 2 years ago.
I'm working on a Rust project that collects daily statistics for a website (number of requests, number of unique users, average latency, etc.). I'd like to store this data in a compact key-value store where the key is a date (or a date string) and the value is an object that contains the statistics. I also need this data to be persisted to a file.
I don't have any special performance or storage requirements. That's why I don't want to use major DBs like Redis, MongoDB or Cassandra that require a separate installation and significant resources to run. I'd like something much simpler and lightweight.
The ideal solution for me would be a library that can read and write key-value data and persist it into a file. The data size I'm aiming for is around 1000-2000 records.
Can you recommend a library I can use?
I can recommend PickleDB-rs. I think it answers most of your requirements. PickleDB-rs is a Rust version of Python's PickleDB. It's intended for small DBs (I think 1000-2000 records should be ok) and the performance isn't guaranteed to be as great as large scale DBs, but for the purpose of dumping daily web-site stats into a file it should be sufficient.
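The general idea behind this kind of store (keep a small map in memory and dump it to a file on every write) can be sketched in a few lines. The sketch below is in Python with only the standard library, since PickleDB-rs follows Python's PickleDB; it is not the PickleDB-rs API. The class name DailyStatsStore, the file name and the statistics fields are all hypothetical.

```python
import json
import os
import tempfile


class DailyStatsStore:
    """Hypothetical date-keyed store that rewrites a JSON file on each set()."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Load existing records if the file is already there.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, date_key, stats):
        self.data[date_key] = stats
        self._dump()

    def get(self, date_key):
        return self.data.get(date_key)

    def _dump(self):
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write does not leave a truncated data file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "stats.json")
store = DailyStatsStore(path)
store.set("2024-01-01", {"requests": 1200, "unique_users": 300, "avg_latency_ms": 42.5})

# A fresh instance reads the same data back from the file.
reloaded = DailyStatsStore(path)
print(reloaded.get("2024-01-01"))
```

Rewriting the whole file on each write is wasteful for large data sets, but for one or two thousand small records written once a day it should not matter.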

Is there any front end project used to view apache spark result? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I am using Spark SQL to do some analysis.
I'm wondering whether there are any front-end projects that can be used to view the results. I mean the analysis results, not the job success/failure status.
For example, Grafana, Kibana, etc.
Regards
Mingwei
If you mean visualization of your results (like the tools you've mentioned), you might be interested in Apache Zeppelin. It's similar to an IPython Notebook: you write your code there and visualize the results.
Otherwise you'd have to tell us your storage format and where you are storing your results; there may be visualization tools for it.
Actually, if you store the results of your Spark jobs in Elasticsearch, you can use Kibana with it.
Otherwise, I don't think there is anything. The difference between what you are referring to (OpenTSDB and Elasticsearch) and Spark is that the latter is not a datastore.

Examples of planning and search usage [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
What are applications where search techniques or more specifically planning techniques are used? I am most interested in examples in use.
I know that A* is used for path planning in robotics and that planning is used in logistics (details would be great), but what other uses are there?
For search in general, Google and similar engines come to mind with their inverted indices. Again, where else is it used?
For planning examples, including logistics challenges, take a look at this list. Each use case comes with multiple datasets and a problem definition.
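For readers who have not seen the path-planning use of A* mentioned in the question, here is a minimal sketch on a 4-connected grid with unit step costs and a Manhattan-distance heuristic. The grid and the function name astar are made up for illustration; real robotics planners work on much richer state spaces.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest path as a list of (row, col) cells, or None.

    grid is a list of lists where 0 is free and 1 is an obstacle.
    """
    def h(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost.get(current, float("inf")):
            continue  # stale heap entry; a cheaper route was found already
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
```

The wall in the middle row forces the route around the right-hand side; if no route exists, the function returns None.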

How to operate the transaction of cassandra? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
In my project I use Spring, but Cassandra does not support transactions. How can I handle transactions with Cassandra in the service layer?
You can log every operation you carry out, store the entries in a log file of some sort, and when you want to undo one, issue a query that does the opposite of what you just did.
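A rough sketch of that undo-log idea, using a plain Python dict as a stand-in for the datastore. The class UndoLog and its methods are hypothetical and nothing here is a Cassandra or Spring API; in a real service the put and rollback steps would issue actual queries.

```python
class UndoLog:
    """Hypothetical helper: record how to reverse each write, replay on rollback."""

    def __init__(self, store):
        self.store = store  # a dict standing in for the real datastore
        self.log = []       # entries are (action, key, previous_value)

    def put(self, key, value):
        # Before writing, record the operation that would undo this write.
        if key in self.store:
            self.log.append(("restore", key, self.store[key]))
        else:
            self.log.append(("delete", key, None))
        self.store[key] = value

    def rollback(self):
        # Undo in reverse order so the oldest state is restored last.
        while self.log:
            action, key, previous = self.log.pop()
            if action == "restore":
                self.store[key] = previous
            else:
                self.store.pop(key, None)

    def commit(self):
        # Nothing to undo any more.
        self.log.clear()


store = {"user:1": "alice"}
tx = UndoLog(store)
tx.put("user:1", "bob")
tx.put("user:2", "carol")
tx.rollback()
print(store)
```

Note that this only compensates after the fact. It gives no isolation, so other readers can see the intermediate writes before the rollback runs.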
You need to think differently with NoSQL. Read "Building on Quicksand": http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_133.pdf. If you are using Cassandra, you may also want to check out PlayOrm and the NoSQL patterns page.

Is this the correct definition of a "corpus"? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I have a huge string of raw text that is about 200,000 words long. It's a book.
I want to use these words to analyze the word relationships, so that I can apply those relationships to other applications.
Is this called a "corpus"?
A corpus, in linguistics, is any coherent body of real-life(*) text or speech being studied. So yes, a book is a corpus. The fact that it's in one string doesn't matter, as long as you don't randomly shuffle the characters.
(*) As opposed to a bunch of made-up phrases shown to test subjects to measure their responses, as is commonly done in psycholinguistics.
Yes.
http://en.wikipedia.org/wiki/Text_corpus
Specifically, because it's used for statistics.
Usually "corpus" is used to refer to a structured collection, but linguists would know what you're talking about.
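As one simple illustration of analyzing word relationships in such a corpus, the sketch below counts adjacent word pairs (bigrams) in a raw string. The tokenizer is a deliberately crude assumption (lowercase letters and apostrophes only), and the sample text stands in for the book.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(text):
    """Count how often each pair of adjacent words occurs.

    Pairs are counted across sentence boundaries too, because this
    simple tokenizer drops punctuation.
    """
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


text = "The cat sat. The cat ran."  # stand-in for the 200,000-word book
counts = bigram_counts(text)
print(counts.most_common(1))
```

The same counts work whether the book is one long string or split into chapters, as long as the word order is preserved.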
