I am relatively new to columnar databases, so please forgive my ignorance. Let's say I have 1,000,000 columns. I would like to return a sample of 10% of those columns, for example every tenth one (i.e. c0, c10, c20 ... c999,980, c999,990).
HBase has column filters, so I could write a column filter that returns every tenth result. Can I do this in Pycassa/Cassandra?
Thank you
The only thing you can do server side is slices. So you can read starting at column=C10 limit=10 to get columns 10-19. Or you can ask for specific columns, so you could ask for every 10th column manually if you knew how many columns there were.
You could do this easily client-side with Pycassa, but Cassandra does not support server-side filtering.
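As a rough sketch of the "ask for specific columns" route, the column names can be generated client-side and requested in batches. This assumes the columns are literally named c0 through c999999 as in the question; the helper names `every_nth_column_names` and `chunked` and the batch size are made up for illustration.

```python
def every_nth_column_names(total, step, prefix="c"):
    """Return the names of every `step`-th column out of `total` columns."""
    return [f"{prefix}{i}" for i in range(0, total, step)]


def chunked(names, size):
    """Split the name list into batches so no single request is huge."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


wanted = every_nth_column_names(1_000_000, 10)
batches = list(chunked(wanted, 1000))

# Assumed usage, not run here: each batch would be handed to the driver as an
# explicit column list, e.g. column_family.get(row_key, columns=batch).
```

Batching is only a precaution against very large single requests; the right batch size depends on your cluster and column sizes.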
I'm looking to use VBA to compare two tables, each with three columns, against each other. I'm a beginner and very lost.
The tables may have a different number of entries, and there may be some entries in table A that aren't in table B, and vice versa.
Some of the individual columns may match, but I'm trying to work out how to make sure all three columns of a row are compared as one unit against all three columns in the other table.
For example
xyz123 55.50 12/07/21 if compared with XYZ123 54.55 12/07/21 will show up as not a match, because the middle column is a different number.
I have attached a picture below. For the most part, and unlike the photo, each table will be in a completely random order, and it's unlikely that the entry in table 1, row 1 will be the same as the entry in table 2, row 1.
Ideally, I'm trying to create two new tables to the right of the original tables: the first holding the entries that table 1 has and table 2 does not, the second holding the entries that table 2 has and table 1 does not.
Have attached an example below of the end result I'm looking for out of this. The four rows on the left are entries that the first table has but the second table doesn't, and the rows to the right are all entries that the second table has, but the first table does not.
I've tried to search on this but haven't found something that matches what I've got, and I'm struggling to adapt someone else's code to my specific problem
Any help on this would be greatly appreciated
Maybe not a direct answer to your problem, but is this data also in a database somewhere, or are you familiar with MS Access? You could open the tables in Access, and this kind of comparison is pretty easy to do with databases.
If not, then yes, it is doable with VBA, and there are numerous ways of doing it.
The simplest is to scroll through one table a line at a time and compare it with every row in the other table and match or not. This will work with small tables and be easy and quick but for large data tables it would be wasteful and may take a long time to complete.
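The comparison logic can be sketched as below, in Python rather than VBA, purely to show the idea of treating all three columns as a single key. Note this uses a lookup set instead of the row-by-row scan described above, which avoids the slowdown on large tables; in VBA a `Scripting.Dictionary` could play the same role. The sample rows (other than the xyz123 example from the question), the helper names, and the choice to compare the ID case-insensitively are assumptions.

```python
def row_key(row):
    """Combine all three columns into one comparable key.

    Assumptions: the ID compares case-insensitively, the amount is compared
    to two decimal places, and the date is compared as plain text.
    """
    ident, amount, date = row
    return (str(ident).strip().upper(), round(float(amount), 2), str(date).strip())


def only_in_first(first, second):
    """Rows of `first` whose three-column key never appears in `second`."""
    seen = {row_key(r) for r in second}
    return [r for r in first if row_key(r) not in seen]


# Made-up sample data; row order deliberately differs between the tables.
table1 = [("xyz123", 55.50, "12/07/21"), ("abc001", 10.00, "01/07/21")]
table2 = [("ABC001", 10.00, "01/07/21"), ("XYZ123", 54.55, "12/07/21")]

missing_from_table2 = only_in_first(table1, table2)
missing_from_table1 = only_in_first(table2, table1)
```

Here the abc001 rows match on all three columns and drop out, while the xyz123 rows differ in the middle column and so appear in both result lists.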
I've used Kahoot in the classroom and have several Excel files with scores from quizzes.
Students attended quizzes by using unique IDs. In each file, scores are visible for each ID (but ordered by success on each quiz). There are also some students missing or stating wrong IDs (I'll ignore it).
Now I would like to accumulate all scores for all student IDs in one sheet and summarize them by Student ID.
How can I do that most efficiently?
Any pointer or advice is appreciated.
Thanks,
B.
Here's a high level guide to getting what you want along with a sample in this file.
Step 1 - Combine Files to Sheet with Unified Columns
Objective
The goal here is to:
Combine all of your data from other files to single sheet
Merge the data to be in a single column for each field (i.e. Column A has ID, Column B has score).
No breaks in rows.
No formulas.
To illustrate, I made this fake list based loosely on your description.
Method
You can probably do this manually, but a macro could also be used. If you expect to do this year after year, you might look into VBA to open and close the files in a folder. However, since that wasn't part of the question, you can simply copy and paste (better yet, make a kid do it!). Just make sure there's only one header for each column and that all of the data records align. If you have any formulas, you should probably paste values only.
Step 2 - Show Summation
There are a couple of ways this could be done. A pivot table is probably the most sensible because you could include each quiz as a column and see the total per student. You could also use a pivot table to do averages by student, etc.
To make a pivot table, I would recommend looking on YouTube; the tutorials there will do a better job of explaining it than I can.
On that same file I made as an example, I included some tabs to illustrate the power of pivot tables and a couple graphs.
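For readers who prefer to see the aggregation spelled out, here is a minimal Python sketch of what the pivot table computes once the data is in one combined list. The record layout `(student_id, quiz_name, score)`, the function name, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict


def summarize_scores(records):
    """Total score, quiz count and average per student ID.

    `records` is an iterable of (student_id, quiz_name, score) tuples,
    i.e. the single combined sheet produced in Step 1.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for student_id, _quiz, score in records:
        totals[student_id] += score
        counts[student_id] += 1
    return {
        sid: {
            "total": totals[sid],
            "quizzes": counts[sid],
            "average": totals[sid] / counts[sid],
        }
        for sid in totals
    }


# Made-up sample rows standing in for the combined sheet.
records = [
    ("S01", "Quiz 1", 800),
    ("S02", "Quiz 1", 650),
    ("S01", "Quiz 2", 900),
]
summary = summarize_scores(records)
```

Students who missed a quiz simply have fewer records, so their average is taken over the quizzes they attended.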
Hope that helps. If you have specific technical questions on this, you might consider asking separately.
Is there a way to add values in Excel based on values that already appear earlier in the table?
For example, in the table I currently have, is there a way to exclude adding the 1 from the "Attended" column in the "Sonics and Cold Cash" row because I already had a row with "Sonics" and "1" in attended? I don't want to add a 1 to the SUMIF function if I have already attended that team once before.
I hope this is clear enough for some help. Thank you!
edit: So far, I have a table that tracks how many times a team has been "attended". This works, however I am trying to use linear optimization for scheduling, and using the results table has some linearity problems. I'm trying to find a way to only use the table instead of a second, results table.
I'm looking for help dynamically averaging the column values of every item in an Excel table that has a given value in one of its columns. Specifically:
I have an Excel sheet where each row represents an entity in a video game I am working on, and each column is a numerical value for different attributes on these entities. Movement Speed, Health, Attack Damage, etc. Each of these rows also has a column where I tag the row with the name of the class that this entity is a part of: "tank", "support", etc. This table has roughly a hundred items in it, and is likely to grow to two or three times that size.
It looks something like this:
What I would really like to do is have, on a separate tab, a table where each row represents one of the classes, and shows the average value of all of the entities that have that class in their "group" column. And I want it to automatically include new entities of that class as they are added to the first table.
It would look something like this, where these values are automatically generated from the data in the first table (I have no problem manually entering the class names, I just need the numerical data to be driven):
I imagine that the solution will be a complex, nested pile of VLOOKUPs and MATCHes and other Excel functions, but I am not really sure how to accomplish this. I didn't even know the proper terminology to search for existing answers to this question, so I hope that it isn't too redundant. Thanks very much for any advice you have!
Version: I am using Excel 2013.
I think all you need is a pivot table. (It's been around since the 90s?) It is very useful!
There are lots of ways of refreshing it, depending on where the data comes from.
http://office.microsoft.com/en-us/excel-help/pivottable-reports-101-HA001034632.aspx
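To make clear what the pivot table would calculate, here is a small Python sketch of a per-class average. The attribute names, the sample entities, and the function name are invented for illustration; only the "group" column and the class names "tank" and "support" come from the question.

```python
from collections import defaultdict
from statistics import mean


def average_by_class(entities, attributes):
    """Average each numeric attribute over all entities sharing a class."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["group"]].append(entity)
    return {
        cls: {attr: mean(member[attr] for member in members) for attr in attributes}
        for cls, members in groups.items()
    }


# Made-up rows standing in for the entity table.
entities = [
    {"name": "A", "group": "tank", "health": 200, "speed": 3},
    {"name": "B", "group": "tank", "health": 300, "speed": 5},
    {"name": "C", "group": "support", "health": 100, "speed": 6},
]
averages = average_by_class(entities, ["health", "speed"])
```

Adding a new entity to the list and recomputing is the equivalent of refreshing the pivot table after new rows are added.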
As we cannot sort data in Cassandra at query time, I want to store the data in a format such that I retrieve it in "last in, first out" order, i.e. if users enter comments, then when I retrieve the data I should get the latest comment first and then the older comments. I think it has something to do with the comparator.
I have set following when configuring Cassandra:
assume posts comparator as utf8;
assume posts validator as utf8;
assume posts keys as utf8;
Please help - how should I create the column to arrange data in time format so that latest data is stored first?
Columns in a row are always sorted, and you can iterate over the columns in a row in reverse order. Given these two facts, we could model the situation you're describing by storing comments in a column family called "comments", where the row key is the post ID and the columns represent the comments on the corresponding post. The column names are timestamps (either ISO-formatted dates, UNIX timestamps or time UUIDs) and the values are the comment text bodies.
If you would now get the columns for a row and specify that you wanted them in reverse order you would get what you want. How to specify reverse order depends on your driver, but it's usually just an option to the command that retrieves a row, or a column slice.
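A minimal in-memory sketch of that read pattern, assuming UNIX timestamps as column names. The `read_row` helper only mimics a sorted row and is not a driver call; with Pycassa the corresponding options are, as far as I know, `column_reversed=True` and `column_count` on `ColumnFamily.get`, but check your driver's documentation.

```python
def read_row(columns, reverse=False, count=None):
    """Mimic a column slice: columns come back sorted by name.

    `columns` maps column name (here a UNIX timestamp) to value.
    `reverse=True` walks the sorted columns from the end, newest first.
    """
    names = sorted(columns, reverse=reverse)
    if count is not None:
        names = names[:count]
    return [(name, columns[name]) for name in names]


# One row of the hypothetical "comments" column family, keyed by post ID.
comments_row = {
    1325376000: "first comment",
    1325376060: "second comment",
    1325376120: "third comment",
}
latest_first = read_row(comments_row, reverse=True, count=2)
```

The data is stored in natural (oldest first) order; only the read direction changes.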
Another way, which is more hackish, would be to take the UNIX timestamp of a comment, subtract it from a large integer such as 2^31, and use the result as the column name. That way columns would sort newest first by default. It's not pretty, and the method above is more elegant.
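The subtraction trick can be sketched like this. It assumes the column names are compared numerically (with a UTF8 comparator the numbers would be compared as strings instead), and the function name is made up.

```python
MAX_TS = 2**31  # the "large integer" from the text; timestamps must stay below it


def reversed_column_name(unix_ts):
    """Map a UNIX timestamp to a name that sorts newest first."""
    if not 0 <= unix_ts < MAX_TS:
        raise ValueError("timestamp out of range for this scheme")
    return MAX_TS - unix_ts


timestamps = [1325376000, 1325376060, 1325376120]

# Ascending sort of the reversed names, as the server would store them.
names = sorted(reversed_column_name(t) for t in timestamps)

# Converting back shows the original timestamps now come out newest first.
recovered = [MAX_TS - n for n in names]
```

The original timestamp is recoverable from the column name by subtracting again, as the last line shows.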
If you worry about using timestamps because there could be collisions where two comments are posted at exactly the same time, use Cassandra's time UUID type.
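On the client side, a time-based (version 1) UUID can be generated with Python's standard `uuid` module, as in this small sketch. The function name is hypothetical, and how the value is passed to Cassandra depends on your driver.

```python
import uuid


def new_comment_column_name():
    """Return a version 1 (time-based) UUID to use as a column name."""
    return uuid.uuid1()


# Two comments created back to back still get distinct column names.
first = new_comment_column_name()
second = new_comment_column_name()
```

Because the UUID embeds more than just the clock reading, two comments posted in the same instant do not overwrite each other.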
You need to organize your data such that the comparator is a timestamp. You store your data in natural order and specify reverse order in your slice query.