Cassandra comments data model

I am trying to store very simple comments in a wide row, but the problem is that I want to be able to show the top (most-liked) comments.
At first I tried the UTF8 comparator type, with each column name starting with the number of likes followed by a timestamp, for example:
Comments_CF = {
parent:{
8_timestamp: comment,
5_timestamp: comment,
1_timestamp: comment,
...
}
...
}
The problem with this approach is that, for example, 2_timestamp > 19_timestamp, because the names are compared lexicographically and "2" sorts after "19".
I could probably store the top comments in a separate CF, but then I would need two queries instead of one, which I would really like to avoid. Any suggestions?

Two queries instead of one is usually not a big deal. You could also store a composite value (number of likes + the comment) and sort the comments yourself. From what I have seen, there are never many comments except on a few posts anyway, so that would be very quick.
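A minimal sketch of the "sort them yourself" approach, in Python purely for illustration. It assumes the column values of one wide row have already been read and decoded into (likes, comment) pairs; the sample comments are made up.

```python
# Hypothetical decoded column values from one wide row: (likes, comment) pairs.
comments = [
    (2, "Nice post"),
    (19, "Great explanation"),
    (8, "Thanks!"),
]

def top_comments(values, n):
    """Return the n most-liked comments, highest like count first."""
    return sorted(values, key=lambda pair: pair[0], reverse=True)[:n]

print(top_comments(comments, 2))  # [(19, 'Great explanation'), (8, 'Thanks!')]
```

Because the sort key is the integer like count, 19 correctly ranks above 8 and 2, which is exactly what the string column names could not do.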
There are other patterns that might spark ideas here as well...
https://github.com/deanhiller/playorm/wiki/Patterns-Page

Use a composite, where the first component is a long and the second is whatever type is appropriate for your timestamp format. This way the sorting will be correct.
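Python tuples compare element by element, which is a convenient analogy for how a composite comparator (for example a CompositeType whose first component is a LongType) orders column names. This is not Cassandra client code, and the timestamps are made up; it only contrasts the two orderings.

```python
# Column names as the question builds them: "<likes>_<timestamp>" strings.
utf8_names = ["2_1366000000", "19_1366000100", "8_1366000200"]

# Byte-wise (UTF8-style) comparison: "19_..." < "2_..." < "8_..." since '1' < '2' < '8'.
print(sorted(utf8_names))

# Composite-style comparison: the integer like count is compared first,
# then the timestamp, so 19 sorts after 8 and 2.
composite_names = [(2, 1366000000), (19, 1366000100), (8, 1366000200)]
print(sorted(composite_names))
```

The composite ordering is ascending, so to get the most-liked comments first you would likely read the row slice in reverse.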

Related

Is it possible to split array in Gherkin to the next line

I have a step with a string array, something like this:
Then Drop-dow patient_breed contains ['Breed1', 'Breed2',.... Breed20']
I need to split this text across two lines. I know that Gherkin has the """ (doc string) syntax, so I tried something like this:
Then Drop-dow patient_breed contains ['Breed1',
"""
'Breed2',.... Breed20']
"""
It didn't help. Is there any solution?
What do you gain by putting this string in your scenario? IMO all you are doing is making the scenario harder to read!
What do you lose by putting this string in your scenario?
Well, first of all, you now have at least two things that determine the exact contents of the string: the thing in the application that creates it, and the hardcoded string in your scenario. So you are repeating yourself.
In addition, you've increased the cost of change. Let's say we want our strings to change from 'Breedx' to 'Breed: x'. Now you have to change every scenario that looks at the drop-down, which will take much longer than changing the code.
So what can you do instead?
Change your scenario step so that it becomes Then I should see the patient breeds and delegate the HOW of the presentation of the breeds and even the sort of control that the breeds are presented in to something that is outside of Cucumber e.g. a helper method called by a step definition, or perhaps even something in your codebase.
Try a data table approach. You will need to add a DataTable argument to the step definition.
Then Drop-dow patient_breed contains
| Breed1 |
| Breed2 |
| ... |
| Breed20 |
For a multi-line approach, try a doc string as below. In this case you will need to add a String argument to the step definition.
Then Drop-dow patient_breed contains
"""
['Breed1','Breed2',.... Breed20']
"""
I would read the entire string and then split it using Java after it has been passed into the step. In order to keep my step as a one or two liner, I would use a helper method that I implemented myself.
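The answer suggests doing the split in Java inside a helper; the same idea is sketched here in Python for brevity. The helper name parse_breed_list is hypothetical, and it assumes every item is wrapped in single quotes, so line breaks and spaces between items do not matter.

```python
import re

def parse_breed_list(doc_string):
    """Hypothetical helper: turn "['Breed1','Breed2', ...]" into a list of names.

    Only text between single quotes is kept, so the list may span several lines.
    """
    return re.findall(r"'([^']*)'", doc_string)

doc = """['Breed1','Breed2',
'Breed3']"""
print(parse_breed_list(doc))  # ['Breed1', 'Breed2', 'Breed3']
```

The step definition would pass its String argument straight to such a helper, keeping the step itself a one- or two-liner.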

INDEX MATCH with Nested IF Statements

Here is my statement with just one IF Statement:
=IF(AF2="Consultant",IF(C2=INDEX(JIRA!F:F,MATCH('RFO Checks'!M2,JIRA!A:A,0)),1,0),"N/A")
This works great, but now I need to add two more IF Statements.
AF2 will either contain "Consultant", "Retailer", or "PC".
Each one will be directed to a different price column:
for "Consultant" it's JIRA!F:F
for "Retailer" it's JIRA!D:D
for "PC", it's JIRA!E:E.
I've been wracking my brain for two days now and haven't gotten anywhere.
Suggestions?
Thank you in advance!
Use CHOOSE()
=IFERROR(--(C2=INDEX(CHOOSE(MATCH(AF2,{"Consultant", "Retailer", "PC"},0),JIRA!F:F,JIRA!D:D,JIRA!E:E),MATCH('RFO Checks'!M2,JIRA!A:A,0))),"N/A")
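The CHOOSE formula effectively acts as a lookup table from the AF2 role to a price column. The Python below only mirrors that logic as an illustration; the sample data and the names jira, PRICE_COLUMN and check_price are hypothetical.

```python
# Hypothetical stand-in for the JIRA sheet: key (column A) -> prices in D, E, F.
jira = {
    "KEY-1": {"D": 10.0, "E": 12.0, "F": 15.0},
}

# Mirrors MATCH(AF2, {"Consultant","Retailer","PC"}, 0) selecting a CHOOSE branch.
PRICE_COLUMN = {"Consultant": "F", "Retailer": "D", "PC": "E"}

def check_price(role, key, price):
    """Return 1/0 like --(C2=INDEX(...)), or "N/A" where IFERROR would."""
    try:
        column = PRICE_COLUMN[role]   # unknown role: MATCH on AF2 fails
        expected = jira[key][column]  # unknown key: MATCH on column A fails
    except KeyError:
        return "N/A"
    return int(price == expected)
```

Note that, like the IFERROR version, this returns "N/A" both for an unknown role and for a key that is not found.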
Aaron,
I have provided the high-level structure for the changed logic.
If this is what you want, substitute the placeholders with the appropriate logic.
=IF(AF2="Consultant",
IF(C2=INDEX(JIRA!F:F,MATCH('RFO Checks'!M2,JIRA!A:A,0)),1,0),
IF(AF2="Retailer",
<insert_logic_for_retailer>,
IF(AF2="PC",<insert_logic_for_PC>,"N/A")
)
)
Logic for Consultant:
IF(C2=INDEX(JIRA!F:F,MATCH('RFO Checks'!M2,JIRA!A:A,0)),1,0)
Logic for Retailer (replaces <insert_logic_for_retailer>):
IF(C2=INDEX(JIRA!D:D,MATCH('RFO Checks'!M2,JIRA!A:A,0)),1,0)
Logic for PC (replaces <insert_logic_for_PC>):
IF(C2=INDEX(JIRA!E:E,MATCH('RFO Checks'!M2,JIRA!A:A,0)),1,0)
Let me know in case you still have any issues!

Separating fields out of a string in Hive

I have the following problem...
I work with Hive and want to load a file with several different kinds of string rows. Each row contains fixed-size fields, like this:
A20130420bcd 34 fgh
where the fields have lengths 1, 8, 6, 4 and 3.
Separated it would look like this:
"A,20130420,bcd,fgh"
Is there any way to read the string and split it into fields other than taking a substring for every field, like
substring(col_value,1,1) Field1
etc?
I would imagine that cutting off the already-read part of the string would improve performance, but I couldn't think of any way to do this with the given functions here.
Secondly, as stated before, there are different types of strings, ordered and identified by their first character. Right now I just check the type with a WHERE clause, but that is horrible, as it runs through the whole file just to find the first string. Is there any way to read specific lines by their number? If I know that the first string will be of a certain kind, can I read it directly?
Right now it looks like this:
insert overwrite table TEST
SELECT
substring(col_value,1,1) field1,
...
substring(col_value,10,3) field5
from temp_data WHERE substring(col_value,1,1) = 'A';
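Writing one substring per field is tedious mostly because the start positions must be worked out by hand. A small Python sketch (the helper name substring_exprs is hypothetical) can generate the SELECT list from the widths 1, 8, 6, 4, 3 given above; Hive's substring positions are 1-based.

```python
from itertools import accumulate

WIDTHS = [1, 8, 6, 4, 3]  # field lengths from the question

def substring_exprs(widths, column="col_value"):
    """Build 1-based Hive substring(...) expressions for fixed-width fields."""
    # Each field starts one position after the cumulative width of the fields before it.
    starts = [1] + [end + 1 for end in accumulate(widths)][:-1]
    return [
        f"substring({column},{start},{width}) field{n}"
        for n, (start, width) in enumerate(zip(starts, widths), start=1)
    ]

print(",\n".join(substring_exprs(WIDTHS)))
```

With these widths the start positions come out as 1, 2, 10, 16 and 20, so the fifth field would begin at position 20.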
any ideas on this?
I would love to hear some ideas =)
You need to write your own GenericUDF parser that outputs a struct, a map, or whatever is appropriate. You can use existing UDFs that return multiple values as a reference.
Then you can write:
insert overwrite table output
select parsed.rec.first, parsed.rec.second
from (
select parse(target) as rec
from input
) parsed
where parsed.rec.first = 'X';
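Hive UDFs themselves are written in Java; the parsing logic such a UDF might implement is sketched here in Python only to show the idea: pick a layout from the first character, then cut the named fields in a single left-to-right pass. Only the widths of the 'A' layout come from the question; the 'B' layout and all field names are hypothetical.

```python
# Hypothetical per-record-type layouts, keyed by the first character.
LAYOUTS = {
    "A": [("type", 1), ("date", 8), ("f3", 6), ("f4", 4), ("f5", 3)],
    "B": [("type", 1), ("code", 5), ("amount", 10)],
}

def parse(line):
    """Return a dict of named fields (like a struct/map UDF result), or None."""
    layout = LAYOUTS.get(line[:1])
    if layout is None:
        return None  # unknown record type
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = line[pos:pos + width].strip()
        pos += width  # advance past the part already read
    return fields
```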
About the second question: you may need to run Hive's EXPLAIN command to see whether Hive pushes the filter down for you. Check how many MapReduce stages the query takes; in theory it should be a single map stage, depending on (1) the Hive version and (2) the output table format.
In general, this is why databases are popular: they take optimization into consideration for you.

SAS: generate all possible misspellings

Does anyone know how to generate the possible misspellings of a word?
Example : unemployment
- uemployment
- onemploymnet
-- etc.
If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered, and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.
Here is an example that computes the generalized edit distance between "unemployment" and a variety of plausible misspellings.
data misspell;
length misspell string $16; /* declare lengths before INPUT sets them */
retain string "unemployment";
input misspell $16.;
GED=compged(misspell, string,'iL'); /* i: ignore case, L: drop leading blanks */
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
label GED='Generalized Edit Distance';
var misspell string GED;
run;
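For comparison, here is the plain Levenshtein distance in Python. This is a swapped-in, simpler relative of what COMPGED computes: COMPGED weights different kinds of edits with different costs, whereas here every insertion, deletion or substitution costs 1, so the numbers will not match the GED column above.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

The "close enough" rule from the answer would then be a threshold check, e.g. treating any entry within distance 2 of "unemployment" as that word.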
Essentially you are trying to develop a list of text strings based on some rule of thumb, such as: one letter is missing from the word, a letter is in the wrong spot, one letter was mistyped, etc. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario, then this might be manageable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, every combination except "unemployment" is a misspelling of that word).
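Once the rule is fixed to "exactly one wrong letter", the list is easy to enumerate. A minimal sketch in Python (not SAS), using the well-known one-edit construction from Peter Norvig's spelling corrector; it assumes lowercase ASCII letters are the only possible typos.

```python
import string

def single_edit_misspellings(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution or insertion
    away from word, excluding word itself."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    substitutes = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    inserts = {l + c + r for l, r in splits for c in alphabet}
    return (deletes | transposes | substitutes | inserts) - {word}

edits = single_edit_misspellings("unemployment")
```

"uemployment" and "unemploymnet" are in the result, while "onemploymnet" is not, since it needs two edits; allowing two edits would multiply the list size dramatically, which is the explosion the commenters warned about.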
Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
If you are looking for a general spell checker, SAS does have proc spell.
It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may have better results with another dictionary. A Google search will show other examples.
filename name temp lrecl=256;
options caps;
data _null_;
file name;
informat name $256.;
input name &;
put name;
cards;
uemployment
onemploymnet
;
proc spell in=name
dictionary=SASHELP.BASE.NAMES
suggest;
run;
options nocaps;

How to number floats in LaTeX consistently?

I have a LaTeX document where I'd like the numbering of floats (tables and figures) to be in one numeric sequence from 1 to x rather than two sequences according to their type. I'm not using lists of figures or tables either and do not need to.
My document class is report, and my floats typically have captions like this:
\caption{Breakdown of visualisations created.}
\label{tab:Visualisation_By_Types}
A quick way to do it is to put \addtocounter{table}{1} after each figure, and \addtocounter{figure}{1} after each table.
It's not pretty, and on a longer document you'd probably want to either include that in your style sheet or template, or go with cristobalito's solution of linking the counters.
The differences between the figure and table environments are very minor -- little more than them using different counters, and being maintained in separate sequences.
That is, there's nothing stopping you putting your {tabular} environments in a {figure}, or your graphics in a {table}, which would mean that they'd end up in the same sequence. The problem with this case (as Joseph Wright notes) is that you'd have to adjust the \caption, so that doesn't work perfectly.
Try the following, in the preamble:
\makeatletter
\newcounter{unisequence}
\def\ucaption{%
\ifx\@captype\@undefined
\@latex@error{\noexpand\ucaption outside float}\@ehd
\expandafter\@gobble
\else
\refstepcounter{unisequence}% <-- the only change from default \caption
\expandafter\@firstofone
\fi
{\@dblarg{\@caption\@captype}}%
}
\def\thetable{\@arabic\c@unisequence}
\def\thefigure{\@arabic\c@unisequence}
\makeatother
Then use \ucaption in your tables and figures, instead of \caption (change the name ad lib). If you want to use this same sequence in other environments (say, listings?), then define \the<foo> the same way.
My earlier attempt at this is in fact completely broken, as the OP spotted: getting the list of figures wrong is not a trivial, merely fiddly-to-fix problem, but an absolutely fundamental one (ho, hum).
(For the aficionados: it comes about because \advance commands are processed in TeX's gut, but the content of the .lof, .lot, and .aux files is fixed in TeX's mouth, at expansion time. Thus what was written to the files was whatever random value \@tempcnta had at the point \caption was called, ignoring the \advance calculations, which were then dutifully written to the file, and then ignored. Doh: how long have I known this but never internalised it!?)
Dutiful retention of earlier attempt (on the grounds that it may be instructively wrong):
No problem: try putting the following in the preamble:
\makeatletter
\def\tableandfigurenum{\@tempcnta=0
\advance\@tempcnta\c@figure
\advance\@tempcnta\c@table
\@arabic\@tempcnta}
\let\thetable\tableandfigurenum
\let\thefigure\tableandfigurenum
\makeatother
...and then use the {table} and {figure} environments as normal. The captions will have the correct 'Table/Figure' text, but they'll share a single numbering sequence.
Note that this example gets the numbers wrong in the listoffigures/listoftables, but (a) you say you don't care about that, (b) it's fixable, though probably mildly fiddly, and (c) life is hard!
I can't remember the syntax, but you're essentially looking for counters. Have a look here, under the custom floats section. Assign the counters for both tables and figures to the same thing and it should work.
I'd just use one type of float (let's say 'figure'), then use the caption package to remove the automatically added "Figure" text from the caption and deal with it by hand.
