Pull latest record from data using distinct on? - psycopg2

I have data that looks like this:
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15
2020-10-30 17:57:17,False,2020-07-01,14,2,False,0.0,True,30.0,True,30.0,True,True,True,False
2020-10-30 17:57:17,False,2020-07-01,15,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,True,2020-07-01,5,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,False,2020-07-01,7,2,False,0.0,True,30.0,True,30.0,True,True,True,False
My query looks like the following:
select distinct on (col3) col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 <= '2020-09-16'
and col2 is false order by col3, col1 asc;
My expected answer is [14, 15], since these are the earliest records for '2020-07-01'. However, with the above query I only get [15]. Any ideas what I might be doing wrong?

I was able to resolve this using the following query:
select distinct col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 = (select min(col1) from table
where col1 <= '2020-09-16' and col3 = '2020-07-01')
and col2 is false;
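
The single row is what DISTINCT ON is documented to do: it keeps only the first row of each group of rows that share the same col3, and because the rows for 14 and 15 tie on col1, which of the two comes first is not determined by the ORDER BY. The following is a plain-Python sketch of that behaviour, not PostgreSQL itself. It uses the sample rows from the question and, as an assumption, drops the col1 cutoff because every sample row is dated after '2020-09-16'.

```python
from itertools import groupby

# (col1, col2, col3, col4, col13, col15) taken from the sample rows in the question.
rows = [
    ("2020-10-30 17:57:17", False, "2020-07-01", 14, True, False),
    ("2020-10-30 17:57:17", False, "2020-07-01", 15, True, False),
    ("2020-11-15 17:57:17", True, "2020-07-01", 5, True, False),
    ("2020-11-15 17:57:17", False, "2020-07-01", 7, True, False),
]

# WHERE clause from the question, minus the col1 cutoff (assumption, see above).
kept = [r for r in rows if r[4] and not r[5] and r[2] == "2020-07-01" and not r[1]]

# ORDER BY col3, col1
ordered = sorted(kept, key=lambda r: (r[2], r[0]))

# DISTINCT ON (col3): keep only the first row of each col3 group.
distinct_on = [next(group)[3] for _, group in groupby(ordered, key=lambda r: r[2])]

# Alternative: keep every row tied for the earliest col1 within its col3 group.
earliest = []
for _, group in groupby(ordered, key=lambda r: r[2]):
    group = list(group)
    first_ts = group[0][0]
    earliest.extend(r[3] for r in group if r[0] == first_ts)

print(distinct_on)  # exactly one col4 value per col3
print(earliest)     # every col4 value that shares the earliest col1
```

The min(col1) subquery in the working query above plays the same role as the second loop: it selects all rows that share the earliest timestamp instead of one row per col3.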


Update table based on CTE

I have an update query as below:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table
ON (update_cte.col1 = my_table.col1)
AND (update_cte.col2 = my_table.col2)
It gives me the following error:
ERROR: table name "my_table" specified more than once
I was able to resolve it by giving the joined table an alias in the inner join:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table as idr
ON (update_cte.col1 = idr.col1)
AND (update_cte.col2 = idr.col2)
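
A caution on the aliased version: it runs, but as far as the PostgreSQL documentation for UPDATE describes, repeating the target table in FROM creates a self-join, and nothing here ties the rows of my_table being updated to idr, so every row may receive a value from an arbitrary joined row. The usual form lists only update_cte in FROM and moves the match into WHERE. The sketch below runs that form against an in-memory SQLite database as a stand-in for PostgreSQL. The sample rows are hypothetical, the string concatenation replaces daterange (which SQLite lacks), and the fallback branch for SQLite older than 3.33 is an assumption added so the example runs everywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_table (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT);
    CREATE TABLE my_table (col1 INTEGER, col2 INTEGER, col5 TEXT);
    INSERT INTO temp_table VALUES (1, 1, '2020-01-01', '2020-02-01');
    INSERT INTO temp_table VALUES (2, 2, '2020-03-01', '2020-04-01');
    INSERT INTO my_table VALUES (1, 1, NULL);
    INSERT INTO my_table VALUES (2, 2, NULL);
    INSERT INTO my_table VALUES (3, 3, NULL);
""")

# Stand-in for daterange(col3, col4), which SQLite does not provide.
RANGE_EXPR = "'[' || col3 || ',' || col4 || ')'"

if sqlite3.sqlite_version_info >= (3, 33, 0):
    # Same shape as the PostgreSQL statement: the target table appears only
    # after UPDATE, and the WHERE clause ties each target row to the CTE.
    conn.execute(f"""
        WITH update_cte AS (
            SELECT col1, col2, {RANGE_EXPR} AS col5 FROM temp_table
        )
        UPDATE my_table
        SET col5 = update_cte.col5
        FROM update_cte
        WHERE update_cte.col1 = my_table.col1
          AND update_cte.col2 = my_table.col2
    """)
else:
    # Older SQLite has no UPDATE ... FROM; a correlated subquery gives the same result.
    conn.execute(f"""
        UPDATE my_table
        SET col5 = (SELECT {RANGE_EXPR} FROM temp_table t
                    WHERE t.col1 = my_table.col1 AND t.col2 = my_table.col2)
        WHERE EXISTS (SELECT 1 FROM temp_table t
                      WHERE t.col1 = my_table.col1 AND t.col2 = my_table.col2)
    """)

result = conn.execute("SELECT col1, col5 FROM my_table ORDER BY col1").fetchall()
print(result)
```

Only the rows with a match in temp_table are changed; the row with col1 = 3 keeps its NULL.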

How can I calculate correlation within sub-groups of my sample in Excel

I am analyzing the results of a survey, and I have two arrays for which I am calculating the correlation in Excel. That is easy enough, but how can I calculate the correlation for sub-groups that are scattered through the array without doing it manually? For example, I want to calculate the correlation between the two variables for males aged 15-25 rather than for the whole sample.
What I have already tried is sorting the sample by the relevant dimension (for example, by age) so that the matching rows are contiguous instead of scattered. But this takes time, and it doesn't work for two dimensions at once, such as age and gender.
You could do something like this in Google Sheets:
=CORREL(QUERY(A2:C, "select A where B='15-20' and C='F'", 0),
QUERY(D2:F, "select D where E='15-20' and F='F'", 0))
Or something like this:
=CORREL(FILTER(A2:A, B2:B="15-20", C2:C="F"),
FILTER(D2:D, E2:E="15-20", F2:F="F"))
But this works only if both filtered arrays come out the same size. CORREL needs equal-sized ranges, and when you FILTER or QUERY the data, the output ranges can differ in length.
In that case you can try this:
=IFERROR(CORREL(QUERY({A2:C}, "select Col1 where Col2='15-20' and Col3='F'", 0),
{QUERY({D2:F}, "select Col1 where Col2='15-20' and Col3='F'", 0);
TRANSPOSE(SPLIT(REPT("♂♀",
COUNTA(QUERY({A2:C}, "select Col1 where Col2='15-20' and Col3='F'", 0))-
COUNTA(QUERY({D2:F}, "select Col1 where Col2='15-20' and Col3='F'", 0))),"♀"))}),
CORREL(QUERY({D2:F}, "select Col1 where Col2='15-20' and Col3='F'", 0),
{QUERY({A2:C}, "select Col1 where Col2='15-20' and Col3='F'", 0);
TRANSPOSE(SPLIT(REPT("♂♀",
COUNTA(QUERY({D2:F}, "select Col1 where Col2='15-20' and Col3='F'", 0))-
COUNTA(QUERY({A2:C}, "select Col1 where Col2='15-20' and Col3='F'", 0))),"♀"))}))
Note: sadly, even this has its own limitations.
For maximum convenience, you can read the criteria from cells (H9, H10, I9 and I10 here):
=IFERROR(IFERROR(CORREL(QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0),
{QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0);
TRANSPOSE(SPLIT(REPT("♂♀",
COUNTA(QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0))-
COUNTA(QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0))),"♀"))}),
CORREL(QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0),
{QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0);
TRANSPOSE(SPLIT(REPT("♂♀",
COUNTA(QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0))-
COUNTA(QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0))),"♀"))})),
IFERROR(CORREL(QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0),
QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0)),
CORREL(QUERY({D2:F}, "select Col1 where Col2='"&I9&"' and Col3='"&I10&"'", 0),
QUERY({A2:C}, "select Col1 where Col2='"&H9&"' and Col3='"&H10&"'", 0))))
demo spreadsheet
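
The size mismatch mentioned above appears when the two variables are filtered independently. If both variables sit in the same rows as the grouping columns, one filter over whole rows keeps the pairs aligned, so the two inputs to the correlation always have equal length. A minimal Python sketch of that idea follows (not spreadsheet syntax); the survey rows and the column layout are hypothetical.

```python
import math

# Hypothetical survey rows: (x, y, age_band, gender).
survey = [
    (1.0, 2.0, "15-20", "F"),
    (5.0, 1.0, "21-30", "M"),
    (2.0, 4.0, "15-20", "F"),
    (4.0, 9.0, "15-20", "M"),
    (3.0, 6.0, "15-20", "F"),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def subgroup_correlation(rows, age_band, gender):
    # One filter selects whole rows, so x and y stay paired.
    picked = [(x, y) for x, y, a, g in rows if a == age_band and g == gender]
    xs, ys = zip(*picked)
    return pearson(xs, ys)

r = subgroup_correlation(survey, "15-20", "F")
print(r)
```

In a spreadsheet the same effect comes from filtering both value columns with the same criteria columns.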

Cassandra select query failure

We have a table:
CREATE TABLE table (
col1 text,
col2 text,
col3 timestamp,
col4 int,
col5 timestamp,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 DESC, col3 DESC, col4 DESC)
When I try querying from this table like:
select * from table where col1 = 'something' and col3 < 'something'
and col4= 12 limit 5 ALLOW FILTERING;
select * from table where col1 = 'something' and col4 < 23
and col3 >= 'something' ALLOW FILTERING;
I always get the error: Clustering column "col4" cannot be restricted (preceding column "col3" is restricted by a non-EQ relation).
I tried changing the clustering order in the table definition to col4, col3, col2, but then the second query fails with a similar error.
Any suggestions or advice for solving this problem?
We are on Cassandra 3.0.17.7.
You can use a non-equality condition only on the last clustering column that the query restricts.
For example, you can use col1 = val and col2 <= ..., or col1 = val and col2 = val2 and col3 <= ..., or col1 = val and col2 = val2 and col3 = val3 and col4 <= ..., but you can't put non-equality conditions on several columns. That's how Cassandra reads data.
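
As an illustration of that rule, here is a small Python sketch that checks a set of restrictions against the clustering order of the table above. It is a simplified model of the rule as stated in this answer, not Cassandra's actual validation logic; the function name and the dictionary format are hypothetical, and skipped columns are not checked.

```python
CLUSTERING_COLUMNS = ["col2", "col3", "col4"]  # order from the PRIMARY KEY above

def violates_range_rule(restrictions, clustering=CLUSTERING_COLUMNS):
    """Return the first clustering column that is restricted after a
    non-equality relation on an earlier one, or None if there is none.

    restrictions maps a column name to its operator, e.g. {"col3": "<"}.
    """
    range_seen = None
    for column in clustering:
        op = restrictions.get(column)
        if op is None:
            continue  # column not restricted in this query
        if range_seen is not None:
            return column  # restricted after a non-EQ relation
        if op != "=":
            range_seen = column
    return None

# The two failing queries from the question, and one form the answer allows.
first = violates_range_rule({"col3": "<", "col4": "="})
second = violates_range_rule({"col4": "<", "col3": ">="})
allowed = violates_range_rule({"col2": "=", "col3": "<="})
print(first, second, allowed)
```

Both queries from the question are flagged on col4, which matches the column named in the error message.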

Compare two columns and if match found check the next cell for a value and then return the result

COL1 COL2 COL3
Hi T_M12345678 T_455462
T_M12345670 T_M12345678
bye T_M123456781 T_M12345670
T_M123 T_M589646
T_M894545 T_M123456781
T_M418554651
T_M4546565
I need to compare COL2 and COL3. If a match is found, I need to check whether COL1 has a value for the matching COL2 entry, and then write a result to COL4 according to the scenarios below.
For Example,
Scenario 1:
Data T_M12345678 is present in COL2 and COL3 so match is found then, I need to check whether I have any value in COL1 for this data in COL2 and in this case, it is YES (Hi is the value in COL1) so I should print TRUE in COL4.
Scenario 2:
Data T_M12345670 is present in COL2 and COL3 so match is found; then I need to check whether I have any value in COL1 for this data in COL2 and in this case, it is NO so I should print TRUE1 in COL4.
Scenario 3:
Data T_M589646 in COL3 is not present in COL2 so I need to print FALSE in COL4.
Since you did not post the expected outcome, I created 2 additional columns (1 for values in COL2, other for values in COL3). The following formulas work as you defined.
COL2 value check:
=IFERROR(IF(AND(MATCH(B2,$C$2:$C$8,0),ISBLANK(A2)),"TRUE1","TRUE"),"FALSE")
COL3 value check:
=IFERROR(IF(AND(MATCH(C2,$B$2:$B$8,0),ISBLANK(A2)),"TRUE1","TRUE"),"FALSE")
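
For readers who want to check the three scenarios outside the spreadsheet, the same decision can be sketched in Python. This follows the scenarios as described in the question rather than the exact cell references of the formulas. It uses the first five sample rows; the last two sample values are left out because it is unclear which column they belong to, and the function names are hypothetical.

```python
# (COL1, COL2) pairs from the first five sample rows; "" means a blank cell.
rows = [
    ("Hi", "T_M12345678"),
    ("", "T_M12345670"),
    ("bye", "T_M123456781"),
    ("", "T_M123"),
    ("", "T_M894545"),
]
col3 = {"T_455462", "T_M12345678", "T_M12345670", "T_M589646", "T_M123456781"}
col1_by_col2 = {c2: c1 for c1, c2 in rows}

def flag_col2(col1, col2):
    """Result for a COL2 value: is it in COL3, and does its row have a COL1 value?"""
    if col2 not in col3:
        return "FALSE"
    return "TRUE" if col1 else "TRUE1"

def flag_col3(value):
    """Result for a COL3 value, looking up COL1 on the matching COL2 row."""
    if value not in col1_by_col2:
        return "FALSE"
    return "TRUE" if col1_by_col2[value] else "TRUE1"

col2_flags = [flag_col2(c1, c2) for c1, c2 in rows]
print(col2_flags)
print(flag_col3("T_M589646"))
```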

ERROR CASSANDRA: 'ascii' codec can't decode byte 0xe1 in position 27: ordinal not in range(128) cqlsh

I'm new to Cassandra and I'm having trouble inserting some rows into a database; I get the error in the title.
I use Cassandra 1.0.8 and cqlsh to make changes to my database.
These are the steps I took before getting the error:
CREATE A COLUMN FAMILY
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 timestamp
);
INSERT SEVERAL ROWS WITHOUT SPECIFYING ALL THE COLUMNS OF THE TABLE
insert into test (col1, col2, col3) values (1, 100, true);
insert into test (col1, col2, col3) values (2, 200, false);
SELECT FOR CHECKING THAT ROWS HAVE BEEN INSERTED CORRECTLY
select * from test;
The result is the following:
INSERT A ROW SPECIFYING A VALUE FOR col4 (NOT SPECIFIED BEFORE)
insert into test (col1, col2, col3, col4) values (3, 100, true, '2011-02-03');
SELECT FOR CHECKING THAT ROW HAS BEEN INSERTED CORRECTLY
select * from test;
In this SELECT is the error. The result is the following:
SELECT EACH COLUMN OF THE TABLE SEPARATELY
select col1 from test;
select col2 from test;
select col3 from test;
select col4 from test;
it works fine and shows the right values:
Then, my question is: what's the problem in the first SELECT? what's wrong?
Thanks in advance!!
NOTE:
If I define col4 as Integer rather than a timestamp it works. However, I've tried to insert col4 as the normalized format yyyy-mm-dd HH:mm (I've tried with '2011-02-03 01:05' and '2011-02-03 01:05:10') but it doesn't work.
Cassandra 1.0.8 shipped with CQL2, and that is where your problem comes from. I managed to reproduce this on 1.0.8, but it works fine with 1.2.x, so my advice is to upgrade if you can.
In C* 1.2.10
cqlsh> update db.user set date='2011-02-03 01:05' where user='JCTYpjJlM';
cqlsh> SELECT * from db.user ;
 user      | date                     | password
-----------+--------------------------+----------
 xvkYQKerQ | null                     | 765
 JCTYpjJlM | 2011-02-03 01:05:00+0200 | 391
Weird. Try inserting col4 as an integer (convert the date to milliseconds first), or use the normalized format yyyy-mm-dd HH:mm.
According to the documentation, you can omit the time and enter just the date, but that seems to break something in your case.
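
If you go the integer route, the conversion to milliseconds since the Unix epoch can be sketched in Python as below. The helper name is hypothetical, and treating the input as UTC is an assumption; adjust the timezone if your dates are local times.

```python
from datetime import datetime, timezone

def to_epoch_millis(text, fmt="%Y-%m-%d"):
    """Parse a date string and return milliseconds since 1970-01-01 UTC.

    Assumption: the input is interpreted as UTC.
    """
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

date_only = to_epoch_millis("2011-02-03")
with_time = to_epoch_millis("2011-02-03 01:05", fmt="%Y-%m-%d %H:%M")
print(date_only, with_time)
```

The resulting integers can then be used as the col4 value in the INSERT in place of the quoted date string.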
