What is the query for column hashing using the SHA-256 algorithm in Presto?

Column hashing using the SHA-256 algorithm in Presto.
I tried hashing a column and I am getting hashed output, but I need the output produced with SHA-256 in Presto.
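For reference, a minimal sketch of such a query, assuming a varchar column named my_column in a table named my_table (both placeholder names): Presto's sha256 works on varbinary, so the value is converted with to_utf8 and the digest rendered as hex with to_hex.

-- Presto: SHA-256 of a varchar column, returned as a hex string
SELECT
    my_column,
    to_hex(sha256(to_utf8(my_column))) AS my_column_sha256
FROM my_table;

If the raw binary digest is acceptable, the to_hex wrapper can be dropped.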

Related

How do I most effectively compress my highly-unique columns?

I have a Spark DataFrame consisting of many double columns that are measurements, but I want a way of annotating each unique row by computing a hash of several other non-measurement columns. This hash results in garbled strings that are highly unique, and I've noticed my dataset size increases substantially when this column is present. How can I sort / lay out my data to decrease the overall dataset size?
I know that the Snappy compression protocol used on my parquet files executes best upon runs of similar data, so I think a sort over the primary key could be useful, but I also can't coalesce() the entire dataset into a single file (it's hundreds of GB in total size before the primary key creation step).
My hashing function is SHA2(128) FYI.
If you have a column that can be computed from the other columns, then simply omit that column before compression, and reconstruct it after decompression.
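As a rough Spark SQL sketch of that suggestion (measurements and key_col_1 through key_col_3 are placeholder names; 256 bits is used here because Spark's sha2 supports 224/256/384/512): drop the hash column before writing and recompute it on read.

-- Spark SQL sketch: recompute the hash column on read instead of storing it
SELECT
    *,
    sha2(concat_ws('||', key_col_1, key_col_2, key_col_3), 256) AS row_hash
FROM measurements;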

How to generate single bcrypt hash for association of multiple columns in SQL Server/Sequelize/NodeJS?

I am working on an Electron/NodeJS application where I have a requirement to create a hash value that consists of all columns of a table.
I have done hashing for a single password field like below, but I don't have any experience with how hashing multiple columns would work. Also, if any column value changes in the SQL Server table, the hash should be invalidated. How can we do this?
Single Column Bcrypt Example
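The single-column bcrypt snippet referenced above is not reproduced here. Purely to illustrate the multi-column part, here is a T-SQL sketch that uses HASHBYTES rather than bcrypt (bcrypt would run application-side); dbo.MyTable, Col1..Col3 and RowHash are placeholder names, and the columns are assumed to be string-typed. Any change to the hashed columns yields a different hash value.

-- T-SQL sketch (not bcrypt): one computed hash over several columns,
-- so a change in any of them produces a different hash.
ALTER TABLE dbo.MyTable
ADD RowHash AS HASHBYTES('SHA2_256', CONCAT_WS('|', Col1, Col2, Col3));

In Node, the equivalent idea would be to build the same concatenated string from the column values and pass that single string to bcrypt.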

Read an HBase table salted with Phoenix in Hive with the HBase SerDe

I have created an HBase table with a Phoenix SQL CREATE TABLE query and also specified SALT_BUCKETS. Salting adds a prefix to the rowkey as expected.
I have created an external Hive table to map to this HBase table with the HBase SerDe. The problem is that when I query this table by filtering on the rowkey:
where key = "value"
it doesn't work, because I think the salt prefix is also part of the fetched key. This limits the ability to filter the data on the key. The option:
where rowkey like "%value"
works, but it takes a long time as it likely does a full table scan.
My question is: how can I query this table efficiently on rowkey values in Hive (stripping off the salt prefix)?
Yes, you're correct in saying:
it doesn't work because I think the salt prefix is also part of the fetched key.
One way to mitigate this is to use hashing instead of a random prefix, and to prefix the rowkey with the calculated hash.
Using this technique you can calculate the hash for the rowkey you want to scan for:
mod(hash(rowkey), n), where n is the number of regions, removes the hotspotting issue (see the Hive sketch below).
Using a random prefix brings in the problem you mentioned in your question. The option:
where rowkey like "%value"
works but takes a long time, as it likely does a full table scan.
This is exactly what random-prefix salting does: HBase is forced to scan the whole table to get the required value, so it would be better if you could prefix your rowkey with its calculated hash.
But this hashing technique won't work well for range scans.
Now you may ask: why can't I simply replace my rowkey with its hash and store the rowkey as a separate column?
It may or may not work, but I would recommend implementing it this way, because HBase is already very sensitive when it comes to column families.
But then again, I am not entirely sure about this solution.
You might also want to read up on this for a more detailed explanation.
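As a rough Hive sketch of the hash-prefix idea, assuming the table was written with a manually computed prefix (rather than Phoenix's own salt byte), keys built as concat(bucket, '_', business_key), and bucket = pmod(hash(business_key), 16); the 16-bucket layout and the table name hbase_mapped are assumptions, and this only works if the reader and writer use the same hash function and bucket count.

-- Hive sketch: recompute the hash prefix at query time so the lookup
-- hits the exact salted key instead of scanning the whole table.
SELECT *
FROM hbase_mapped
WHERE key = concat(cast(pmod(hash('value'), 16) AS STRING), '_', 'value');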

Murmur3 Hash Algorithm Used in Cassandra

I'm trying to reproduce the Murmur3 hashing in Cassandra. Does anyone know how to get at the actual hash values used in the row keys? I just need some key and hash value pairs from my data to check that my implementation of the hashing is correct.
Alex
Ask Cassandra! Insert some data in your table. Afterwards, you can use the token function in a select query to get the used token values. For example:
select token(id), id from myTable;
A composite partition key is serialized as n byte arrays, one per key element, each prepended with a short indicating its length and followed by a closing 0 byte. It's unclear to me what these closing zeros are for; it has something to do with SuperColumns...
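For a composite partition key, the same approach works by passing all partition key columns to token(); for example, assuming a table partitioned by (key1, key2):

-- CQL: token over a composite partition key (key1, key2 are placeholders)
select token(key1, key2), key1, key2 from myTable;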

Converting 128 bit int for row key in Cassandra

If I want a comparable 128-bit integer equivalent as a row key in Cassandra, what data type is the most efficient for processing this? ASCII using the full 8-bit range?
I need to be able to select row slices and ranges.
Row keys are not compared if you use the Random Partitioner (the component that determines how keys get distributed around the cluster).
If you want to compare row keys, use an order-preserving partitioner... but that will surely lead to an unbalanced cluster and crashes.
Column names do get compared, though, against other column names inside the same row.
So my advice is: bucket your columns into number intervals and insert your columns with LongType column names.
Probably just use the raw byte[] representation of the int and avoid any conversion; comments above from le douard notwithstanding.
Raw byte[] comparison is not going to sort columns in numerical order. If that's what you want, you should use varint (CQL) / IntegerType (Thrift).
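A minimal CQL sketch of that bucketing advice, assuming a hypothetical table where bucket is the partition key (a number interval) and the 128-bit value is stored as a varint clustering column so slices within a bucket sort numerically; all names are placeholders.

-- CQL sketch: bucket as partition key, 128-bit value as varint clustering column
CREATE TABLE ids_by_bucket (
    bucket bigint,
    id varint,
    payload blob,
    PRIMARY KEY (bucket, id)
);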
