Connecting to Postgres logical replication/streaming from Node.js or Go?

Is there a way to connect/subscribe to Postgres logical replication/streaming replication using Node or Go? I know it's a TCP/IP connection, but I'm not sure exactly where to start. I also know there is a package for this; I was wondering about a more vanilla solution, mostly for the sake of understanding.

I'm not certain what you want, but maybe you are looking for “logical decoding”.
If you want to directly speak the replication protocol with the server, you'll have to implement it in your code, but that information is pretty useless, as it only contains the physical changes to the data files.
If you want logical decoding, there is the test_decoding module provided by PostgreSQL, and here are some examples of how it can be used.
Mind that test_decoding is for testing. For real-world use cases, you will want to use a logical decoding plugin that fits your needs, for example wal2json.
If that's what you want to consume, you'll have to look up the documentation for the logical decoding plugin you want to use to learn the format in which it provides the information.
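For illustration, here is a minimal sketch in Go (since you mentioned it) that consumes a test_decoding slot by polling the SQL-level logical decoding functions rather than speaking the replication protocol directly. The connection string, slot name, and the use of the github.com/lib/pq driver are assumptions for the example; the server also has to be configured with wal_level=logical.

// Minimal sketch: polling a logical decoding slot from Go via plain SQL,
// assuming the test_decoding plugin and the github.com/lib/pq driver.
// This avoids speaking the replication protocol directly and just calls
// the SQL functions PostgreSQL exposes for logical decoding.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder connection string; the server needs wal_level=logical.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create a logical replication slot that uses the test_decoding plugin.
	// The error is ignored here in case the slot already exists.
	_, _ = db.Exec(`SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')`)

	for {
		// Fetch and consume the changes accumulated since the last call.
		rows, err := db.Query(`SELECT lsn, xid, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL)`)
		if err != nil {
			log.Fatal(err)
		}
		for rows.Next() {
			var lsn, xid, data string
			if err := rows.Scan(&lsn, &xid, &data); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%s %s %s\n", lsn, xid, data)
		}
		rows.Close()
		time.Sleep(time.Second)
	}
}

The data column is whatever textual format the decoding plugin produces, which is why the answer above points you to the plugin's documentation (for example wal2json emits JSON instead of test_decoding's text lines).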

Related

Design application using Apache Spark

It's a bit of an architectural question. I need to design an application using Spark and Scala as the primary tools. I want to minimise manual intervention as much as possible.
I will receive a zip containing multiple files with different structures as input at a regular interval, say daily. I need to process it using Spark. After the transformations, the data needs to be moved to a back-end database.
I wanted to understand the best way to design the application.
What would be the best way to process the zip?
Can Spark Streaming be considered an option, given the frequency of the files?
What other options should I take into consideration?
Any guidance would be really appreciated.
It's a broad question; there are batch options and stream options, and I'm not sure of your exact requirements. You may start your research here: https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-FileStreamSource.html

Which combinations of CAP does GdH support?

I'm interested in the implementation of GdH (Glasgow Distributed Haskell).
However, I could not find out which combinations from the CAP theorem GdH supports.
Can we choose one of them, or do programs in GdH consist of explicit processes, as in Erlang?
From what I've read, it's essentially RMI, only you're communicating using MVars. In my experience there aren't really any CAP limitations imposed by a particular ecosystem, so much as by the problems themselves. For example, you can use more than 5 nodes in etcd or ZooKeeper, but all of the election messages, the fact that the leader commits to all nodes, the request forwarding, and the checks you have to make before applying the log don't exactly make it performant, do they?
Nobody should be using RMI for a new application in 2017; honestly, you would be better off using an RPC package.
There is https://hackage.haskell.org/package/courier, which is a lightweight message-passing lib, there is ZeroMQ, and, best for last,
http://haskell-distributed.github.io/, which is basically the closest thing you get to OTP on Haskell.

Machine learning and Security

I would like to ask you if it is possible to secure a server with AI/machine learning based on the following concepts:
1) the server is implemented in a way that recognizes normal behavior (authorized access, modification, ...).
2) the server must recognize any abnormal behavior and adapt to it if encountered.
3) if abnormal behavior is caught, the server checks in some kind of pre-known threat list what type of threat it is and a possible solution for it; ELSE it adapts "by itself" and performs changes based on what the normal behavior must be.
PS: If there already is a system similar to this one please let me know.
Thank you for your help!
Current IDS/IPS systems for applications ("web application firewalls") are in part similar to this (the other part is usually plain pattern matching to find common or known attacks or attack classes). First you switch the WAF to "learning mode": it listens to traffic and stores patterns as normal behavior. Then you switch it to "prevention mode" and it stops any traffic that is out of the ordinary flow.
The key is which aspects of the data flows they listen to and learn from in order to find anomalies. Basically a WAF would look at HTTP queries to pages and learn parameter types and lengths, maybe clients as well, and in prevention mode it would not allow a type or length mismatch (any request not matching the learned values would be stopped at the WAF).
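As a rough illustration of that learn-then-enforce idea (not any particular WAF product), here is a minimal sketch in Go of an HTTP middleware that records the longest value seen per query parameter while learning and blocks anything longer in prevention mode; the types, names, and the single blocking rule are invented for the example.

// Minimal sketch of the learn-then-enforce idea behind a WAF.
// In learning mode it records the maximum length seen for each query
// parameter; in prevention mode it rejects requests whose parameters
// exceed what was learned. All names and rules are illustrative only.
package main

import (
	"net/http"
	"sync"
)

type waf struct {
	mu       sync.Mutex
	learning bool
	maxLen   map[string]int // parameter name -> longest value seen while learning
}

func (w *waf) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		w.mu.Lock()
		for name, values := range r.URL.Query() {
			for _, v := range values {
				if w.learning {
					if len(v) > w.maxLen[name] {
						w.maxLen[name] = len(v) // record what "normal" looks like
					}
				} else if len(v) > w.maxLen[name] {
					// Prevention mode: anything longer than what was learned is blocked.
					w.mu.Unlock()
					http.Error(rw, "request blocked: anomalous parameter", http.StatusForbidden)
					return
				}
			}
		}
		w.mu.Unlock()
		next.ServeHTTP(rw, r)
	})
}

func main() {
	f := &waf{learning: true, maxLen: map[string]int{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(rw http.ResponseWriter, r *http.Request) {
		rw.Write([]byte("ok"))
	})
	// Flip f.learning to false once enough normal traffic has been observed.
	http.ListenAndServe(":8080", f.wrap(mux))
}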
There are obvious drawbacks to this, the learning phase can never be long enough, learnt rules will either be too generic or too specific, manual setup is tedious for a large application, etc.
Taking it to a more generic level would be very (very) difficult. Maybe with a deep neural network (so popular nowadays) you could better approximate a "real" AI that actually learns good and bad traffic patterns. Two obvious problems are getting patterns to teach it (how will you provide good and bad traffic examples in large enough amounts so that it can actually learn the difference) and operational cost (running such a deep neural network would be very expensive, probably way more than a typical application breach would cost - defenses should be proportionate to the risk).
Having said that, I think it's not impossible, but it will take a few years until we get there.
The general idea is interesting and there is a lot of research on this topic currently: https://github.com/Limmen/awesome-rl-for-cybersecurity
But it's still quite far from being mature enough to use in practical settings.

Vert.X with Hazelcast

I am a beginner with Vert.x, and according to the documentation the Vert.x shared set and map support only immutable objects across verticles. If I want to share a Java object (assuming I am using Java-based verticles) across verticles or modules, what is the recommended approach? Can I use a Hazelcast distributed hash table for that?
I really think you should try a different approach; otherwise you will run into exactly what Vert.x is trying to alleviate: concurrency troubles. If I had that requirement, I would use something like Redis to have a really fast, centralized, volatile store I can access and share things through. Maybe this doesn't answer your question, but it points to a different approach... anyway, try to stay away from "shared state". Good luck!

Custom log processing/parsing

I have a log format like this:
[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
GET /foo
{"controller"=>"foo", "action"=>"index"}
[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]
200 OK
Each entry is constructed using the following pattern:
[request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
payload
During a request there are many points where I log something - the app basically has complex behaviour. This helps me a lot when debugging user behaviour.
I would like to parse it into a directory structure like this:
req_id
|
|----[time_from_request_started][process_id][timestamp][tagline]
|
etc
Basically each directory will have a name based on req_id, with files whose names are the rest of the tag line. These files will contain the payload.
I will also have another directory, keyed by user id, containing symlinks to the requests made by that user.
First question: Is this structure correct? In my opinion it will make log access easy and fast. The reason I want to use directories and files is that I like the Unix approach and want to try it (to feel its drawbacks and advantages for myself).
Second question: I would have no problem using Ruby to create this. But I would like to learn some new tool that is better suited for it. I am thinking about using plain Unix tools (pipes, awk, etc.), or writing a parser in Go, which I am learning right now (I even have time to implement a simple map-reduce). What tool is best suited for this?
I would not store logs in a directory to see how the users behave.
Depending on what behaviour you want to keep track of, you could use different tools. One of these could be mixpanel or keen.io.
Instead of logging what the user did in a log file, you would send an event to either of those (they are pretty similar; pick the one you think has better docs / lib), then you would graph those events to better understand the behaviour of your users. I've done this a lot recently; to display data in a nice way I've used rickshaw.
The key point why I'm suggesting this is that if you go the file route you will still have to find a way to understand your data, something that graphs will help you a lot with. Also, visualization is something keen.io does by default; you may still want to do your own graphs, but it's a good start.
Hope this helped.
Is this structure correct?
Only you can know that; it depends directly on how the data needs to be accessed and used.
What tool is best suited for this?
You could probably use Unix tools to achieve this, but it may well be a good exercise to practice your Go skills by writing it yourself. That would also be more extensible.
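If you do take the Go route, a minimal sketch of the parsing side could look like the following. It reads the log from stdin, handles only bracketed header lines in the format described above, and writes each payload into a per-request directory; the output path and file naming are made up for the example, and the per-user symlink directory is left out.

// Minimal sketch of parsing the log format above in Go and writing each
// entry's payload into a per-request directory. Field handling and file
// naming are illustrative, not a complete solution.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// header holds the seven bracketed values of one entry:
// [request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
type header struct {
	fields []string
}

func parseHeader(line string) (header, bool) {
	if !strings.HasPrefix(line, "[") || !strings.HasSuffix(line, "]") {
		return header{}, false
	}
	trimmed := strings.TrimSuffix(strings.TrimPrefix(line, "["), "]")
	fields := strings.Split(trimmed, "][")
	if len(fields) != 7 {
		return header{}, false
	}
	return header{fields: fields}, true
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	var cur header
	var payload []string
	flush := func() {
		if len(cur.fields) == 0 {
			return
		}
		// Directory per request id, file named after the rest of the tag line.
		reqID := cur.fields[0]
		dir := filepath.Join("out", reqID)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			log.Fatal(err)
		}
		name := fmt.Sprintf("[%s][%s][%s][%s]", cur.fields[2], cur.fields[3], cur.fields[5], cur.fields[6])
		if err := os.WriteFile(filepath.Join(dir, name), []byte(strings.Join(payload, "\n")), 0o644); err != nil {
			log.Fatal(err)
		}
	}
	for scanner.Scan() {
		line := scanner.Text()
		if h, ok := parseHeader(line); ok {
			flush() // write out the previous entry before starting a new one
			cur, payload = h, nil
			continue
		}
		payload = append(payload, line)
	}
	flush()
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}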

Resources