Parsing TCP payloads using a custom spec - node.js

My goal is to create a parser for TCP packets that use a custom spec from the Options Price Reporting Authority found here, but I have no idea where to start. I've never worked with low-level stuff, and I'd appreciate some guidance.
The huge problem is that I don't have access to the actual network, because it costs a huge sum per month, and all I can work off is the specification. I don't even know if it's possible. Do you parse each byte step by step and hope for the best? Do you first re-create some example data using the bytes in the spec and then parse it? Isn't that also difficult, since (I think) TCP spreads the data across multiple segments?

That's quite an elaborate data feed. A quick review of the spec shows that it contains enough information to write a program in either Node.js or Go to ingest it.
Getting it to work will be a big job. Your question didn't mention your level of programming skill or network engineering skill, so it's hard to guess how much learning lies ahead of you to get this done.
A few things.
It's a complex enough protocol that you will need to test it with correctly formatted sample data. You need a fairly large collection of sample packets in order to mock your data feed (that is, build a fake data feed for testing purposes). While nothing is impossible, it will be very difficult to build a bug-free program to handle this data without extensive tests.
If you have a developer relationship with the publisher of the data feed, you should ask if they offer sample data for testing.
It is not a TCP / IP data feed. It is an IP multicast datagram feed. In IP multicast feeds you set up a server to listen for the incoming data packets. They use multicast to achieve the very low latencies necessary for predatory algorithmic trading.
You won't use TCP sockets to receive it; you'll use a different programming interface, UDP datagram sockets.
If you're used to TCP's automatic recovery from errors, datagrams will be a challenge. With datagrams you cannot tell if you failed to receive data except by looking at sequence numbers. Most data feeds using IP and multicast have some provision for retransmitting data. Your spec is no exception. You must handle retransmitted data correctly or it will look like you have lots of duplicate data.
Multicast data doesn't move over the public network. You'll need a virtual private network connection to the publisher, or to co-locate your servers in a data center where the feed is available on an internal network.
There's another, operational, spec you'll need to cope with to get this data. It's called the Common IP Multicast Distribution Network Recipient Interface Specification. This spec has a primer on the multicast dealio.
You can do this. When you have made it work, you will have gained some serious skills in network programming and network engineering.
But if you just want this data, you might try to find a reseller of the data that repackages it in an easier-to-consume format. That reseller probably also imposes a delay on the data feed.

Related

Why is the application data of a packet called a protocol?

I've been reading about packets a lot today. I was confused for some time because SMTP, HTTP, and FTP, for example, are all called protocols, but they also somehow utilize transport protocols like TCP. I couldn't locate them among the packet's 4 layers, until I discovered they're simply part of the application layer.
I want to know what exactly these "protocols" offer. I'm guessing a specific format for the data which applications on the client side know how to handle? If so, does this mean that realistically, I might have to create my own "protocols" if I created an application with a unique functionality?
A protocol, in this case, is just a structured way of communicating between two or more parties.
If you write, for example, a PHP app and offer an API, you have created a protocol for interacting with your program. It defines how others interact with it and what response they can expect while doing so. Your self-created protocol sits on top of others, like HTTP and TCP.
I suggest watching the following video by LiveOverflow, which explains exactly this:
https://www.youtube.com/watch?v=d-zn-wv4Di8&ab_channel=LiveOverflow
I want to know what exactly these "protocols" offer.
You can read the definition of each protocol, if you really want to.

How to filter packets sent by dns.lookup in Wireshark on Windows?

To give some context: we're monitoring the network connectivity in our app by calling Node.js dns.lookup on an interval. Some of our users are facing issues with this technique, so I'd like to inspect what's going on when degrading the network connectivity with a tool like clumsy.
Before debugging the app, I'd first like to ensure clumsy is doing its job properly (when playing with the packet drop feature, for example), and I want to inspect the packets in Wireshark.
What filter should I apply to filter the packets I am interested in?
Reading the documentation of dns.lookup, it seems the method has nothing to do with an actual DNS request; under the hood, it uses the system's getaddrinfo implementation. Unfortunately, this doesn't really help me figure out a proper filter in Wireshark. I even suspect the requests aren't being captured at all, since a global search for the IP and the hostname I'm performing the lookup against turned up nothing.

altering packet content of forwarded packets with nft or iptables using queues

I need to create a moderately large application that changes the content of forwarded packets quite drastically. I was wondering whether I could alter the content of a packet that is intended for routing (performing a kind of man-in-the-middle) using a userspace application built around something like the queues from nft or iptables.
All I've seen in the documentation revolves around accepting or dropping the packet, not altering its content. I've also read somewhere that the library in charge of the queues only copies the packets from kernel space, which would render me unable to alter them. But maybe I'm missing something, or there's a known hack for doing something of the sort.
I'd really appreciate your input. Thanks a bunch.

What is the size of CoAP packet?

I'm new to this technology; can somebody help me clear up some doubts?
Q-1. What is the size of CoAP packet?
(I know there is 4 byte fixed header, but what is the maximum size limit including header, option and payload?)
Q-2. Is there any concept for Keep Alive like MQTT?
(It works on UDP; how long does it keep the connection open? Is there a default time, or does it open one every time we send a packet?)
Q-3. Can we use CoAP with TCP?
(The main problem is that CoAP works on UDP. Is there any concept like MQTT QoS? Let's say a sensor publishes some data every second; if the subscriber goes offline, is there any guarantee in CoAP that the subscriber will get all the data when it comes back online?)
Q-4. What is the duration of connection?
(CoAP supports a publish/subscribe architecture, so maybe it needs the connection open all the time. Is that possible with CoAP given that it is based on UDP?)
Q-5. How does it discover the resources?
(I have one gateway and 5 sensors, how will these sensors connect to the gateway? Will the gateway find these sensors? Or will sensors find the gateway?)
Q-6. How does a sensor register with the gateway?
Please help me, I really need answers. I'm completely new to this kind of thing; please suggest something from an implementation point of view.
Thanks.
It depends:
Core CoAP messages must be small enough to fit into their link-layer packets (at most ~64 KiB for UDP), but in any case the RFC states that:
it SHOULD fit within a single IP packet to avoid IP fragmentation (MTU of 1280 for IPv6). If nothing is known about the size of the headers, good upper bounds are 1152 bytes for the message size and 1024 bytes for the payload size;
or less to avoid adaptation layer fragmentation (60-80 bytes for 6LoWPAN networks);
if you need to transfer larger payloads, this IETF draft extends core CoAP with new options for transferring multiple blocks of information from a resource representation in multiple request-response pairs (so you can transfer representations larger than a single message allows).
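For reference, the 4-byte fixed header defined by RFC 7252 can be decoded like this (a Node.js sketch), which shows why the base overhead is so small relative to the payload limits above:

```javascript
// Decode the 4-byte fixed CoAP header (RFC 7252):
// byte 0: version (2 bits), type (2 bits), token length (4 bits)
// byte 1: code (class.detail), bytes 2-3: message ID (big-endian)
function parseCoapHeader(buf) {
  const b0 = buf.readUInt8(0);
  return {
    version: b0 >> 6,        // always 1 for RFC 7252
    type: (b0 >> 4) & 0x3,   // 0=CON, 1=NON, 2=ACK, 3=RST
    tokenLength: b0 & 0xf,   // 0..8 token bytes follow the header
    code: buf.readUInt8(1),  // e.g. 0x01 = 0.01 = GET
    messageId: buf.readUInt16BE(2),
  };
}

// A confirmable GET with message ID 0x1234 and no token:
const header = parseCoapHeader(Buffer.from([0x40, 0x01, 0x12, 0x34]));
console.log(header);
// { version: 1, type: 0, tokenLength: 0, code: 1, messageId: 4660 }
```

Everything past those 4 bytes (plus the token) is options and payload, which is what the size bounds quoted above constrain.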
I never used MQTT, but in any case CoAP is connectionless; requests and responses are exchanged asynchronously over UDP (or DTLS). I suppose you are looking for the observe functionality: it enables CoAP clients to "subscribe" to resources and servers to send updates to subscribed clients over a period of time.
There is an IETF draft describing CoAP over TCP, but I don't know how it interacts with the observe functionality.
Observe usually follows a best-effort approach: the observation stops when the server decides the client is no longer interested in the resource, or when the client asks to unsubscribe, at which point the client is removed from the server's list of observers.
There is a well-known relative URI, "/.well-known/core". It is defined as a default entry point for requesting the list of links to the resources hosted by a server. See here for more info.
See the answer to Q-5.

Sending Data over network inside kernel

I'm writing a driver in the Linux kernel that sends data over the network. Now suppose the data to be sent (the buffer) is in kernel space. How do I send the data without creating a socket (first of all, is that a good idea at all)? I'm looking for performance in the code rather than easy coding. And how do I design the receiver end? Without a socket connection, can I get and view the data on the receiver end (and how)? And will all of this change (including the performance) if the buffer is in user space (I'll do a copy_from_user if it does :-))?
If you are looking to send data on the network without sockets, you'd need to hook into the network drivers, send raw packets through them, and filter their incoming packets for those you want to hijack. I don't think the performance benefit will be large enough to warrant this.
I don't even think there are normal hooks for this in the network drivers. I did something related in the past to implement a firewall; you could conceivably use the netfilter hooks to do something similar and attach to the receive side of the network drivers.
You should probably use netlink, and if you want to really communicate with a distant host (e.g. through TCP/IPv6), use a user-level proxy application for that (so the kernel module uses netlink to talk to your proxy application, which could use TCP, or even go through ssh or HTTP, to send the data remotely, or store it on disk...).
I don't think having a kernel module talk directly to a distant host makes sense otherwise (e.g. security issues, filtering, routing, iptables...).
And the real bottleneck is almost always the (physical) network itself. A 1 Gbit Ethernet link is almost always much slower than what a kernel module or an application can sustainably produce (and there are latency issues too).
