Optimal check if IP is in subnet - node.js

I want to check whether an IP address belongs to a subnet. The pain comes when I must check against 300,000 CIDR blocks, with subnets ranging from /3 to /31, several million times per second.
Take https://github.com/indutny/node-ip for example:
I could call ip.cidrSubnet('ip/subnet') for each of the 300,000 blocks and check whether the IP I'm looking for is inside the first-last address range, but this is very costly.
How can I optimally check whether an IP address belongs to one of these blocks, without looping through all of them every time?

Store the information in a binary tree that is optimized for range checks.
One naive way to do it is to turn each CIDR block into a pair of events, one when you enter the block and one when you exit it. Then sort the list of events by IP address. Run through it and create a sorted array of IP addresses, each paired with how many blocks you are in from that address onward. For 300,000 CIDR blocks there will be 600,000 events, so a binary search over them takes about 19-20 lookups.
Now you can do a binary search of that sorted data to find the last transition at or before your current IP address, and return true/false depending on whether that transition left you in one or more blocks versus in none.
The lookup will be faster if instead of searching a file, you are searching a dedicated index of some sort. (The number of lookups in the search is the same or slightly higher, but you make better use of CPU caches.) I personally have used BerkeleyDB's BTree data structure for this kind of thing in other languages, and have been very happy.
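The event-list idea above can be sketched as follows. This is a minimal illustration under assumptions, not the answerer's code: it is written in Python for brevity (the same two sorted arrays translate directly to typed arrays in node.js), it handles IPv4 only, and the names `build_index` and `contains` are hypothetical.

```python
import bisect
import ipaddress

def build_index(cidrs):
    """Turn CIDR blocks into sorted transition points with nesting depth."""
    events = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        events.append((int(net.network_address), 1))         # enter block
        events.append((int(net.broadcast_address) + 1, -1))  # exit block
    events.sort()
    points, depths = [], []
    depth = 0
    for addr, delta in events:
        depth += delta
        if points and points[-1] == addr:
            depths[-1] = depth  # several events at one address: keep final depth
        else:
            points.append(addr)
            depths.append(depth)
    return points, depths

def contains(index, ip):
    """Binary-search the last transition at or before ip and test its depth."""
    points, depths = index
    pos = bisect.bisect_right(points, int(ipaddress.ip_address(ip))) - 1
    return pos >= 0 and depths[pos] > 0
```

Building the index is a one-time cost; each lookup is then a single binary search over at most twice as many points as there are blocks.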

Related

How to remove network addresses from a list if they are a subnet of another network

After creating a list of IP address/CIDR entries from a csv file, converting the IP addresses to network addresses, and then eliminating duplicates by creating a set from the list (Python 3.7), I iterate and eliminate every subnet that is a subnet_of() another subnet, keeping the summarized or supernet address. I use the ipaddress module to do this work. The problem is that if a subnet is compared to itself, it still counts as a subnet. For example,
a = ipaddress.ip_network('192.168.0.0/24')
b = ipaddress.ip_network('192.168.0.0/24')
b.subnet_of(a)
True
So even if there is a 192.168.0.0/23 in my list, the /24 is still added because all addresses are compared to all addresses. Is there a better way to handle this type of situation?
I've tried removing the subnet from my working list so it won't be iterated over again, with no luck.
No error messages. I just get a subnet included that fits within a larger subnet in my list. This leaves an entry that is unnecessary.
Have you tried just removing everything after the /?
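One possible way around the self-comparison problem is to sort the networks so that a supernet always comes before its subnets, then compare each network only against the last one kept. The sketch below assumes a single address family (all IPv4 or all IPv6) and Python 3.7+ for subnet_of(); the function name `drop_covered` is hypothetical.

```python
import ipaddress

def drop_covered(cidrs):
    """Return the networks that are not contained in another listed network."""
    # The set removes exact duplicates, so a network never meets its own copy.
    nets = sorted(
        {ipaddress.ip_network(c, strict=False) for c in cidrs},
        key=lambda n: (int(n.network_address), n.prefixlen),
    )
    kept = []
    for net in nets:
        # CIDR blocks are either nested or disjoint, so after sorting it is
        # enough to test against the most recently kept network.
        if kept and net.subnet_of(kept[-1]):
            continue
        kept.append(net)
    return kept
```

The standard library's ipaddress.collapse_addresses() also removes covered subnets, but it additionally merges adjacent networks into larger ones, which may or may not be what is wanted here.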

How to change the middle node location in the torrc?

I am trying to edit my torrc and make all of the nodes funnel through one country.
So far I am able to force the entry and exit nodes but don't know how to change the middle node... any ideas?
I have already tried "MiddleNodes" and "RelayNodes"
EntryNodes {us},{ca}
ExitNodes {us},{ca}
StrictNodes 1
It's possible to restrict to MiddleNodes per Tor docs: https://2019.www.torproject.org/docs/tor-manual.html.en
MiddleNodes node,node,…
A list of identity fingerprints and country
codes of nodes to use for "middle" hops in your normal circuits.
Normal circuits include all circuits except for direct connections to
directory servers. Middle hops are all hops other than exit and entry.
This is an experimental feature that is meant to be used by
researchers and developers to test new features in the Tor network
safely. Using it without care will strongly influence your anonymity.
This feature might get removed in the future. The HSLayer2Node and
HSLayer3Node options override this option for onion service circuits,
if they are set. The vanguards addon will read this option, and if
set, it will set HSLayer2Nodes and HSLayer3Nodes to nodes from this
set. The ExcludeNodes option overrides this option: any node listed in
both MiddleNodes and ExcludeNodes is treated as excluded. See the
ExcludeNodes option for more information on how to specify nodes.
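Applied to the configuration from the question, the option might be used as in the fragment below. This is an untested sketch that simply reuses the question's country codes; the manual's warning about the effect on anonymity applies.

```
EntryNodes {us},{ca}
MiddleNodes {us},{ca}
ExitNodes {us},{ca}
StrictNodes 1
```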
Edit: See the new answer by @user1652110 describing the MiddleNodes option, which was added in January 2019.
There is no option to do so. The closest option you can try is ExcludeNodes by using as large a list of country codes as you can come up with that doesn't include the countries you do want to use.
Also note, at the time of writing, limiting your circuits' entry and exit points to relays in the US and Canada might severely limit your performance, anonymity, and reliability since there just aren't that many high-bandwidth exits and guards in these two countries.

Getting an intersection between 2 CIDR spaces when you have huge data sets

Basically, I have a list of IP subnets (supernets) which contains around 100 elements. At the same time, I have another list (ips) which contains around 300k IP addresses/networks.
Example:
supernets = ['10.10.0.0/16', '12.0.0.0/8']
ips = ['10.10.10.1', '10.10.10.8/30', '12.1.1.0/24']
The end goal is to classify the IP addresses based upon where they fall in the supernet.
So what I did was to compare every IP address/network element in the second list against the first element in the supernets list, then the second, and so on.
Basically, I do this:
for i in range(len(supernets)):
    for x in ips:
        if IPNetwork(x) in IPNetwork(sorted(supernets)[i]):
            print(i, x, sorted(supernets)[i])
            lod[i][sorted(supernets)[i]].append(x)
This works fine, but it takes ages and the CPU goes crazy. So my question is: is there any methodology or clean code that can achieve this and save time?
UPDATE
I have sorted the lists and used a list comprehension instead, and the script took around 11 minutes to run, which is a good optimization in terms of speed. But the CPU is still at 100% during the whole 11 minutes.
[lod[i][public[i]].append(x) for i in range(len(public)) for x in ips if IPNetwork(x) in IPNetwork((public)[i])]
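Much of the cost in the loops above comes from re-parsing and re-sorting the same strings on every iteration. One way to avoid that, sketched here under assumptions, is to parse each supernet once, index it by prefix length, and then mask each address and look it up in a dictionary. The sketch uses the standard-library ipaddress module instead of netaddr, handles IPv4 only, and the name `classify` is hypothetical.

```python
import ipaddress
from collections import defaultdict

def classify(supernets, ips):
    """Group each IP address/network under every supernet that contains it."""
    # Parse every supernet exactly once: prefix length -> {network int: network}.
    by_prefix = defaultdict(dict)
    for s in supernets:
        net = ipaddress.ip_network(s)
        by_prefix[net.prefixlen][int(net.network_address)] = net

    result = defaultdict(list)
    for item in ips:
        net = ipaddress.ip_network(item, strict=False)
        addr = int(net.network_address)
        for plen, table in by_prefix.items():
            if plen > net.prefixlen:
                continue  # a longer prefix cannot contain a shorter one
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hit = table.get(addr & mask)
            if hit is not None:
                result[str(hit)].append(item)
    return dict(result)
```

With this layout the work per address depends on the number of distinct prefix lengths among the supernets (at most 33 for IPv4), not on the number of supernets.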

Knot Resolver: How to observe and modify a resolved answer at the right time

Goal
I would like to stitch together a GNU GPL licensed Knot Resolver module, either in C or in CGO, that would examine the client's query and the corresponding resolved answer, with the goal of querying an external API offering a knowledge base of malware-infected hostnames and IP addresses (e.g. GNU AGPL v3 IntelMQ).
If there is a match with the resolved A's (AAAA's) IP address, it is to be logged; likewise, a match with the queried hostname should be logged, or (optionally) it could result in sending the client the IP address of a sinkhole instead of the resolved one.
Means
I studied the layers and I came to the conclusion that the phase I'm interested in is consume. I don't want to affect the resolution process, I just want to step in at the last moment and check the results and possibly modify them.
I ventured to register a consume function with
static knot_layer_api_t _layer = {
    .consume = &consume,
};
but I'm not sure it is the right place to do the deed.
Furthermore, I also looked into module hints.c, especially its query method
and module stats.c for its _to_wire function usage.
Question(s)
Phase (Layer?)
When is the right time to step in and read/write the answer to the query before it's sent to the client? Am I at the right spot in the consume layer?
Answer sections
If the following attempt at getting the resolved IP address gives me the Name Server's address:
char addr_str[INET6_ADDRSTRLEN];
memset(addr_str, 0, sizeof(addr_str));
const struct sockaddr *src = &(req->answer->sections);
inet_ntop(qry->ns.addr[0].ip.sa_family, kr_inaddr(src), addr_str, sizeof(addr_str));
DEBUG_MSG(NULL, "ADDR: %s\n", addr_str);
how do I get the resolved (A, AAAA) IP address for the query's hostname? I would like to iterate over A/AAAA IP addresses and CNAMEs in the answer and look at the IP addresses they were resolved to.
Modifying the answer
If the module setting demands it, I would like to be able to "ditch" the resolved answer and provide a new one comprising an A record pointed at a sinkhole.
How do I prepare the record so that it can be translated from char* to Knot's wire format and the proper structure, in the right context at the right phase?
I guess it might go along functions such as knot_rrset_init and knot_rrset_add_rdata, but I wasn't able to arrive at any successful result.
THX for pointers and suggestions.
If you want to step in at the last moment, when the response is finalised but not yet sent to the requestor, the right place is finish. You can do it in consume as well, but there you'll be overwriting responses from authoritative servers, not the assembled response to the requestor (which means the DNSSEC validator is likely to stop your rewritten answers).
Disclaimer: the Go interface is rough and requires a lot of CGO code to access internal structures. You'd probably be better served by a LuaJIT module; there is another module doing something similar that you may take as an example, and it also has wrappers for creating records from text etc. If you still want to do it, that's awesome and improvements to the Go interface are welcome, so read on.
What you need to do is roughly this (as CGO).
That will walk you through RR sets in the packet (C.knot_rrset_t),
where you can match type (rr.type) and contents (rr.rdata).
Contents are stored in DNS wire format; for address records this is the address in network byte order, e.g. {0x7f, 0, 0, 1}.
You will have to compare that to address/subnet you're looking for - example in C code.
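The comparison itself is independent of Knot Resolver, so it can be illustrated outside the module. The sketch below is in Python rather than C or Go, purely to show the logic: it treats the record data as raw bytes in network byte order (4 bytes for A, 16 for AAAA) and tests them against a subnet. The function name `rdata_in_subnet` is hypothetical and nothing here is part of Knot's API.

```python
import ipaddress

def rdata_in_subnet(rdata, subnet):
    """Check whether raw A/AAAA record data falls inside a subnet."""
    # ip_address() accepts packed bytes: length 4 is IPv4, length 16 is IPv6.
    addr = ipaddress.ip_address(bytes(rdata))
    net = ipaddress.ip_network(subnet)
    # An A record can never match an IPv6 subnet, and vice versa.
    return addr.version == net.version and addr in net
```

For example, the bytes {0x7f, 0, 0, 1} from the answer fall inside 127.0.0.0/8.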
When you find a match, you want to clear the whole packet and insert a sinkhole record (you cannot selectively remove records, because the packet is append-only for performance reasons). This is relatively easy as there is a helper for that. Here's code in LuaJIT from the policy module; you'd have to rewrite it in Go, using all the functions mentioned above and using an A/AAAA sinkhole record instead of SOA. Good luck!

What're the advantages/disadvantages of RECFM=FB over RECFM=F?

When defining a dataset to be created, one of the JCL parameters, DCB, has a positional sub-parameter RECFM with possible values F, FB, V, VB, etc. What are the advantages/disadvantages of RECFM=FB over RECFM=F, or of RECFM=VB over RECFM=V? And which RECFM format is preferable in which case?
RECFM is short for record format.
F represents fixed length records, unblocked. FB represents fixed length records, blocked. Blocking stores multiple records in a disk block, while the unblocked format stores one record in a disk block. At one time, disk drives were so slow that the unblocked format provided relative speed, while the blocked format provided better disk usage. Today, with modern disk drives, there's no advantage to using the unblocked format.
V represents variable length records, unblocked. VB represents variable length records, blocked. You would use these formats if you have variable length records, rather than fixed length records. You need to add 4 to the maximum record length in the LRECL to account for the record length field.
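The length arithmetic for variable-length records can be written out as a small sketch. The 4-byte record length field is from the answer above; the additional 4-byte block descriptor that sets the minimum BLKSIZE for VB is an added detail, and the function names are hypothetical.

```python
RDW_BYTES = 4  # record descriptor word: the record length field on each record
BDW_BYTES = 4  # block descriptor word: length field on each block (added detail)

def variable_lrecl(max_data_bytes):
    """LRECL for V/VB: longest data record plus its record length field."""
    return max_data_bytes + RDW_BYTES

def min_vb_blksize(lrecl):
    """Smallest BLKSIZE that can hold one maximum-length VB record."""
    return lrecl + BDW_BYTES
```

So records carrying up to 80 bytes of data need LRECL=84, and a block must be at least 88 bytes to hold one such record.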
There's an additional attribute character, A. Used with fixed blocked (FBA) or variable blocked (VBA), this tells the system that the first byte of your record is a printer control character.

Resources