I've been doing a little DHT scraping to figure out the popularity of various BitTorrent clients. In the results I've collected, some of the most common version strings are from a client identifying itself as "BigUp", but I haven't been able to find anything with this name. Here's a sample of the version strings returned from BEP10 handshakes:
BigUp/11 libtrt/1.1.0.0 Downloader/12430
BigUp/11 libtrt/1.1.0.0 Downloader/12440
BigUp/11 libtrt/1.1.0.0 Downloader/12450
BigUp/12 libtrt/1.2.0.0 Downloader/12460
BigUp/12 libtrt/1.2.0.0 Downloader/12470
BigUp/12 libtrt/1.2.0.0 Downloader/12480
BigUp/12 libtrt/1.2.0.0 Downloader/12490
BigUp/12 libtrt/1.2.0.0 Downloader/12500
BigUp/12 libtrt/1.2.0.0 Downloader/12510
BigUp/12 libtrt/1.2.0.0 Downloader/12520
BigUp/12 libtrt/1.2.0.0 Downloader/12530
BigUp/12 libtrt/1.2.0.0 Downloader/12540
BigUp/12 libtrt/1.2.0.0 Downloader/1940
BigUp/12 libtrt/1.2.0.0 Downloader/1950
BigUp/12 libtrt/1.2.0.0 Downloader/1960
BigUp/12 libtrt/1.2.0.0 Downloader/1970
BigUp/12 libtrt/1.2.0.0 Downloader/1980
BigUp/12 libtrt/1.2.0.0 Downloader/1990
BigUp/12 libtrt/1.2.0.0 Downloader/2010
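For what it's worth, this is roughly how I tally them; a minimal sketch that assumes the "v" strings from the extended handshakes have already been dumped to a text file (the filename is made up):

```python
# Tally client version strings collected from BEP10 extended handshakes.
# Assumes handshake_versions.txt (hypothetical) holds one "v" string per line,
# e.g. "BigUp/12 libtrt/1.2.0.0 Downloader/12500".
from collections import Counter

counts = Counter()
with open("handshake_versions.txt") as fh:
    for line in fh:
        parts = line.split()
        if not parts:
            continue
        # Drop the trailing build number so Downloader/12430..12540 group together.
        counts[" ".join(parts[:2])] += 1

for client, n in counts.most_common(10):
    print(f"{n:8d}  {client}")
```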
Version numbers go all the way down to BigUp/5 libtorrent/0.16.0.0, but those are much less common. Also, the torrents that they are sharing are rather odd. Here's a sample of the most common infohashes:
3b2e1b303703b733f6407becc1140eae937d55ac
4b21bf8f097a4e018ba2d2badf353012d686cd17
b16a48675e0fdb371238e4a6b075807bbd544c40
bd2045ad99b2f29f655ba566f26aedf50eae2780
d89a935c6e8c151b7b1a8278597a8dcba7d468b3
dbab2707740d3d3dadb16d1ea4d602959573cd05
dbf9a9a2815488c32a9c44aeb0af8ad04a33ebac
dde57ab80b8d0313f823e22e70af75ef6ec22882
debeeb0f4cad5861b322e55b8b18ed11169a27f4
The infohashes I've managed to resolve to torrent files have names like "warfacediff170-171" and contain small zipped files:
name | size
--------------- | ------
patch.7z.001 | 7.4MB
manifest.xml.gz | 705.0B
While these BigUp clients do have regular DHT functionality, they don't seem to offer magnet-link based torrent downloads, so it's hard to actually get copies of the torrent files. Also, there are relatively few unique torrents being shared by these clients - I've only found about 3k, while other less popular clients share hundreds of thousands.
Does anyone know what this client is?
It's a component in the "My.com Game Center" and is used for distributing updates for games and probably the game center itself. Warface is one of the games that My.com owns, which explains the torrent named "warfacediff".
If you decompile the Game Center installer, you will find a file named BIGUP2.dll. That is the torrent client, and it seems to be based on libtorrent.
Presumably, the reason the client doesn't respond to ut_metadata requests is that the update torrent files are distributed centrally from https://static.gc.my.com/ and they don't want people scraping their torrents. They do run their own tracker, but they don't mark their torrents as private.
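If anyone wants to check that last point themselves, here is a minimal sketch (the bencodepy package and the torrent filename are my assumptions, not part of the original post) that prints the BEP 27 private flag and the announce URL of a downloaded .torrent:

```python
# Inspect a downloaded .torrent: print the BEP 27 private flag and the tracker URL.
# bencodepy and the filename below are assumptions for illustration.
import bencodepy

with open("warfacediff170-171.torrent", "rb") as fh:
    meta = bencodepy.decode(fh.read())

info = meta[b"info"]
print("name:", info[b"name"].decode(errors="replace"))
print("private flag:", info.get(b"private", 0))  # absent or 0 means not private
print("announce:", meta.get(b"announce", b"").decode(errors="replace"))
```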
Consider a web app in which I show the current data values coming from sensors.
Assume we have 2 sensors, sensor A and sensor B, with ids 1 and 2 respectively.
Assume each sensor has 2 tags: temperature and humidity.
I have configured a Node.js app to pull data from the sensors every 500 milliseconds and push it into a Postgres table "data_live" like the one below.
sensor_id |tag |value
----------+-------------+------------
1|temperature |0.006817675
1|humidity |0.002902401
2|temperature |33
2|humidity |28
Note: in the table "data_live" we only keep the current value for each machine/tag, so every push is an update operation on the database.
I want to record the history in a time-series manner, using the Timescale extension, in the table named "data_ts" below.
time |machine_id|temperature|humidity
-----------------------------+----------+-----------+--------
2022-04-09 14:19:01.000 +0530| 1| 20.2| 55.3
2022-04-09 14:19:01.000 +0530| 2| 19.7| 50.1
2022-04-09 14:19:02.000 +0530| 1| 20.3| 55.4
2022-04-09 14:19:02.000 +0530| 2| 19.6| 50.0
I am thinking of using a cron-based scheduler to run a script periodically that performs the steps below:
Fetch data from the table "data_live"
Apply the crosstab function (convert the row-per-tag structure into a columnar structure)
Insert the columnar data into the table "data_ts"
The limitation of this approach is that I cannot run a cron scheduler every 500 ms.
Sensors can grow to 100+ and tags to 50+, so we also have to think about scale.
Can anyone suggest a solution here?
Let me know if you need any more information.
Thanks in advance.
If you want the solution to be self-contained in PG, the cron option is valid. However, since you mention the scale problem, I would suggest you also start thinking about data retention, e.g. how long you want to keep the data, and have another job that periodically cleans up old entries.
If you want to expand the tech horizon, there are a multitude of alternative solutions, some of which have been identified in the comment from #webdev_jj
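If you do go with the cron route, a minimal sketch of the roll-up step could look like the following. It assumes the tablefunc extension is installed (for crosstab), that value is double precision and sensor_id is an int, and it uses psycopg2 with a made-up connection string; table and column names follow the question:

```python
# Periodic roll-up: pivot the latest data_live snapshot and append it to data_ts.
# The connection string, column types and the tablefunc extension are assumptions.
import psycopg2

ROLLUP_SQL = """
INSERT INTO data_ts (time, machine_id, temperature, humidity)
SELECT now(), ct.sensor_id, ct.temperature, ct.humidity
FROM crosstab(
    'SELECT sensor_id, tag, value FROM data_live ORDER BY 1, 2',
    $$VALUES ('temperature'), ('humidity')$$
) AS ct(sensor_id int, temperature double precision, humidity double precision);
"""

with psycopg2.connect("dbname=sensors user=app") as conn:
    with conn.cursor() as cur:
        cur.execute(ROLLUP_SQL)
```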
We are in adtech and we need to find the fastest solution (and the cheapest in CPU/RAM consumption) to read and write (at tens or even hundreds of thousands of IOPS) a very simple ID-matching table like the ones below.
We also have a doubt about the table schema:
One line per partner
internal_id (uuid v4)                |partner_id (int)|external_id (TEXT, no control on the length)
-------------------------------------+----------------+--------------------------------------------
923a01d3-c480-4a80-92f1-4e11dfba6ed3 |              24|XzaV1lVbLoEAAFJkOQkAAAC5&1111
923a01d3-c480-4a80-92f1-4e11dfba6ed3 |              35|4420763609654968920
04643add-bc2b-4ade-be71-c1a2ad3d4a41 |              24|X-hgv2QiDJM4LUrlMLuTtwAA&1114
04643add-bc2b-4ade-be71-c1a2ad3d4a41 |              35|244500741791779031
...                                  |             ...|...
or
One column per partner
internal_id (uuid v4)                |partner_24 (TEXT, no control on the length)|partner_35 (TEXT, no control on the length)
-------------------------------------+-------------------------------------------+-------------------------------------------
923a01d3-c480-4a80-92f1-4e11dfba6ed3 |XzaV1lVbLoEAAFJkOQkAAAC5&1111              |4420763609654968920
4c1aeb2a-0773-4c7e-a025-e3c10c662358 |X-hgv2QiDJM4LUrlMLuTtwAA&1114              |244500741791779031
...                                  |...                                        |...
The table is very large (billions of internal_id values) and is only getting bigger every day.
We don't need 100% data accuracy; we mainly need speed on reads, and writes can be asynchronous or have a small latency.
Have you tried Redis or memcached? An in-memory hashmap might be faster, but it will be harder to implement for distributed lookups.
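As a rough illustration of what that could look like with redis-py, assuming the read path is "given (partner_id, external_id), find internal_id" (key layout, host and sample values here are mine, not from the question):

```python
# Minimal sketch of the ID-matching lookup in Redis. One key per
# (partner_id, external_id) pair; values are the internal_id.
from typing import Optional
import redis

r = redis.Redis(host="localhost", port=6379)

def put_mapping(partner_id: int, external_id: str, internal_id: str) -> None:
    # Writes can be batched/pipelined upstream if small write latency is acceptable.
    r.set(f"partner:{partner_id}:{external_id}", internal_id)

def get_internal_id(partner_id: int, external_id: str) -> Optional[str]:
    value = r.get(f"partner:{partner_id}:{external_id}")
    return value.decode() if value is not None else None

put_mapping(24, "XzaV1lVbLoEAAFJkOQkAAAC5&1111",
            "923a01d3-c480-4a80-92f1-4e11dfba6ed3")
print(get_internal_id(24, "XzaV1lVbLoEAAFJkOQkAAAC5&1111"))
```

Keep in mind that with billions of ids the working set may not fit in RAM on a single node, so sharding (or an on-disk key-value store) would need to be part of the evaluation.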
All the examples I see of windowing involve defining the windows. E.g., tumbling 1-minute windows, or sliding 1-minute windows, etc. In my situation, all my data has timestamped events, but that's not the primary interest.
All my data also has an associated period that I do not have control over. That is the desired window in my case. The periods are time-based, but they vary from 2-3 weeks, roughly.
So, if I look only at the period, a stream of values might look like this (almost everything from the current period, with a few stragglers from the previous period early on in the current period):
... PERIOD 6, PERIOD 5, PERIOD 6, PERIOD 6, PERIOD 6, PERIOD 6, ...
It's not clear to me how to handle this situation in terms of watermarks/triggers/etc. If I'm understanding the terminology correctly, I've thought of something like this: the watermark for PERIOD N occurs when the first event with PERIOD (N+1) is processed. The lateness horizon (for garbage-collecting state) for the PERIOD N window can be 1-2 days after the timestamp of the first event with PERIOD (N+1). I'd like triggers to be accumulating and to fire every 5 minutes (ideally, I'd like this trigger interval to increase over time: more frequent at the beginning of the window, less frequent as time passes).
I'm trying to use terminology from this article: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 - sorry if it's incorrect.
I'm particularly confused about how watermarks seem to be continuous and based on event time. In my case, I have both an event time (the timestamp) and an event "time" (the period). If I'm understanding this correctly, the watermark curve for my situation (as in the article above) would look like a step function?
I haven't yet picked a stream processing framework to use. Does my situation make sense for any of them? Does this require a lot of custom logic? Does any framework make this easier? Is this a known problem with a name?
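To make what I'm after more concrete, here is a framework-agnostic sketch of the logic described above (accumulate per period, close PERIOD N a grace interval after the first PERIOD N+1 element shows up); the names, tuple shape and the 2-day grace value are just placeholders, not tied to any framework:

```python
# Framework-agnostic sketch: per-period accumulation with a lateness horizon
# that starts at the event timestamp of the first element of the next period.
GRACE_SECONDS = 2 * 24 * 3600

totals = {}    # period -> accumulated value
close_at = {}  # period -> event-time deadline for garbage collection

def on_event(period: int, event_ts: float, value: float) -> None:
    totals[period] = totals.get(period, 0.0) + value
    # The first event of period N acts as the watermark for period N-1.
    if period - 1 in totals and period - 1 not in close_at:
        close_at[period - 1] = event_ts + GRACE_SECONDS
    # Finalize any period whose lateness horizon has passed in event time.
    for p, deadline in list(close_at.items()):
        if event_ts >= deadline:
            print(f"PERIOD {p}: final total {totals.pop(p)}")
            del close_at[p]

# Example: a PERIOD 5 straggler arriving early in PERIOD 6 is still counted.
on_event(5, 1_000.0, 1.0)
on_event(6, 2_000.0, 1.0)
on_event(5, 2_050.0, 1.0)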
Any help is appreciated.
In Flink, one way to achieve this is to use a processing-time window for the aggregation. Then you use a rich map function to maintain the accumulated counts before the window. In the end, you sink the aggregates to long-term storage.
You can take a look at my blog post, where we did something similar to this; see the section "A peek into Milestone Two".
I have data in the format { host | metric | value | time-stamp }. We have hosts all around the world reporting metrics.
I'm a little confused about using window operations (say, 1 hour) to process data like this.
Can I tell my window when to start, or does it just start when the application starts? I want to ensure I'm aggregating all data from hour 11 of the day, for example. If my window starts at 10:50, I'll just get 10:50-11:50 and miss 10 minutes.
Even if the window is perfect, data may arrive late.
How do people handle this kind of issue? Do they make windows far bigger than needed and just grab the data they care about on every batch cycle (kind of sliding)?
In the past, I worked on a large-scale IoT platform and solved that problem by considering that the windows were only partial calculations. I modeled the backend (Cassandra) to receive more than 1 record for each window. The actual value of any given window would be the addition of all -potentially partial- records found for that window.
So, a perfect window would be 1 record, a split window would be 2 records, late-arrivals are naturally supported but only accepted up to a certain 'age' threshold. Reconciliation was done at read time. As this platform was orders of magnitude heavier in terms of writes vs reads, it made for a good compromise.
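To illustrate the read-time reconciliation (the record shape and names below are made up for the example, not the actual Cassandra schema):

```python
# Read-time reconciliation of partial window records: the value of a window is
# the sum of every (possibly partial) record stored for it.
from collections import defaultdict

partial_records = [
    # (host, metric, window_start, partial_sum)
    ("host-1", "cpu", "2016-01-01T10:00", 40.0),
    ("host-1", "cpu", "2016-01-01T10:00", 2.5),   # late arrival, second record
    ("host-2", "cpu", "2016-01-01T10:00", 17.0),
]

totals = defaultdict(float)
for host, metric, window_start, partial in partial_records:
    totals[(host, metric, window_start)] += partial

for key, value in sorted(totals.items()):
    print(key, value)
```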
After speaking with people in depth on MapR forums, the consensus seems to be that hourly and daily aggregations should not be done in a stream, but rather in a separate batch job once the data is ready.
When doing streaming you should stick to small batches with windows that are relatively small multiples of the streaming interval. Sliding windows can be useful for, say, trends over the last 50 batches. Using them for tasks as large as an hour or a day doesn't seem sensible though.
Also, I don't believe you can tell your batches when to start/stop, etc.
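For what it's worth, a sketch of what the batch-side hourly aggregation might look like, assuming Spark SQL and a parquet dump with columns host, metric, value, ts (the paths and column names are illustrative):

```python
# Hourly roll-up as a separate batch job, run once the hour's data is complete.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly-rollup").getOrCreate()

events = spark.read.parquet("/data/metrics/")  # host | metric | value | ts

hourly = (
    events
    # window() buckets are aligned to the clock (epoch-aligned), so "hour 11"
    # means 11:00-12:00 regardless of when the job or stream started.
    .groupBy(F.window("ts", "1 hour").alias("w"), "host", "metric")
    .agg(F.avg("value").alias("avg_value"), F.count(F.lit(1)).alias("n"))
)

hourly.write.mode("overwrite").parquet("/data/metrics_hourly/")
```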
I am having an issue importing the ecoinvent v3.2 database (cut-off) in Brightway.
The steps followed were:
ei32cu = bw.SingleOutputEcospold2Importer(fp, "ecoinvent 3.2 cutoff")
ei32cu.apply_strategies()
All seemed to be going well. However, ei32cu.statistics() revealed that there were many unlinked exchanges:
12916 datasets
459268 exchanges
343020 unlinked exchanges
Type biosphere: 949 unique unlinked exchanges
Of course, the unlinked exchanges prevented the writing of the database: ei32cu.write_database() raised an "Invalid exchange" error.
My questions:
- How can I fix this?
- How can I access the log file (cited here) that might give me some insights?
- How can I generate a list of exchanges (and their related activities)?
It is strange that you have unlinked exchanges with ei 3.2 cutoff; at least with Python 3, importing 3.2 cutoff should be very smooth. Are you perhaps on py2, or not using the latest version of bw2?
- It is difficult to give an answer without looking into the db, but if you are on py2, just try with Python 3.
- To check where the log is:
`projects.logs_dir`
- To write the list of unlinked exchanges:
ei32cu.write_excel(only_unlinked=True)  # only_unlinked=False exports the full list of exchanges
I now know why this problem occurred, and the solution is quite simple: in new projects, one needs to run bw2setup() before importing LCI databases.
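For completeness, the working sequence looks roughly like this (the project name and dataset path are placeholders):

```python
# bw2setup() must run in the (new) project before the ecoinvent import,
# otherwise the biosphere exchanges stay unlinked.
import brightway2 as bw

bw.projects.set_current("ei32-import")
bw.bw2setup()  # installs the biosphere3 database and LCIA methods

fp = "/path/to/ecoinvent 3.2_cutoff_ecoSpold02/datasets"
ei32cu = bw.SingleOutputEcospold2Importer(fp, "ecoinvent 3.2 cutoff")
ei32cu.apply_strategies()
ei32cu.statistics()   # should now report 0 unlinked exchanges
ei32cu.write_database()
```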