Ping router.utorrent.com DHT node using netcat - bittorrent

I'm just trying to get a response from router.utorrent.com to potentially make a DHT service down the track. For example, given a magnet link with:
xt=urn:btih:a78c852bcc0379e612e1bd794e1fd19061b84d11
the hash is:
a78c852bcc0379e612e1bd794e1fd19061b84d11
Then in the terminal I entered this:
nc -u router.utorrent.com 6881
d1:ad2:id20:a78c852bcc0379e612e1bd794e1fd19061b84d11e1:q4:ping1:t1:01:y1:qe
based on this documentation, but I don't get any response. I even tried Wireshark to check whether any packet at all was coming back, and still nothing.
Why doesn't μTorrent talk to me?

The hash should be in binary. In bencoding, a number followed by a colon is the length prefix for a string. A SHA-1 hash written as hex is 40 bytes long; for it to actually be 20 bytes it needs to be the raw output of the hash function.
You basically need to convert the hex string to binary (40 hex characters -> 20 raw bytes) for it to work.
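For example, a minimal sketch (assuming the xxd tool is available; the file name infohash.bin is just an illustration) of turning the 40-character hex string into its 20 raw bytes:
echo a78c852bcc0379e612e1bd794e1fd19061b84d11 | xxd -r -p > infohash.bin
Here xxd -r -p reverses a plain hex dump, so the output file contains the 20-byte binary infohash rather than its 40-character hex spelling.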

As explained in the other answer, bencoding is a binary format, not a text format. You seem to be trying to enter the message body into netcat using a terminal. Terminals are designed for entering textual input to programs, not binary, and so you will not be able to directly enter that sequence into netcat's stdin.
The ID in your ping request should not be the torrent's infohash. It should be a unique ID to identify your client to the DHT network. For testing, you really could just pick an ID made up entirely of 20 ASCII characters, and avoid these encoding issues, but in practice you'll want to use uniformly random binary IDs.
If you want to send a binary ID from a terminal, you shouldn't try typing it directly into netcat. Instead you can use your shell's echo command and hex escapes to produce the data in the intended format, and pipe that into netcat. For example, in bash:
echo -n $'d1:ad2:id20:\x23\x71\x0c\x1c\xb4\x50\x7d\x87\x29\xb8\x3f\x87\x2c\xc6\xa2\xa4\x4c\x39\x73\x67e1:q4:ping1:t1:01:y1:qe' | nc -u router.utorrent.com 6881
Note that the response you get from the node will be unescaped binary, not necessarily text, so displaying it directly in your terminal as we're doing here could result in things appearing strangely or your terminal session getting messed up in some way.
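One way to keep the terminal safe (a sketch, assuming xxd is installed and that your nc supports the -w timeout option) is to hex-dump whatever comes back instead of printing it raw:
echo -n $'d1:ad2:id20:\x23\x71\x0c\x1c\xb4\x50\x7d\x87\x29\xb8\x3f\x87\x2c\xc6\xa2\xa4\x4c\x39\x73\x67e1:q4:ping1:t1:01:y1:qe' | nc -u -w 3 router.utorrent.com 6881 | xxd
If the node answers, the bencoded response (which contains the responding node's own 20-byte ID) then shows up as a readable hex dump instead of raw bytes.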

Related

(dd command linux) last byte goes to next line

Hi friends, I need some help.
We have a tool that converts binary files to text files and then stores them in Hadoop (HDFS).
In production, that ingestion tool uses FTP to download files from the mainframe in binary format (EBCDIC), and we don't have access to download files from the mainframe in the development environment.
In order to test the file conversion, we manually create text files and try to convert them using the dd command (Linux) with these parameters:
dd if=asciifile.txt of=ebcdicfile conv=ebcdic
After pass through our conversion tool, the expected result is:
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
000000000000000 DATA
However, it's returning the following result:
000000000000000 DAT
A000000000000000 DA
TA000000000000000 D
ATA000000000000000
I have tried the cbs, obs and ibs parameters, setting them to the record length (lrecl, the number of characters in each line), without success.
Can anyone help me?
A few things to consider:
How exactly is the data transferred via FTP? Your "in binary format (EBCDIC)" on its own doesn't quite make sense: either FTP transfers in binary mode, in which case nothing gets changed or converted during the transfer, or FTP transfers in text mode (aka ASCII mode), in which case the data is converted from a specific EBCDIC code page to a specific non-EBCDIC code page. You need to know which mode is used and, if it is text mode, which two code pages are involved.
The dd man pages do not make clear which EBCDIC and ASCII code pages are used for the conversion. I'm just guessing here: the EBCDIC code page might be CP-037 and the ASCII one might be CP-437. If these don't match the ones used in the FTP transfer, the resulting test data is incorrect.
I understand you don't have access to production data in the development environment. However, you should still be able to get test data from the development mainframe using FTP from there. If not, how will you be doing end-to-end testing?
The EBCDIC conversion is eating your line endings:
https://www.ibm.com/docs/en/zos/2.2.0?topic=server-different-end-line-characters-in-text-files
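As a minimal sketch of the fix (assuming GNU dd, and guessing a 20-byte record length because each expected line, "000000000000000 DATA", is 20 characters; substitute your real lrecl): when cbs is given, conv=ebcdic also blocks each newline-terminated line into a fixed-length record, so the newlines no longer end up inside the EBCDIC data.
dd if=asciifile.txt of=ebcdicfile cbs=20 conv=ebcdic
Lines shorter than cbs are padded with spaces and longer ones are truncated, which is usually what a fixed-record mainframe dataset looks like.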

How to convert "binary text" to "visible text"?

I have a text file full of non-ASCII characters.
I cannot detect the encoding with either file or enca.
file non_ascii.txt
non_ascii.txt: Non-ISO extended-ASCII text
enca non_ascii.txt
Unrecognized encoding
But I can open it normally in Windows Notepad++
Edit: The description above is misleading; sorry for that.
In fact, I picked some parts of the original file, put them into a new text file, and then opened it in Notepad++.
The two parts are shown below. They are decoded in two different ways by Notepad++.
Question:
How can I detect the file's encoding under Linux?
How do I recover the characters represented by <F1><EE><E9><E4><FF>?
I couldn't get any result with "grep 'сойдя' win.txt", even though "сойдя" is encoded as <F1><EE><E9><E4><FF>.
The file content slice as follows:
less non_ascii.txt
"non_ascii.txt" may be a binary file. See it anyway?
<F1><EE><E9><E4><FF>
<F2><F0><E0><EA><F2><EE><E2><E0><F2><FC><F1><FF>
<D0><F2><E9><E4><D7><E9><E7><E1><EC><E1><F3><F8>
<D1><E5><EA><F3><ED><E4>
<F0><E0><E7><E3><F0><F3><E7><EA><E8>
<EF><EE><E4><F1><F2><E0><E2><EB><FF><F2><FC>
<F0><E0><E7><E3><F0><F3><E7><EA><E5>
<F1><EE><E9><E4><F3>
<F0><E0><E7><E3><F0><F3><E7><EA><E0>
<F1><EE><E2><EB><E0><E4><E0><EB><E8>
<C1><D7><E9><E1><F0><EF><FE><F4><E1>
<CB><C1><D3><D3><C9><D4><C5><D2><C9><D4>
<F1><EE><E2><EB><E0><E4><E0><EB><EE>
<F1><EE><E9><E4><E8>
<F1><EE><E2><EB><E0><E4><E0><EB><E0>
Your question really has two parts: (1) how do I identify an unknown encoding and (2) how do I convert that to something useful?
The first part is the real challenge, and really cannot be answered in universal terms -- in the general case, there is no reliable way to identify an unknown 8-bit encoding. Some encodings give you good hints (UTF-8 is an excellent example) and in many cases, if you have a good idea what the text is supposed to represent, the problem can be solved.
A mapping of 8-bit character meanings can be helpful (cough, the link is to mine) and in this case quickly hints at Windows code page 1251. Kudos for the hex dumps and the picture with the representation you expect!
With that out of the way, converting is easy.
iconv -f cp1251 -t utf-8 non_ascii.txt >utf8.txt
Provided your Linux system is set up to use UTF-8 at the terminal, your grep command should work on utf8.txt now.
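For example, the search from the question should now find its match:
grep 'сойдя' utf8.txt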
The indication that some of the text is "ANSI" (which is a bogus term anyway) is probably just a red herring -- as far as I can tell, everything in your excerpt looks like well-formed CP1251.
Some tools like chardet do a reasonable job of at least steering you in the right direction, though you have to understand that, like a human expert, they have to guess what the text is supposed to represent. There are corner cases where they just don't have enough information to guess correctly, either because there are several candidate encodings with very few differences (for example, Latin-1 vs Latin-9 vs Windows-1252, all of which also overlap with plain 7-bit US-ASCII in the first 128 positions) or because the input doesn't contain enough information to establish any common patterns.
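For instance, if the Python chardet package happens to be installed, it ships a small command-line helper, so getting a first guess can be as simple as:
chardetect non_ascii.txt
uchardet is a similar stand-alone tool; either way, treat the output as a guess to verify, not a definitive answer.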

codepage conversion support on linux

I have two questions regarding code pages on Linux.
Is there any way to list all the combinations of code page conversions possible on Linux?
If I have a file with data encoded in some format (say encode-1), I can use
"iconv -f encode-1 -t encode-2 file > file1.txt" to convert it into encode-2 format.
This way I can check that conversion from encode-1 to encode-2 is possible. But to test this I need to have some file already encoded in encode-1 format. Is there any way to test whether a particular conversion is possible without already having a file encoded in encode-1?
You seem to be using iconv. To get the list of all possible encodings, just run
iconv -l
If you do not have any file in a given encoding, you can create one: take any file in a known encoding and use iconv to convert it into the given encoding. If you are worried that the conversion might fail partway through, use
iconv -c
It omits invalid characters in the output, but encodes everything it can.
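To address the second question more directly: you do not actually need a pre-existing file. A quick sketch (assuming a glibc-style iconv, with encode-1/encode-2 standing in for the real encoding names): iconv checks the conversion pair before it reads any data, so feeding it empty or freshly generated input is enough of a test.
# fails with "conversion from ... is not supported" if the pair is unavailable
iconv -f encode-1 -t encode-2 < /dev/null && echo "conversion supported"
# or generate a sample in encode-1 on the fly and round-trip it
printf 'test data\n' | iconv -f UTF-8 -t encode-1 | iconv -f encode-1 -t encode-2 > /dev/null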

how to print file containing text with ANSI escapes

I would like to print a file containing text with ANSI escapes.
Here is file content (generated with bash script):
\033[1m bold message example \033[0m
normal font message
When printing the file to the screen in a terminal, it works nicely:
cat example.txt
shows:
bold message example
normal font message
But my problem is when I try to send it to a printer:
lp example.txt
prints:
1mbold message example2m
normal font message
Is there a way to print this file correctly? Maybe with groff (it can be used to print a styled man page), but I did not manage to get anything working with it...
a2ps might be able to handle that (but I am not sure; you should try).
I would rather suggest changing the way you get such a file with ANSI escapes (that is, also provide some alternative output format).
The program producing such a file (or such output) could instead produce a more printable output, perhaps by generating some intermediate form (e.g. LaTeX, Lout, groff or HTML) and then forking the appropriate command to print it. That program could also generate PDF directly through libharu or poppler, etc.
Also, it might depend on your printer and its driver.
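If keeping the bold is not essential, one pragmatic fallback (a sketch, assuming GNU sed, which understands the \x1b escape) is to strip the ANSI escape sequences before handing the file to lp, so at least the stray "1m"/"0m" fragments disappear from the printout:
sed 's/\x1b\[[0-9;]*m//g' example.txt | lp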

Configure gitg to use UTF8

gitg doesn't correctly display git diffs of UTF-8 files, even though
git itself does (as seen on the console).
gitg correctly displays UTF-8 files themselves (just not their diffs).
Is it possible to configure it to correctly display diffs of UTF-8 encoded files? If so, how?
EDIT:
barti_ddu helped me realize that what seems to happen is that gitg guesses the encoding from the received diff.
I hit this problem when I replace badly encoded chars with well encoded ones: the first one in the diff is the bad one, which probably leads to a bad guess (and gives the impression I'm replacing well encoded chars with bad ones).
So the (less important) goal would be to force gitg to decode the diff as UTF-8 instead of guessing.
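One hedged thing that may be worth trying: gui.encoding is a real git setting that git-gui and gitk use when displaying file contents, and some graphical front ends pick it up as well; whether your version of gitg honours it is not guaranteed.
git config --global gui.encoding utf-8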
