My question is: how do I get Python to read a file whose text is in 16-bit characters? The rest of the post describes the situation.
I have a text file which is a playlist export from iTunes.
Here is a short section including the header
Name Artist Composer Album Grouping Work Movement Number Movement Count Movement Name Genre Size Time Disc Number Disc Count Track Number Track Count Year Date Modified Date Added Bit Rate Sample Rate Volume Adjustment Kind Equalizer Comments Plays Last Played Skips Last Skipped My Rating
Keyboard Works of the Masters Randolph Hokanson Pan125b 2054816 64 03/11/2017, 18:00 03/11/2017, 17:01 256 44100 MPEG audio file 1 03/11/2017, 17:02 4 08/03/2018, 16:07
08 Traccia 08 11159905 464 03/11/2017, 17:39 03/11/2017, 16:59 192 48000 MPEG audio file 1 03/11/2017, 16:59
09 Traccia 09 17787361 741 03/11/2017, 17:39 03/11/2017, 16:58 192 48000 MPEG audio file 5 08/03/2018, 10:58
10 Traccia 10 10128290 421 03/11/2017, 17:39 03/11/2017, 16:58 192 48000 MPEG audio file 1 03/11/2017, 16:58
When I use the code below to read it, the program hangs (i holds the number of lines in the file). The hex dumps that follow seem to show that the export from iTunes uses 16-bit characters.
The complete code for reading the text file is
file_name="full path to file goes here"
f = open(file_name, "r")
i=227
for x in range(0, i):
    line = f.readline()
When I opened the file in TextWrangler, selected all the text, and pasted it into a new document, the code worked fine on the new file.
A hex dump of part of the original file looks like this, with the new file following
00000000: FF FE 4E 00 61 00 6D 00 65 00 09 00 41 00 72 00 ..N.a.m.e...A.r.
00000010: 74 00 69 00 73 00 74 00 09 00 43 00 6F 00 6D 00 t.i.s.t...C.o.m.
00000020: 70 00 6F 00 73 00 65 00 72 00 09 00 41 00 6C 00 p.o.s.e.r...A.l.
00000030: 62 00 75 00 6D 00 09 00 47 00 72 00 6F 00 75 00 b.u.m...G.r.o.u.
00000040: 70 00 69 00 6E 00 67 00 09 00 57 00 6F 00 72 00 p.i.n.g...W.o.r.
00000050: 6B 00 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 k...M.o.v.e.m.e.
00000060: 6E 00 74 00 20 00 4E 00 75 00 6D 00 62 00 65 00 n.t. .N.u.m.b.e.
00000070: 72 00 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 r...M.o.v.e.m.e.
00000080: 6E 00 74 00 20 00 43 00 6F 00 75 00 6E 00 74 00 n.t. .C.o.u.n.t.
00000090: 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 6E 00 ..M.o.v.e.m.e.n.
000000A0: 74 00 20 00 4E 00 61 00 6D 00 65 00 09 00 47 00 t. .N.a.m.e...G.
000000B0: 65 00 6E 00 72 00 65 00 09 00 53 00 69 00 7A 00 e.n.r.e...S.i.z.
000000C0: 65 00 09 00 54 00 69 00 6D 00 65 00 09 00 44 00 e...T.i.m.e...D.
000000D0: 69 00 73 00 63 00 20 00 4E 00 75 00 6D 00 62 00 i.s.c. .N.u.m.b.
000000E0: 65 00 72 00 09 00 44 00 69 00 73 00 63 00 20 00 e.r...D.i.s.c. .
000000F0: 43 00 6F 00 75 00 6E 00 74 00 09 00 54 00 72 00 C.o.u.n.t...T.r.
New file
0000: 4E 61 6D 65 09 41 72 74 69 73 74 09 43 6F 6D 70 Name.Artist.Comp
0010: 6F 73 65 72 09 41 6C 62 75 6D 09 47 72 6F 75 70 oser.Album.Group
0020: 69 6E 67 09 57 6F 72 6B 09 4D 6F 76 65 6D 65 6E ing.Work.Movemen
0030: 74 20 4E 75 6D 62 65 72 09 4D 6F 76 65 6D 65 6E t Number.Movemen
0040: 74 20 43 6F 75 6E 74 09 4D 6F 76 65 6D 65 6E 74 t Count.Movement
0050: 20 4E 61 6D 65 09 47 65 6E 72 65 09 53 69 7A 65 Name.Genre.Size
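The beginning of your file looks like UTF-16: it starts with the byte order mark FF FE (see Byte order mark on Wikipedia).
Use
import io
file_name="full path to file goes here"
with io.open(file_name,'r', encoding='utf-16') as f:
    for line in f:
        # do something with line
when opening it. The 'utf-16' codec reads the byte order mark and strips it, whereas 'utf-16-le' would leave it as a U+FEFF character at the start of the first line.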
There is no need to use range() or readline() when reading line by line. If you really need the line numbers, use:
for lineNr,line in enumerate(f):
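As a hedged sketch of the same idea applied to the whole export: the helper name read_playlist_rows and the use of the csv module for the tab-separated columns are assumptions for illustration, not something the export format prescribes.

```python
import csv
import io


def read_playlist_rows(file_name):
    """Return the rows of a tab-separated UTF-16 text file as lists of strings.

    Assumes the file starts with a byte order mark (FF FE or FE FF),
    as the hex dump in the question shows.
    """
    # 'utf-16' consumes the BOM and picks the byte order from it.
    # newline='' lets the csv module deal with the line endings itself.
    with io.open(file_name, "r", encoding="utf-16", newline="") as f:
        # QUOTE_NONE: treat double quotes in track names as ordinary text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Each row is then a list of column values, with the header row first, so rows[0][0] would be "Name" without a stray U+FEFF in front of it.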
Hi, the WebSocket stream is not working on Apple devices. Can I save the live stream as MP4 for mobile devices?
var nanocon = new WebSocket("wss://bintu-splay.nanocosmos.de/h5live/authstream?url=rtmp%3A%2F%2Fbintu-splay.nanocosmos.de%3A80%2Fsplay&stream="+stream+"&cid=579991&pid=63355099213&expires="+sec.expires+"&tag="+sec.tag+"&token="+sec.token+"&options="+sec.options+"");
nanocon.onmessage = function(onmsg)
{
    const dataBuffer = onmsg.data;
    const fileStream = Fs.createWriteStream('st/finalvideo.webm', {flags: 'a'});
    fileStream.write(dataBuffer);
    console.log(dataBuffer);
    socket.send(onmsg.data);
}
And response data is
> {"eventType":"onUpdateSourceSuccess","onUpdateSourceSuccess":{"requestId":0,"count":0,"tag":""}}
> {"eventType":"onStreamInfo","onStreamInfo":{"haveVideo":true,"haveAudio":true,"mimeType":"video/mp4; codecs=\"avc1.42E01E, mp4a.40.2\"","prerollDuration":0,"videoInfo":{"width":1280,"height":720,"frameRate":30},"audioInfo":{"sampleRate":48000,"channels":2,"bitsPerSample":16}}}
> <Buffer 00 00 00 24 66 74 79 70 69 73 6f 35 00 00 00 01 61 76 63 31 69 73 6f 35 64 73 6d 73 6d 73 69 78 64 61 73 68 00 00 04 88 6d 6f 6f 76 00 00 00 6c 6d 76 ... >
> {"eventType":"onRandomAccessPoint","onRandomAccessPoint":{"streamTime":0}}
> <Buffer 00 00 00 a8 6d 6f 6f 66 00 00 00 10 6d 66 68 64 00 00 00 00 00 00 00 00 00 00 00 4c 74 72 61 66 00 00 00 10 74 66 68 64 00 02 00 00 00 00 00 01 00 00 ... >
Suppose I create a simple PNG with:
convert -size 1x1 canvas:red red.png
Here is a similar image (bigger size) for reference:
Then I run the identify command on it. It tells me the ColorSpace of the image is sRGB, but there seems to be NO indication of this inside the file. In fact, running
$ hexdump -C red.png
00000000 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
00000010 00 00 00 01 00 00 00 01 01 03 00 00 00 25 db 56 |.............%.V|
00000020 ca 00 00 00 04 67 41 4d 41 00 00 b1 8f 0b fc 61 |.....gAMA......a|
00000030 05 00 00 00 20 63 48 52 4d 00 00 7a 26 00 00 80 |.... cHRM..z&...|
00000040 84 00 00 fa 00 00 00 80 e8 00 00 75 30 00 00 ea |...........u0...|
00000050 60 00 00 3a 98 00 00 17 70 9c ba 51 3c 00 00 00 |`..:....p..Q<...|
00000060 06 50 4c 54 45 ff 00 00 ff ff ff 41 1d 34 11 00 |.PLTE......A.4..|
00000070 00 00 01 62 4b 47 44 01 ff 02 2d de 00 00 00 07 |...bKGD...-.....|
00000080 74 49 4d 45 07 e5 01 0d 17 04 37 80 ef 04 02 00 |tIME......7.....|
00000090 00 00 0a 49 44 41 54 08 d7 63 60 00 00 00 02 00 |...IDAT..c`.....|
000000a0 01 e2 21 bc 33 00 00 00 25 74 45 58 74 64 61 74 |..!.3...%tEXtdat|
000000b0 65 3a 63 72 65 61 74 65 00 32 30 32 31 2d 30 31 |e:create.2021-01|
000000c0 2d 31 33 54 32 33 3a 30 34 3a 35 35 2b 30 30 3a |-13T23:04:55+00:|
000000d0 30 30 2d af d4 01 00 00 00 25 74 45 58 74 64 61 |00-......%tEXtda|
000000e0 74 65 3a 6d 6f 64 69 66 79 00 32 30 32 31 2d 30 |te:modify.2021-0|
000000f0 31 2d 31 33 54 32 33 3a 30 34 3a 35 35 2b 30 30 |1-13T23:04:55+00|
00000100 3a 30 30 5c f2 6c bd 00 00 00 00 49 45 4e 44 ae |:00\.l.....IEND.|
00000110 42 60 82 |B`.|
00000113
does not provide a clue, that I know of.
I understand that identifying the ColorSpace of an image, that does not contain that information, is a very hard problem -- see one proposed solution looking at the histogram of colors here.
So how does identify, from the ImageMagick suite, determine the ColorSpace of this image?
It is common, but not standardized, to assume that an image without an embedded or sidecar ICC profile or without an explicit encoding description is encoded according to IEC 61966-2-1:1999, i.e. the sRGB specification.
This is just a bug in ImageMagick. You can use exiftool to check whether an sRGB chunk (with its rendering intent) is present. In this case, it is not.
Gamma 2.2 is not sRGB, so ImageMagick is wrong here. This is a common problem on Wikipedia: all SVG images converted to PNG have it, and it destroys the colours. See: https://phabricator.wikimedia.org/T26768
We will have to reencode all images on Wikipedia, since we use ImageMagick. Sigh.
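As an alternative to exiftool, the chunk list can be read directly, since a PNG is an 8-byte signature followed by chunks of the form length, type, payload, CRC. This is a minimal sketch; the function name list_png_chunks is made up for illustration, and it does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data):
    """Return [(chunk_type, payload_bytes), ...] for a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack_from(">I4s", data, offset)
        payload = data[offset + 8 : offset + 8 + length]
        chunks.append((ctype.decode("ascii"), payload))
        offset += 12 + length  # length + type + payload + CRC
    return chunks
```

Read this way, the hexdump in the question contains IHDR, gAMA, cHRM, PLTE, bKGD, tIME, IDAT, two tEXt chunks and IEND, with no sRGB or iCCP chunk. The gAMA payload 00 00 b1 8f is 45455, i.e. a gamma of about 1/2.2.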
Background is that I have a log file that contains hex dumps that I want to convert with xxd to get that nice ASCII column that shows possible strings in the binary data.
The log file format looks like this:
My interesting hex dump:
00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78
00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c
00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20
00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69
00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20
00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30
00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65
00 73 00 2e
Visually selecting the hex dump and running xxd -r -p followed by xxd -g1 on the result does exactly what I'm aiming for.
However, since there are quite a few dumps I want to convert, I would rather automate the process.
So I'm using the following substitute command to do the conversion:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',system('xxd -r -p',submatch(0)))
The expression matches the entire hex dump in the log file. The match is sent to xxd -r -p as stdin and its output is used as stdin for xxd -g1.
Well, that's the idea at least.
The thing is that the above almost works. It produces the following result:
My interesting hex dump:
00000000: 01 53 01 6f 01 6d 01 65 01 20 01 74 01 65 01 78 .S.o.m.e. .t.e.x
00000010: 01 74 01 20 01 65 01 78 01 61 01 6d 01 70 01 6c .t. .e.x.a.m.p.l
00000020: 01 65 01 20 01 75 01 73 01 69 01 6e 01 67 01 20 .e. .u.s.i.n.g.
00000030: 01 55 01 54 01 46 01 2d 01 31 01 36 01 20 01 69 .U.T.F.-.1.6. .i
00000040: 01 6e 01 20 01 6f 01 72 01 64 01 65 01 72 01 20 .n. .o.r.d.e.r.
00000050: 01 74 01 6f 01 20 01 67 01 65 01 74 01 20 01 30 .t.o. .g.e.t. .0
00000060: 01 78 01 30 01 30 01 20 01 62 01 79 01 74 01 65 .x.0.0. .b.y.t.e
00000070: 01 73 01 2e .s..
All 00 bytes have mysteriously transformed into 01.
It should have produced the following:
My interesting hex dump:
00000000: 00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78 .S.o.m.e. .t.e.x
00000010: 00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c .t. .e.x.a.m.p.l
00000020: 00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20 .e. .u.s.i.n.g.
00000030: 00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69 .U.T.F.-.1.6. .i
00000040: 00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20 .n. .o.r.d.e.r.
00000050: 00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30 .t.o. .g.e.t. .0
00000060: 00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65 .x.0.0. .b.y.t.e
00000070: 00 73 00 2e .s..
What am I not getting here?
Of course I can use macros and other ways of doing this, but I want to understand why my substitution command doesn't do what I expect.
Edit:
For anyone who wants to achieve the same thing, here is the substitution expression that works on an entire file. The expression above was only for testing purposes, using the log file example also shown above.
The one below is the one that performs a correct conversion, modified based on the information Kent provided in his answer.
:%s/\(\(\x\{2\} \)\{16\}\_.\)\+/\=system('xxd -p -r | xxd -g1',submatch(0))
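Very likely the problem is the string conversion in system(). Vim converts the input into a string, and it does the same with the output of your first xxd command. A Vim string cannot contain NUL bytes, so system() replaces every NUL in the output with SOH (0x01).
You can check this by extracting the hex part into a file and then running:
xxd -r -p theFile|vim -
If you then call system('xxd -g1', alltext) on that buffer, you will also get something other than 00.
This does not behave like a shell pipe (xxd ...|xxd...), where the binary data never passes through a Vim string. Note that the command given to system() is run by the shell, so it can itself contain a pipe, as the corrected command in the question's edit does.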
If you want to fix your :s command, you need to call systemlist() on your first xxd call to get the data in binary format, then pass it to the 2nd xxd:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',systemlist('xxd -r -p',submatch(0)))
The command above will produce the 00 bytes, since there is no string conversion.
However, when working with data in a format other than plain strings, it may be better to use filters instead of calling system(). That would be a lot easier. For your example:
2,$!xxd -r -p|xxd -g1
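For comparison, here is a rough sketch of the same conversion outside Vim, in Python, where a bytes object can hold 00 without any substitution. The function name hex_to_dump is hypothetical, and the output only approximates the xxd -g1 layout.

```python
def hex_to_dump(hex_text, width=16):
    """Convert whitespace-separated hex pairs to an xxd -g1 style dump."""
    # Remove spaces and newlines, then decode the hex pairs into raw bytes.
    data = bytes.fromhex("".join(hex_text.split()))
    hex_width = width * 3 - 1  # two digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset : offset + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(
            "{:08x}: {}  {}".format(offset, hex_part.ljust(hex_width), text_part)
        )
    return "\n".join(lines)
```

Calling hex_to_dump on the text of one dump from the log keeps the 00 bytes as 00, because the data never passes through a NUL-terminated string.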
I'm developing a BitTorrent client and I'm having trouble getting answers to my piece requests.
To debug, I followed a conversation between uTorrent and transmission using Wireshark and tried to imitate the same conversation in my client. But it still doesn't work.
Below is an example conversation between my client and transmission. (My client is also using a -TR--- prefixed peer id; this is only for testing purposes and I'll change it.)
Indented messages come from transmission; the others are messages my client sends.
Note that this conversation is not exactly the same as how uTorrent and transmission would talk, because my client does not support the fast extension yet. (BEP 6)
(Output is taken from Wireshark, lines starting with -- are my comments)
00000000 13 42 69 74 54 6f 72 72 65 6e 74 20 70 72 6f 74 .BitTorr ent prot
00000010 6f 63 6f 6c 00 00 00 00 00 10 00 00 f8 9e 0d fd ocol.... ........
00000020 9c fc a8 52 d9 7a d6 af a4 4d 8f 73 ce 70 b6 36 ...R.z.. .M.s.p.6
00000030 2d 54 52 32 38 34 30 2d 36 68 61 67 76 30 73 70 -TR2840- 6hagv0sp
00000040 34 67 37 6b 4g7k
-- ^ my handshake to transmission
00000000 13 42 69 74 54 6f 72 72 65 6e 74 20 70 72 6f 74 .BitTorr ent prot
00000010 6f 63 6f 6c 00 00 00 00 00 10 00 04 f8 9e 0d fd ocol.... ........
00000020 9c fc a8 52 d9 7a d6 af a4 4d 8f 73 ce 70 b6 36 ...R.z.. .M.s.p.6
00000030 2d 54 52 32 38 34 30 2d 72 73 35 68 71 67 32 68 -TR2840- rs5hqg2h
00000040 6e 70 68 64 nphd
-- ^ transmission answers to my handshake
00000044 00 00 00 1a 14 00 64 31 3a 6d 64 31 31 3a 75 74 ......d1 :md11:ut
00000054 5f 6d 65 74 61 64 61 74 61 69 33 65 65 65 _metadat ai3eee
-- ^ my extended handshake to transmission
00000044 00 00 00 72 14 00 64 31 3a 65 69 31 65 31 3a 6d ...r..d1 :ei1e1:m
00000054 64 31 31 3a 75 74 5f 6d 65 74 61 64 61 74 61 69 d11:ut_m etadatai
00000064 33 65 65 31 33 3a 6d 65 74 61 64 61 74 61 5f 73 3ee13:me tadata_s
00000074 69 7a 65 69 31 34 37 65 31 3a 70 69 35 31 34 31 izei147e 1:pi5141
00000084 33 65 34 3a 72 65 71 71 69 35 31 32 65 31 31 3a 3e4:reqq i512e11:
00000094 75 70 6c 6f 61 64 5f 6f 6e 6c 79 69 31 65 31 3a upload_o nlyi1e1:
000000A4 76 31 37 3a 54 72 61 6e 73 6d 69 73 73 69 6f 6e v17:Tran smission
000000B4 20 32 2e 38 34 65 00 00 00 02 05 80 2.84e.. ....
-- ^ transmission's extended handshake and bitfield
000000C0 00 00 00 01 01 .....
-- ^ transmission unchokes me
00000062 00 00 00 01 02 .....
-- ^ my interested message
00000067 00 00 00 0d 06 00 00 00 00 00 00 00 00 00 00 40 ........ .......@
00000077 00 .
-- ^ piece request
-- no answers ...
00000078 00 00 00 0d 06 00 00 00 00 00 00 00 00 00 00 40 ........ .......@
00000088 00 .
-- ^ piece request again, with 10 seconds interval
-- again no answers...
00000089 00 00 00 0d 06 00 00 00 00 00 00 00 00 00 00 40 ........ .......@
00000099 00 .
-- ^ piece request again, with 10 seconds interval
-- no answers...
Any ideas what I am doing wrong?
Thanks.
EDIT: I updated my client to send unchoke just after sending interested, but I'm still having the same problem...
The problem was that I was requesting a piece bigger than the total size of the torrent.
The torrent I was using has 2 files, 12KB in total. However, the piece size of the torrent is 16KB, and I was requesting a 16KB piece even though the torrent has only one piece, which is 12KB in total.
After requesting 12KB instead of 16KB, the problem was solved.
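The fix can be sketched as clamping the requested length to what is actually left in the piece. The function names and the exact 12 KiB total used below are assumptions for illustration; the message layout follows the request messages in the capture (length 13, id 6, then index, begin, length).

```python
import struct

BLOCK_SIZE = 16 * 1024  # 0x4000, the length requested in the capture


def block_length(total_size, piece_length, piece_index, begin,
                 block_size=BLOCK_SIZE):
    """Return how many bytes may be requested at (piece_index, begin)."""
    piece_start = piece_index * piece_length
    # The last piece is shorter when total_size is not a multiple
    # of piece_length.
    piece_size = min(piece_length, total_size - piece_start)
    return max(0, min(block_size, piece_size - begin))


def build_request(piece_index, begin, length):
    """Encode a request message: <len=13><id=6><index><begin><length>."""
    return struct.pack(">IBIII", 13, 6, piece_index, begin, length)
```

With a 12 KiB torrent and 16 KiB pieces, block_length(12 * 1024, 16 * 1024, 0, 0) gives 12288 rather than 16384, and that value is what goes into build_request.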
I'm having trouble with my home-made "for fun" nameserver. It's been a couple of months since I updated it, so I'm a bit rusty and thought I'd ask here and see if someone else sees what's wrong. I'm getting a FORMERR when asking for a TXT record, and the same problem occurs on different domains, so there's probably something wrong in the packet formatting. Anyone?
dig txt ffffff.com @ns1.ffffff.com
;; Got bad packet: FORMERR
1024 bytes
ce bf 84 00 00 01 00 01 00 02 00 00 06 66 66 66 .............fff
66 66 66 03 63 6f 6d 00 00 10 00 01 c0 0c 00 10 fff.com.........
00 01 00 00 02 58 00 13 12 57 65 6c 63 6f 6d 65 .....X...Welcome
20 74 6f 20 66 66 66 66 66 66 66 00 c0 0c 00 02 .to.fffffff.....
00 01 00 00 02 58 00 10 03 6e 73 31 06 66 66 66 .....X...ns1.fff
66 66 66 03 63 6f 6d 00 c0 0c 00 02 00 01 00 00 fff.com.........
02 58 00 10 03 6e 73 32 06 66 66 66 66 66 66 03 .X...ns2.ffffff.
63 6f 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 com.............
In the example supplied above, I had added an incorrect 00 (null terminator) at the end of the TXT string. After removing the null terminator from the TXT records, they now work on my nameserver.
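A minimal sketch of the corrected encoding, assuming a single character-string per TXT record and an owner name given as a compression pointer (the function names are hypothetical). TXT data is length-prefixed, so nothing follows the text bytes, and RDLENGTH is the text length plus one for the length byte.

```python
import struct


def encode_txt_rdata(text):
    """Encode one TXT character-string: a length byte, then the raw bytes."""
    raw = text.encode("ascii")
    if len(raw) > 255:
        raise ValueError("a single character-string holds at most 255 bytes")
    return bytes([len(raw)]) + raw  # no trailing 00


def encode_txt_record(name_pointer, ttl, text):
    """Encode a TXT resource record whose owner name is already encoded."""
    rdata = encode_txt_rdata(text)
    # TYPE=16 (TXT), CLASS=1 (IN), then TTL and RDLENGTH.
    header = struct.pack(">HHIH", 16, 1, ttl, len(rdata))
    return name_pointer + header + rdata
```

In the dump above, RDLENGTH (00 13) and the string length (12) appear to cover the text already, so the stray 00 lands in front of the first NS record and shifts everything after it by one byte.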