I realize this is similar to other questions that have been asked, but after poring over Stack Overflow and the Node.js documentation for several days, I decided it was time to ask for help.
Basically what I need to do is read data from a binary file and extract information to then convert into plain text.
I have been trying to use buffers but I am not seeing any progress.
If someone could get me to a point where I can read the entire file and decode it into plain text, I believe I can take it from there.
const fs = require('fs');
let buf = fs.readFileSync(__dirname + '/sd_bl.bin');
let buffer = new Buffer.alloc(Buffer.byteLength(buf));
buffer.write(buf.toString('base64'));
console.log(buffer);
This is what I currently have, and the output is:
<Buffer 41 41 45 43 41 77 51 46 42 67 63 41 41 51 49 44 42 41 55 47 42 77 41 42 41 67 4d 45 42 51 59 48 41 41 45 43 41 77 51 46 42 67 63 41 41 51 49 44 42 41 ... 7950 more bytes>
So it seems to be allocating and writing to the buffer correctly, but I don't know what the next steps should be as far as getting plain text out of it.
I am relatively new here, but here's a shot! I believe your console.log statement may be misleading you.
Console-logging buffer.toString() instead should be sufficient to get you the human-readable contents of the file.
Here is an example:
const fs = require('fs');
const buffer = fs.readFileSync('yourfile.txt');
console.log(buffer.toString());
Currently, your code creates a second Buffer object (buffer) after reading the file, writes the base64-encoded contents of the first buffer to it, and then prints the new Buffer object itself rather than its string representation.
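For completeness, here is a tiny sketch of just the decode step (the byte values below stand in for your file's contents; pick whichever encoding your file actually uses):

```javascript
// Minimal sketch: decoding a Buffer of raw bytes into plain text.
// These bytes simulate what fs.readFileSync would hand back.
const buf = Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]); // bytes for "hello"
const text = buf.toString('utf8'); // decode with the file's real encoding
console.log(text);
```

If the file isn't UTF-8 (say, Latin-1 or UTF-16), pass that encoding name to toString() instead.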
For more on actually reading the file line by line, I would check out this other SO post: Read a file one line at a time in node.js?
I'm writing a bingo game in python. So far I can generate a bingo card and print it.
My problem is after I've randomly generated a number to call out, I don't know how to 'cross out' that number on the card to note that it's been called out.
This is the output, a randomly generated card:
B 11 13 14 2 1
I 23 28 26 27 22
N 42 45 40 33 44
G 57 48 59 56 55
O 66 62 75 63 67
I was thinking of popping from a randomly shuffled list to generate a number to call out (in bingo the numbers go from 1 to 75):
random_draw_list = random.sample(range(1, 76), 75)
number_drawn = random_draw_list.pop()
How can I write a function that will 'cross out' a number on the card after it's been called?
So for example, if number_drawn results in 11, it should replace 11 on the card with an X or a zero.
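A minimal sketch of such a cross-out helper (the dict-of-columns card layout below is an assumption; adapt it to however your card is actually stored):

```python
def cross_out(card, number_drawn):
    # replace the drawn number with 'X' wherever it appears on the card
    for letter, numbers in card.items():
        card[letter] = ['X' if n == number_drawn else n for n in numbers]
    return card

# hypothetical card using the first two rows from the question's output
card = {'B': [11, 13, 14, 2, 1], 'I': [23, 28, 26, 27, 22]}
cross_out(card, 11)
print(card['B'])  # → ['X', 13, 14, 2, 1]
```

Calling this once per drawn number keeps the card in sync with the numbers called so far.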
I am working on a shell script that will execute mongoexport and upload it to a S3 bucket.
The goal is to extract data in a readable JSON format for records that are 45 days old. The script will run every day as a crontab job.
So basically the purpose is to archive data older than 45 days.
Normal queries work as intended, but when I try to use variables it results in an error.
The regular format of the code is as follows:
firstdate="$(date -v-46d +%Y-%m-%d)"
afterdate="$(date -v-45d +%Y-%m-%d)"
backup_name=gamebook
colname=test1
mongoexport --uri mongodb+srv://<user>:<pass>@gamebookserver.tvdmx.mongodb.net/$dbname
--collection $colname --query '{"gameDate": {"$gte": {"$date": "2020-09-04T00:00:00:000Z"}, "$lte": {"$date": "2020-09-05T00:00:00.000Z"}}}' --out $backup_name;
The previous code works, but I want to make the dates more dynamic, so I tried the code shown below:
firstdate="$(date -v-46d +%Y-%m-%d)"
afterdate="$(date -v-45d +%Y-%m-%d)"
backup_name=gamebook
colname=test1
mongoexport --uri mongodb+srv://<user>:<pass>@gamebookserver.tvdmx.mongodb.net/$dbname
--collection $colname --query '{"gameDate": {"$gte": {"$date": "$firstdateT00:00:00:000Z"}, "$lte": {"$date": "$afterdateT00:00:00.000Z"}}}' --out $backup_name;
This results in the error:
2020-10-20T15:36:13.881+0700 query '[123 34 103 97 109 101 68 97 116 101 34 58 32 123 34 36 103 116 101 34 58 32 123 34 36 100 97 116 101 34 58 32 36 102 105 114 115 116 100 97 116 101 84 48 48 58 48 48 58 48 48 58 48 48 48 90 125 44 32 34 36 108 116 101 34 58 32 123 34 36 100 97 116 101 34 58 32 34 36 97 102 116 101 114 100 97 116 101 84 48 48 58 48 48 58 48 48 46 48 48 48 90 34 125 125 125]' is not valid JSON: invalid character '$' looking for beginning of value
2020-10-20T15:36:13.881+0700 try 'mongoexport --help' for more information
I've read in the documentation and it says:
You must enclose the query document in single quotes ('{ ... }') to ensure that it does not interact with your shell environment.
So my overall question is: is there a way to use values from the shell environment and parse them into the query section?
Or is there a better way that might get me the same result?
I'm still new to MongoDB in general, so any advice would be great.
You can always put together a string by combining interpolating and non-interpolating parts:
For instance,
--query '{"gameDate": {"$gte": {"'"$date"'": "'"$firstdate"'T00:00:00:000Z"}, "$lte": {"$date": "$afterdateT00:00:00.000Z"}}}'
would interpolate the first occurrence of date and the shell variable firstdate, but would pass the rest literally to mongoexport (I've picked two variables for demonstration, because I don't understand from your question which ones you want to expand and which ones you don't). Basically, a
'$AAAA'"$BBBB"'$CCCCC'
is in effect a single string, but the $BBBB part would undergo parameter expansion. Hence, if
BBBB=foo
you would get the literal string $AAAAfoo$CCCCC out of this.
Since this becomes tedious to work with, an alternative approach is to enclose everything in double quotes, which means all parameters are expanded, and manually escape the parts you don't want expanded. You could write the last example also as
"\$AAAA$BBBB\$CCCCC"
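Applied to the query from your question, the mixed-quoting trick looks like this (a sketch with hard-coded dates standing in for the date -v output, since only the quoting is the point here):

```shell
firstdate="2020-09-04"   # stand-in for "$(date -v-46d +%Y-%m-%d)"
afterdate="2020-09-05"   # stand-in for "$(date -v-45d +%Y-%m-%d)"
# the single-quoted pieces reach mongoexport literally; the double-quoted
# "$firstdate" / "$afterdate" pieces are expanded by the shell first
query='{"gameDate": {"$gte": {"$date": "'"$firstdate"'T00:00:00.000Z"}, "$lte": {"$date": "'"$afterdate"'T00:00:00.000Z"}}}'
echo "$query"
```

You would then pass it as --query "$query" so the expanded string survives as a single argument.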
I want to make an output file which is simply the input file with the value of each byte incremented by one.
here is the expected output:
04 fb 56 13 21 67 68 51 e9 ac
which will also be in hexadecimal notation. I am trying to do that in python3.
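A minimal python3 sketch of the described transformation (the sample input bytes below are just the expected output minus one; reading and writing the actual files is left out):

```python
def increment_bytes(data: bytes) -> bytes:
    """Add one to each byte, wrapping 0xff around to 0x00."""
    return bytes((b + 1) % 256 for b in data)

# hypothetical input inferred from the expected output in the question
sample = bytes.fromhex('03fa551220666750e8ab')
print(increment_bytes(sample).hex(' '))  # → 04 fb 56 13 21 67 68 51 e9 ac
```

For a real file, read it with open(path, 'rb'), transform, and write the result with open(out_path, 'wb').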
I have a text file consisting of data that is separated by tab-delimited columns. There are many ways to read data from the file into Python, but I am specifically trying to use a method similar to the one outlined below. When using a context manager like with open(...) as ..., I've seen that the general pattern is to indent all of the subsequent code within the with statement. Yet when defining a function, the return statement is usually placed at the same indentation as the first line of code within the function (excluding cases with awkward if-else blocks). In this case, both approaches work. Is one method considered correct or generally preferred over the other?
import numpy as np

def read_in(fpath, contents=[], row_limit=np.inf):
    """
    fpath is filelocation + filename + '.txt'
    contents is the initial data that the file data will be appended to
    row_limit is the maximum number of rows to be read (in case one would like to not read in every row).
    """
    nrows = 0
    with open(fpath, 'r') as f:
        for row in f:
            if nrows < row_limit:
                contents.append(row.split())
                nrows += 1
            else:
                break
        # return contents
    return contents
Below is a snippet of the text-file I am using for this example.
1996 02 08 05 17 49 263 70 184 247 126 0 -6.0 1.6e+14 2.7e+28 249
1996 02 12 05 47 26 91 53 160 100 211 236 2.0 1.3e+15 1.6e+29 92
1996 02 17 02 06 31 279 73 317 257 378 532 9.9 3.3e+14 1.6e+29 274
1996 02 17 05 18 59 86 36 171 64 279 819 27.9 NaN NaN 88
1996 02 19 05 15 48 98 30 266 129 403 946 36.7 NaN NaN 94
1996 03 02 04 11 53 88 36 108 95 120 177 1.0 1.5e+14 8.7e+27 86
1996 03 03 04 12 30 99 26 186 141 232 215 2.3 1.6e+14 2.8e+28 99
And below is a sample call.
fpath = "/Users/.../sample_data.txt"
data_in = read_in(fpath)
for i in range(len(data_in)):
print(data_in[i])
(I realize that it's better to use chunks of pre-defined sizes to read in data, but the number of characters per row of data varies. So I'm instead trying to give user control over the number of rows read in; one could read in a subset of the rows at a time and append them into contents, continually passing them into read_in - possibly in a loop - if the file size is large enough. That said, I'd love to know if I'm wrong about this approach as well, though this isn't my main question.)
If your function needs to do some other things after writing to the file, you usually do them outside the with block. So essentially you need to return outside the with block too.
However, if the purpose of your function is just to read in a file, you can return within the with block or outside it. I believe neither method is preferred in this case.
I don't really understand your second question.
You can also put the return inside the with context.
On exiting the context, the cleanup is done. This is the power of with: you don't need to check all possible exit paths. Note: the exit handler is also called when an exception is raised inside the with block.
But if the file is empty (for example), you should still return something, so that your code stays clear and follows the principle of one exit path. If you need to handle reaching end of file without finding something important, I would put the normal return inside the with context and handle the special case after it.
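To see that both placements behave the same, here is a minimal sketch (the two function names are made up for illustration):

```python
def read_inside(fpath):
    with open(fpath) as f:
        return f.readlines()   # the file is still closed on the way out

def read_outside(fpath):
    with open(fpath) as f:
        lines = f.readlines()
    return lines               # the file is already closed by this line
```

Both close the file reliably: the with block's exit handler runs whether you return inside it, fall through, or raise.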
I have a 100M row file that has some encoding problems -- was "originally" EBCDIC, saved as US-ASCII, now UTF-8. I don't know much more about its heritage, sorry -- I've just been asked to analyze the content.
The "cents" character from EBCDIC is "hidden" in this file in random places, causing all sorts of errors. Here is more on this bugger: cents character in hex
Converting this file using iconv -f foo -t UTF-8 -c is not working -- the cents character prevails.
When I use a hex editor, I can find the occurrences of 0xC2 0xA2 (c2a2). But in a BIG file, this isn't ideal. sed doesn't work at the hex level, so... Not sure about tr -- I only really use it for carriage return / newline.
What linux utility / command can I use to find and delete this character reasonably quickly on very big files?
2 parts:
1 -- a utility / command to find / count the number of these occurrences (octal \242)
2 -- a command to replace them (this works: tr '\242' ' ' < source > output)
How the text appears on my ubuntu terminal:
1019EQ?IT DEPT GENERATED
With xxd, how it looks at hex level (ascii to the side looks the same as above):
0000000: 3130 3139 4551 a249 5420 4445 5054 2047 454e 4552 4154 4544 0d0a
With xxd, how it looks with "show ebcdic" -- here, just showing the ebcdic from side:
......s.....&....+........
So hex "a2" is the culprit. I'm now trying xxd -E foo | grep a2 to count the instances up.
Adding output from od -ctxl, rather than xxd, for those interested:
0000000 1 0 1 9 E Q 242 I T D E P T G
31 30 31 39 45 51 a2 49 54 20 44 45 50 54 20 47
0000020 E N E R A T E D \r \n
45 4e 45 52 41 54 45 44 0d 0a
When you say the file was converted, what do you mean? Do you mean the binary file was simply dumped from an IBM 360 to another ASCII-based computer, or was the file itself converted to ASCII when it was transferred?
The question is whether the file is actually in a well encoded state or not. The other question is how do you want the file encoded?
On my Mac (which uses UTF-8 by default, just like Linux systems), I have no problem using sed to get rid of the ¢ character:
Here's my file:
$ cat test.txt
This is a test --¢-- TEST TEST
$ od -ctx1 test.txt
0000000 T h i s i s a t e s t -
54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 2d
0000020 - ¢ ** - - T E S T T E S T \n
2d c2 a2 2d 2d 20 54 45 53 54 20 54 45 53 54 0a
0000040
You can see that cat has no problems printing out that ¢ character. And, you can see in the od dump the c2a2 encoding of the ¢ character.
$ sed 's/¢/$/g' test.txt > new_test.txt
$ cat new_test.txt
This is a test --$-- TEST TEST
$ od -ctx1 new_test.txt
0000000 T h i s i s a t e s t -
54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 2d
0000020 - $ - - T E S T T E S T \n
2d 24 2d 2d 20 54 45 53 54 20 54 45 53 54 0a
0000037
Here, my sed has no problems changing that ¢ into a $ sign. The dump now shows that this test file is equivalent to a strictly ASCII encoded file. The two-byte encoding of the ¢ is now a nice clean single-byte $.
It looks like sed can handle your issue.
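For the counting half of your question (part 1), here is a sketch with tr and wc, using the same octal notation as your replacement command (the printf line just fabricates a one-line sample like the one in your od dump):

```shell
# fabricate a sample file containing one 0xa2 (octal 242) byte
printf '1019EQ\242IT DEPT GENERATED\r\n' > sample.bin
# delete everything EXCEPT octal 242 bytes, then count what's left
count=$(tr -dc '\242' < sample.bin | wc -c)
echo "$count"
rm -f sample.bin
```

The -d deletes and -c complements the set, so only the \242 bytes survive for wc -c to count; this streams the file once, which should be tolerable even on a 100M-row file.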
If you want to use this file on a Windows system, you can convert the file to the standard Windows Code Page 1252:
$ iconv -f utf8 -t cp1252 test.txt > new_test.txt
$ cat new_test.txt
This is a test --?-- TEST TEST
$ od -ctx1 new_test.txt
0000000 T h i s i s a t e s t -
54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 2d
0000020 - 242 - - T E S T T E S T \n
2d a2 2d 2d 20 54 45 53 54 20 54 45 53 54 0a
0000037
Here's the file now in Code Page 1252, just the way Windows likes it! Note that the ¢ is now a single octal 242 (hex a2) byte.
So, what exactly is the issue? Do you need the file in pure 7-bit ASCII (the 127 defined characters)? Do you need the file encoded so Windows machines can work on it? Are you having problems entering the ¢ character?
Let me know. I'm not from the government, and yet I'm here to help you.