Reserved keywords count by programming language?

Is there a ranking or table of the number of reserved keywords in various programming languages?
I do not need to know the keywords per se, just how many keywords languages like C, C++, C#, Perl, Python, PHP, Smalltalk, Lisp, and Ruby have.

Lists of keywords in ...
ANSI COBOL 85: 357
SystemVerilog: 250 + 73 reserved system functions = 323
VHDL 2008: 115 reserved words
C#: 79 + 23 contextual = 102
F#: 64 + 8 from OCaml + 26 future = 98
C++: 82
Dart: 54
Java: 50 (48 without unused keywords const and goto)
PHP: 49
Ruby: 42
JavaScript: 38 reserved words + 8 words reserved in strict mode only
Python 3.7: 35
C: 32
Python 2.7: 31
Go: 25
Elm: 25
Lua: 22
CoffeeScript: 19 (not necessarily "reserved"), plus ~50 to avoid from JavaScript
Smalltalk: 6 pseudo-variables
iota: 2
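For Python specifically, the count can be reproduced with the standard keyword module (run under the interpreter version you are counting; 3.7 reports 35 because async and await became keywords in that release):

import keyword

print(len(keyword.kwlist))   # 35 on Python 3.7
print(keyword.kwlist)        # ['False', 'None', 'True', 'and', ...]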

Related

Is there any method to identify sentiment from students' marks?

I need to identify the sentiment for particular students in particular subjects, given data in the form:

st_id  name  subject  1st_Sem  2nd_Sem  3rd_Sem  4th_Sem  sentiment/output
1894   John  English  88       90       64       58       Positive
1894   John  Maths    64       30       23       31       Negative

Is there any method in machine learning to identify the sentiment using a marksheet?

I've had some issues testing SHA256

When using SHA256, it always returned characters contained in the standard English alphabet (lowercase) and Arabic numerals (0-9). So the character set returned was [a-z] ∪ [0-9].
The reason this confuses me is that I've heard SHA256 should have 2^256 different results. Since each bit is "random", each byte should be represented by a completely random ASCII character, not one that fits into a restricted set of 36 characters (26 letters and 10 numerals).
Basically, I want to know whether my SHA256 is behaving properly and, if it is, why it is like this. I am using the standard sha256sum utility that comes with Linux.
Yes, your assumption is correct. SHA256 generates a total of 32 bytes (= 256 bits), each byte having an arbitrary value between 0 and 255 (inclusive).
But here lies the problem: most of those byte values do not represent valid ASCII characters (only 0-127 do), and some of them are invisible (space, tab, linefeed, and several control characters).
To "render" the SHA256 digest, the bytes are encoded in hexadecimal format, where a single byte is represented by 2 characters: 00 = 0, 7f = 127, ff = 255.
The SHA256 hash of the empty string is e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, or if each byte is converted to decimal:
227 176 196 66 152 252 28 20 154 251 244 200 153 111 185 36 39 174 65 228 100 155 147 76 164 149 153 27 120 82 184 85
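A minimal sketch with Python's hashlib showing the raw 32-byte digest next to its 64-character hexadecimal rendering:

import hashlib

h = hashlib.sha256(b"")    # hash of the empty string
print(h.hexdigest())       # 'e3b0c442...7852b855' (64 characters from [0-9a-f])
print(list(h.digest()))    # [227, 176, 196, 66, ...] (32 raw bytes, each 0-255)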

Where to place the return statement when defining a function to read in a file using with open(...) as ...?

I have a text file consisting of data that is separated by tab-delimited columns. There are many ways to read data from the file into Python, but I am specifically trying to use a method similar to the one outlined below. When using a context manager like with open(...) as ..., I've seen that the general convention is to have all of the subsequent code indented within the with statement. Yet when defining a function, the return statement is usually placed at the same indentation as the first line of code within the function (excluding cases with awkward if-else blocks). In this case, both approaches work. Is one method considered correct or generally preferred over the other?
import numpy as np  # needed for the np.inf default

def read_in(fpath, contents=[], row_limit=np.inf):
    """
    fpath is filelocation + filename + '.txt'
    contents is the initial data that the file data will be appended to
    row_limit is the maximum number of rows to be read (in case one would like to not read in every row).
    """
    nrows = 0
    with open(fpath, 'r') as f:
        for row in f:
            if nrows < row_limit:
                contents.append(row.split())
                nrows += 1
            else:
                break
        # return contents
    return contents
Below is a snippet of the text-file I am using for this example.
1996 02 08 05 17 49 263 70 184 247 126 0 -6.0 1.6e+14 2.7e+28 249
1996 02 12 05 47 26 91 53 160 100 211 236 2.0 1.3e+15 1.6e+29 92
1996 02 17 02 06 31 279 73 317 257 378 532 9.9 3.3e+14 1.6e+29 274
1996 02 17 05 18 59 86 36 171 64 279 819 27.9 NaN NaN 88
1996 02 19 05 15 48 98 30 266 129 403 946 36.7 NaN NaN 94
1996 03 02 04 11 53 88 36 108 95 120 177 1.0 1.5e+14 8.7e+27 86
1996 03 03 04 12 30 99 26 186 141 232 215 2.3 1.6e+14 2.8e+28 99
And below is a sample call.
fpath = "/Users/.../sample_data.txt"
data_in = read_in(fpath)
for i in range(len(data_in)):
    print(data_in[i])
(I realize that it's better to use chunks of pre-defined sizes to read in data, but the number of characters per row of data varies. So I'm instead trying to give user control over the number of rows read in; one could read in a subset of the rows at a time and append them into contents, continually passing them into read_in - possibly in a loop - if the file size is large enough. That said, I'd love to know if I'm wrong about this approach as well, though this isn't my main question.)
If your function needs to do some other things after it is done with the file, you usually do them outside the with block, so essentially you need to return outside the with block too.
However, if the purpose of your function is just to read in a file, you can return within the with block or outside it. I believe neither method is preferred in this case.
I don't really understand your second question.
You can also put the return inside the with context.
On exiting the context, the cleanup is done. That is the power of with: you do not need to check all possible exit paths. Note that the exit handler is also called when an exception is raised inside the with block.
But if the file is empty (as an example), you should still return something. In such a case your code is clear and follows the principle of one exit path. However, if you need to handle reaching the end of the file without finding something important, I would put the normal return inside the with context and handle the special case after it.
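A minimal sketch of that pattern (search_first is a hypothetical helper): the early return sits inside the with block, the file is still closed on that return, and the nothing-found case is handled after the block:

def search_first(fpath, needle):
    """Return the first line containing needle, or None."""
    with open(fpath, 'r') as f:
        for line in f:
            if needle in line:
                return line   # returning here still closes the file
    return None               # special case: end of file, nothing found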

Understanding the zlib header; CMF (CM, CINFO), FLG, (FDICT/DICTID, FLEVEL); RFC1950 § 2.2. Data format

I am curious about the zlib data format and am trying to understand the zlib header as described in RFC1950 (https://www.rfc-editor.org/rfc/rfc1950). I am, however, new to this kind of low-level interpretation and seem to have gone astray with some of my conclusions.
I have the following compressed data (from a PDF stream object):
b'h\xdebbd\x10`b`Rcb`\xb0ab`\xdc\x0b\xa4\x93\x98\x18\xfe>\x06\xb2\xed\x01\x02\x0c\x00!\xa4\x03\xc4'
In python, I have successfully decompressed and re-compressed the data:
b'x\xdacbd\x10`b`Rcb`\xb0ab`\xdc\x0b\xa4\x93\x98\x18\xfe>\x06\xb2\xed\x01!\xa4\x03\xc4'
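For reference, a sketch of that round trip with Python's zlib module; the recompressed header depends on the compression level (level 9 yields the b'x\xda' prefix shown above):

import zlib

pdf_stream = b'h\xdebbd\x10`b`Rcb`\xb0ab`\xdc\x0b\xa4\x93\x98\x18\xfe>\x06\xb2\xed\x01\x02\x0c\x00!\xa4\x03\xc4'
raw = zlib.decompress(pdf_stream)      # inflate the PDF stream data
recompressed = zlib.compress(raw, 9)   # deflate again at maximum compression
print(recompressed[:2])                # b'x\xda' -> CMF = 0x78, FLG = 0xda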
As I understand the discussion/answer in Deflate and inflate for PDF, using zlib C++:
The difference in the resulting compressed data should not matter, as it is an effect of the different methods applied to compress the data.
Assuming the last four bytes !\xa4\x03\xc4 are the ADLER32 (Adler-32 checksum), my questions pertain to the first 2 bytes.
  0   1     0   1   2   3                             0   1   2   3
+---+---+ +---+---+---+---+ +=====================+ +---+---+---+---+
|CMF|FLG| |   [DICTID]    | |...compressed data...| |    ADLER32    |
+---+---+ +---+---+---+---+ +=====================+ +---+---+---+---+
CMF
The first byte represents the CMF, which in my two instances would be
chr h = dec 104 = hex 68 = 01101000
and chr x = dec 120 = hex 78 = 01111000
This byte is divided into a 4-bit compression method and a 4-bit information field depending on the compression method.
bits 0 to 3 CM Compression method
bits 4 to 7 CINFO Compression info
+----|----+      +----|----+     +----|----+
|0000|0000| i.e. |0110|1000| and |0111|1000|
+----|----+      +----|----+     +----|----+
  CM |CINFO        CM |CINFO       CM |CINFO
Where
[CM] identifies the compression method used in the file.
CM = 8 denotes the "deflate" compression method with a window size up to 32K. This is the method used by gzip and PNG.
CM = 15 is reserved.
and
For CM = 8, CINFO is the base-2 logarithm of the LZ77 window size, minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above 7 are not allowed in this version of the specification. CINFO is not defined in this specification for CM not equal to 8.
As I understand it,
the only valid CM is 8
CINFO can be 0-7
Cf https://stackoverflow.com/a/34926305/7742349
You should NOT assume that it's always 8. Instead, you should check it and, if it's not 8, throw a "not supported" error.
Cf https://groups.google.com/forum/#!msg/comp.compression/_y2Wwn_Vq_E/EymIVcQ52cEJ
An exhaustive list of all 64 current possibilities for zlib headers:
COMMON
78 01
78 5e
78 9c
78 da
RARE
08 1d 18 19 28 15 38 11 48 0d 58 09 68 05
08 5b 18 57 28 53 38 4f 48 4b 58 47 68 43
08 99 18 95 28 91 38 8d 48 89 58 85 68 81
08 d7 18 d3 28 cf 38 cb 48 c7 58 c3 68 de
VERY RARE
08 3c 18 38 28 34 38 30 48 2c 58 28 68 24 78 3f
08 7a 18 76 28 72 38 6e 48 6a 58 66 68 62 78 7d
08 b8 18 b4 28 b0 38 ac 48 a8 58 a4 68 bf 78 bb
08 f6 18 f2 28 ee 38 ea 48 e6 58 e2 68 fd 78 f9
Q1 My first question is simply
Why is the CINFO before the CM? I.e.,
why is it not 87, 80, 81, 82, 83, ...?
As far as I know, byte order is not an issue here. I suspect it may be related to the least significant bit (RFC1950 § 2.1. Overall conventions), but I cannot quite understand how it would result in, e.g., 78 instead of 87...
Q2 My second question
If CINFO 7 represents "a window size up to 32K", then what do the values 1-6 correspond to? (I am assuming 0 means window size 0, as in, no compression applied.)
FLG
The second byte represents the FLG
\xde -> 11011110
\xda -> 11011010
[FLG] [...] is divided as follows:
bits 0 to 4 FCHECK (check bits for CMF and FLG)
bit 5 FDICT (preset dictionary)
bits 6 to 7 FLEVEL (compression level)
+-----|-|--+      +-----|-|--+     +-----|-|--+
|00000|0|00| i.e. |11011|1|10| and |11011|0|10|
+-----|-|--+      +-----|-|--+     +-----|-|--+
   C  |D|L           C  |D|L          C  |D|L
Bits 0-4, as far as I can tell, are some form of "checksum" or integrity control?
Bit 5 indicates whether a preset dictionary is present.
FDICT (Preset dictionary)
If FDICT is set, a DICT dictionary identifier is present immediately after the FLG byte. The dictionary is a sequence of bytes which are initially fed to the compressor without producing any compressed output. DICT is the Adler-32 checksum of this sequence of bytes (see the definition of ADLER32 below). The decompressor can use this identifier to determine which dictionary has been used by the compressor.
Q3 My third question
Assuming that "1" indicates "is set"
\xde -> 11011_1_10
\xda -> 11011_0_10
According to the specification, the DICTID consists of 4 bytes. The four bytes following the FLG byte in the compressed streams I have are
bbd\x10
cbd\x10
Why are the compressed data from the PDF stream object (with FDICT 1) and the compressed data from Python's zlib (with FDICT 0) almost identical?
Granted, I do not understand the function of the DICTID, but is it not supposed to exist only if FDICT is set?
Q4 My fourth question
Bit 6-7 sets the FLEVEL (Compression level)
These flags are available for use by specific compression methods. The "deflate" method (CM = 8) sets these flags as follows:
0 - compressor used fastest algorithm
1 - compressor used fast algorithm
2 - compressor used default algorithm
3 - compressor used maximum compression, slowest algorithm
The information in FLEVEL is not needed for decompression; it is there to indicate if recompression might be worthwhile.
I would have thought that the flags would be:
0 (00)
1 (01)
2 (10)
3 (11)
However, from What does a zlib header look like?:
01 (00000001) - No Compression/low
[5e (01011100) - Default Compression?]
9c (10011100) - Default Compression
da (11011010) - Best Compression
I note, however, that the two left-most bits seem to correspond to what I expected; I feel I am obviously failing to comprehend something fundamental in how to interpret bits...
The RFC says:
CMF (Compression Method and flags)
This byte is divided into a 4-bit compression method and a 4-bit information field depending on the compression method.
bits 0 to 3 CM Compression method
bits 4 to 7 CINFO Compression info
The least significant bit of a byte is bit 0. The most significant bit is bit 7. So the diagram you made for mapping CM and CINFO to bits is backwards. 0x78 and 0x68 both have a CM of 8. Their CINFOs are 7 and 6 respectively.
CINFO is what the RFC says it is:
CINFO (Compression info)
For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
size, minus eight (CINFO=7 indicates a 32K window size).
So, a CINFO of 7 means a 32 KiB window, and 6 means a 16 KiB window. CINFO == 0 does not mean no compression; it means a window size of 256 bytes.
For the flag byte, you got it backwards again. FDICT is not set. For both of your examples, the compression level is 11, maximum compression.
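A minimal sketch of that bit layout in Python (parse_zlib_header is just an illustrative name; the field extraction follows RFC1950):

def parse_zlib_header(data):
    cmf, flg = data[0], data[1]
    cm = cmf & 0x0f                           # bits 0-3: compression method (8 = deflate)
    cinfo = (cmf >> 4) & 0x0f                 # bits 4-7: log2(window size) - 8
    window = 1 << (cinfo + 8)
    fcheck_ok = (cmf * 256 + flg) % 31 == 0   # CMF*256 + FLG must be a multiple of 31
    fdict = (flg >> 5) & 1                    # bit 5: preset dictionary flag
    flevel = (flg >> 6) & 3                   # bits 6-7: compression level hint
    return cm, window, fcheck_ok, fdict, flevel

print(parse_zlib_header(b'h\xde'))   # (8, 16384, True, 0, 3)
print(parse_zlib_header(b'x\xda'))   # (8, 32768, True, 0, 3)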

How does Excel evaluate FACT(170)/FACT(169) correctly?

170! approaches the limit of a floating point double: 171! will overflow.
However 170! is over 300 digits long.
There is, therefore, no way that 170! can be represented precisely in floating point.
Yet Excel returns the correct answer for 170! / 169!.
Why is this? I'd expect some error to creep in, but it returns an integral value. Does Excel somehow know how to optimise this calculation?
If you find the closest doubles to 170! and 169!, they are
double oneseventy = 5818033100654137.0 * 256;
double onesixtynine = 8761273375102700.0;
times the same power of two. The closest double to the quotient of these is exactly 170.0.
Also, Excel may compute 170! by multiplying 169! by 170.
William Kahan has a paper called "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?" where he discusses some of the insanity that goes on in Excel. It may be that Excel is not computing 170 exactly, but rather it's hiding an ulp of reality from you.
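A quick check of the quotient claim in Python, converting the exact factorials to the nearest doubles (170 is notably absent from the failing cases listed in the next answer):

import math

f170 = float(math.factorial(170))   # nearest double to 170!
f169 = float(math.factorial(169))   # nearest double to 169!
print(f170 / f169)                  # 170.0
print(f170 / f169 == 170)           # True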
The answer of tmyklebu is already perfect, but I wanted to know more.
What if the implementation of n! were something as trivial as return double(n)*(n-1)!?
Here is a Smalltalk snippet, but you can translate it into many other languages; that's not the point:

(2 to: 170) count: [:n |
    | num den |
    den := (2 to: n - 1) inject: 1.0 into: [:p :e | p*e].
    num := n*den.
    num / den ~= n].
And the answer is 12.
So you have not been particularly lucky: thanks to the good properties of the round-to-nearest-even rounding mode, out of these 169 numbers only 12 don't behave as expected.
Which ones? Replace count: by select: and you get:
#(24 47 59 61 81 96 101 104 105 114 122 146)
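A rough Python translation of the same experiment, for readers without a Smalltalk at hand (the naive running float product plays the role of (n-1)!):

fails = []
for n in range(2, 171):
    den = 1.0                # naive floating-point (n-1)!
    for e in range(2, n):
        den *= e
    num = n * den
    if num / den != n:       # quotient fails to round back to n
        fails.append(n)

print(len(fails))   # 12, matching the Smalltalk count above
print(fails)        # [24, 47, 59, 61, 81, 96, 101, 104, 105, 114, 122, 146]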
If I had Excel handy, I would ask it to evaluate FACT(146)/FACT(145).
Curiously (only apparently curiously), a less naive solution that computes the exact factorial with large-integer arithmetic, then converts to the nearest float, does not perform better!
(2 to: 170) reject: [:n |
    n factorial asFloat / (n-1) factorial asFloat = n]
leads to:
#(24 31 34 40 41 45 46 57 61 70 75 78 79 86 88 92 93 111 115 116 117 119 122 124 141 144 147 164)
