I'm analysing the X-Loader settings for the POP mDDR on the Beagleboard xM.
The amount of mDDR POP memory in the BB xM is 512MB (according to the Manual).
More precisely the Micron variant: 256MB on CS0 + 256MB on CS1 = 512MB total.
The bus width is 32 bits, which can be verified in the SDRC_MCFG_p register settings in the X-Loader.
The type of memory used is the MT46H128M32L2KQ-5 as mentioned in this group:
https://groups.google.com/forum/#!topic/beagleboard/vgrq2bOxXrE
Reading the data sheet of that memory, the 32-bit configuration with the maximum capacity is 16 Meg x 32 x 4 = 64 Meg x 32.
So 64MB is not 256MB; 128MB would be feasible, but only with a 16-bit bus width, and even then we are still not at 256MB.
The person in the group mentioned above says that the memory is a 4Gb part, but the data sheet says that it is 2Gb.
My question:
How can 512MB be achieved by using 2 memory chips of the above type and 32 bit bus width?
Thanks in advance for your help.
Martin
According to the datasheet, the MT46H128M32L2KQ-5 has the following configuration:
MT46H128M32L2 – 16 Meg x 32 x 4 Banks x 2
16 Meg x 32 x 4 Banks x 2 = 4096 Meg (bits, not bytes)
4096 Meg (bits) / 8 = 512 MB (Megabytes)
More from datasheet:
The 2Gb Mobile low-power DDR SDRAM is a high-speed CMOS, dynamic
random-access memory containing 2,147,483,648 bits.
Each of the x32’s 536,870,912-bit banks is organized as 16,384 rows by 1024
columns by 32 bits. (p. 8)
So, if you multiply the number of rows by the number of columns by the number of bits (as specified in the datasheet), you get the size of a bank in bits: bank size = 16384 x 1024 x 32 = 16 Meg x 32 = 536870912 bits.
Next, you need to multiply the bank size (in bits) by the number of banks in the die: chip size = 536870912 x 4 = 2147483648 bits.
In order to get the result in bytes, you have to divide it by 8:
chip size (bytes) = 2147483648 (bits) / 8 = 268435456
In order to get the result in megabytes, you have to divide it by 1024 x 1024:
chip size = 268435456 / 1024 / 1024 = 256 MB (megabytes)
This is a dual-die LPDDR chip, internally organized as 2 x 256 MB devices (it has two chip selects, CS0# and CS1#, as specified in the datasheet). The single package contains two memory dies of 256MB each. For the BB, this package must be configured as two memories of 256MB each in order to get 512MB. So, you have to set up CS0 as 256MB and CS1 as 256MB.
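To double-check the arithmetic, here is a minimal C sketch (the geometry values are taken from the datasheet quote above; the variable names are mine):

#include <stdio.h>

int main(void)
{
    /* Geometry from the datasheet: 16384 rows x 1024 columns x 32 bits, 4 banks, 2 dies. */
    unsigned long long rows = 16384, cols = 1024, width = 32, banks = 4, dies = 2;

    unsigned long long bank_bits = rows * cols * width;   /* 536870912 bits  */
    unsigned long long die_bits  = bank_bits * banks;     /* 2147483648 bits */
    unsigned long long die_bytes = die_bits / 8;          /* 268435456 bytes */

    printf("per die (per chip select): %llu MB\n", die_bytes / (1024 * 1024));        /* 256 */
    printf("whole package (CS0 + CS1): %llu MB\n", dies * die_bytes / (1024 * 1024)); /* 512 */
    return 0;
}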
I wrote this simple program that multiplies matrices. I can specify how
many OS threads are used to run it with the environment variable
OMP_NUM_THREADS. It slows down a lot when the thread count gets
larger than the number of my CPU's hardware threads.
Here's the program.
#define DIMENSION 4096   /* matrix size; set to 1024/2048/4096/8192 per test run */

static double a[DIMENSION][DIMENSION], b[DIMENSION][DIMENSION],
              c[DIMENSION][DIMENSION];

int main(void) {
    #pragma omp parallel for schedule(static)
    for (unsigned i = 0; i < DIMENSION; i++)
        for (unsigned j = 0; j < DIMENSION; j++)
            for (unsigned k = 0; k < DIMENSION; k++)
                c[i][k] += a[i][j] * b[j][k];
    return 0;
}
My CPU is i7-8750H. It has 12 threads. When the matrices are large
enough, the program is fastest on around 11 threads. It is 4 times as
slow when the thread count reaches 17. Then run time stays about the
same as I increase the thread count.
Here are the results. The top row is DIMENSION. The left column is the
thread count. Times are in seconds. The column marked * was compiled
with -fno-loop-unroll-and-jam.
1024 2048 4096 4096* 8192
1 0.2473 3.39 33.80 35.94 272.39
2 0.1253 2.22 18.35 18.88 141.23
3 0.0891 1.50 12.64 13.41 100.31
4 0.0733 1.13 10.34 10.70 82.73
5 0.0641 0.95 8.20 8.90 62.57
6 0.0581 0.81 6.97 8.05 53.73
7 0.0497 0.70 6.11 7.03 95.39
8 0.0426 0.63 5.28 6.79 81.27
9 0.0390 0.56 4.67 6.10 77.27
10 0.0368 0.52 4.49 5.13 55.49
11 0.0389 0.48 4.40 4.70 60.63
12 0.0406 0.49 6.25 5.94 68.75
13 0.0504 0.63 6.81 8.06 114.53
14 0.0521 0.63 9.17 10.89 170.46
15 0.0505 0.68 11.46 14.08 230.30
16 0.0488 0.70 13.03 20.06 241.15
17 0.0469 0.75 20.67 20.97 245.84
18 0.0462 0.79 21.82 22.86 247.29
19 0.0465 0.68 24.04 22.91 249.92
20 0.0467 0.74 23.65 23.34 247.39
21 0.0458 1.01 22.93 24.93 248.62
22 0.0453 0.80 23.11 25.71 251.22
23 0.0451 1.16 20.24 25.35 255.27
24 0.0443 1.16 25.58 26.32 253.47
25 0.0463 1.05 26.04 25.74 255.05
26 0.0470 1.31 27.76 26.87 253.86
27 0.0461 1.52 28.69 26.74 256.55
28 0.0454 1.15 28.47 26.75 256.23
29 0.0456 1.27 27.05 26.52 256.95
30 0.0452 1.46 28.86 26.45 258.95
Code inside the loop compiles to this on gcc 9.3.1 with
-O3 -march=native -fopenmp. rax starts from 0 and increases by 64
each iteration. rdx points to c[i]. rsi points to b[j]. rdi
points to b[j+1].
vmovapd (%rsi,%rax), %ymm1               # load b[j][k..k+3]
vmovapd 32(%rsi,%rax), %ymm0             # load b[j][k+4..k+7]
vfmadd213pd (%rdx,%rax), %ymm3, %ymm1    # ymm1 = ymm3*ymm1 + c[i][k..k+3]   (ymm3 presumably holds a[i][j] broadcast)
vfmadd213pd 32(%rdx,%rax), %ymm3, %ymm0  # ymm0 = ymm3*ymm0 + c[i][k+4..k+7]
vfmadd231pd (%rdi,%rax), %ymm2, %ymm1    # ymm1 += ymm2*b[j+1][k..k+3]       (ymm2 presumably holds a[i][j+1] broadcast)
vfmadd231pd 32(%rdi,%rax), %ymm2, %ymm0  # ymm0 += ymm2*b[j+1][k+4..k+7]
vmovapd %ymm1, (%rdx,%rax)               # store c[i][k..k+3]
vmovapd %ymm0, 32(%rdx,%rax)             # store c[i][k+4..k+7]
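In C terms, this corresponds roughly to the following unroll-and-jam of the two inner loops (my own sketch of what the optimizer appears to do, assuming DIMENSION is even; not actual compiler output):

/* inside the i loop */
for (unsigned j = 0; j < DIMENSION; j += 2)
    for (unsigned k = 0; k < DIMENSION; k++)
        c[i][k] += a[i][j] * b[j][k] + a[i][j + 1] * b[j + 1][k];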
I wonder why the run time increases so much when the thread count
increases.
My estimate says this shouldn't be the case when DIMENSION is 4096.
Here is what I thought before I remembered that the compiler does 2 j
iterations at a time. Each iteration of the j loop needs rows c[i] and
b[j], which are 64KB in total. My CPU has a 32KB L1 data cache and a
256KB L2 cache per core (shared by 2 hardware threads). The four rows
the two hardware threads are working with don't fit in L1 but fit in
L2, so when j advances, c[i] is read from L2. When the program is run
on 24 OS threads, the number of involuntary context switches is around
29371. Each thread gets interrupted before it has a chance to finish
one iteration of the j loop. Since 8 matrix rows can fit in the L2
cache, the other software thread's 2 rows are probably still in L2 when
it resumes. So the execution time shouldn't be much different from the
12-thread case. However, the measurements say it's 4 times as slow.
Now I have realized that 2 j iterations are done at a time. This way
each j iteration works on 96KB of memory (the rows c[i], b[j] and
b[j+1]), so 4 such working sets can't fit in the 256KB L2 cache. To
verify that this is what slows the program down, I compiled the program
with -fno-loop-unroll-and-jam. I got
vmovapd ymm0, YMMWORD PTR [rcx+rax]
vfmadd213pd ymm0, ymm1, YMMWORD PTR [rdx+rax]
vmovapd YMMWORD PTR [rdx+rax], ymm0
The results are in the table (the 4096* column). They look like the
case where 2 rows are done at a time, which makes me wonder even more.
When DIMENSION is 4096, 4
software threads' 8 rows fit in the L2 cache when each thread works on 1
row at a time, but 12 rows don't fit in the L2 cache when each thread
works on 2 rows at a time. Why are the run times similar?
I thought maybe it's because the CPU warmed up when running with fewer
threads and had to slow down. I ran the tests multiple times, both in
order of increasing thread count and of decreasing thread count. They
yield similar results. And dmesg doesn't contain anything related to
thermal throttling or clock changes.
I tried separately changing 4096 columns to 4104 columns and setting
OMP_PROC_BIND=true OMP_PLACES=cores, and the results are similar.
This problem seems to come from either the CPU caches (due to bad memory locality) or the OS scheduler (due to more threads than the hardware can execute simultaneously).
I cannot exactly reproduce the same effect on my i5-9600KF processor (with 6 cores and 6 threads) and with a matrix of size 4096x4096. However, similar effects occur.
Here are performance results (with GCC 9.3 using -O3 -march=native -fopenmp on Linux 5.6):
#threads | time (in seconds)
----------------------------
1 | 16.726885
2 | 9.062372
3 | 6.397651
4 | 5.494580
5 | 4.054391
6 | 5.724844 <-- maximum number of hardware threads
7 | 6.113844
8 | 7.351382
9 | 8.992128
10 | 10.789389
11 | 10.993626
12 | 11.099117
24 | 11.283873
48 | 11.412288
We can see that the computation time starts to grow significantly between 5 and 12 threads.
This problem is due to a lot more data being fetched from RAM. Indeed, 161.6 GiB are loaded from memory with 6 threads, while 424.7 GiB are loaded with 12 threads! In both cases, 3.3 GiB are written to RAM. Because my memory throughput is roughly 40 GiB/s, the RAM accesses represent more than 96% of the overall execution time with 12 threads ((424.7 + 3.3) GiB at 40 GiB/s is about 10.7 s of the 11.1 s run)!
If we dig deeper, we can see that the number of L1 cache references and L1 cache misses is the same regardless of the number of threads used. Meanwhile, there are a lot more L3 cache misses (as well as more references). Here are the L3-cache statistics:
With 6 threads: 4.4 G loads
1.1 G load-misses (25% of all LL-cache hits)
With 12 threads: 6.1 G loads
4.5 G load-misses (74% of all LL-cache hits)
This means that the locality of the memory accesses is clearly worse with more threads. I guess this is because the compiler is not clever enough to do high-level cache-based optimizations that could reduce RAM pressure (especially when the number of threads is high). You have to do the tiling yourself in order to improve the memory locality (a rough sketch follows below). You can find a good guide here.
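For illustration, a minimal tiled version of the same loop nest might look like this (my own sketch, not a tuned implementation; TILE is a hypothetical blocking factor you would tune for your cache sizes, and DIMENSION is assumed to be a multiple of TILE):

#define TILE 64  /* hypothetical tile size; tune for your L1/L2 sizes */

/* Drop-in replacement for the loop nest in the question: same arrays, same result
   (up to floating-point rounding order), but each TILE x TILE block of a, b and c
   is reused while it is cache-resident. */
#pragma omp parallel for schedule(static)
for (unsigned ii = 0; ii < DIMENSION; ii += TILE)
    for (unsigned jj = 0; jj < DIMENSION; jj += TILE)
        for (unsigned kk = 0; kk < DIMENSION; kk += TILE)
            for (unsigned i = ii; i < ii + TILE; i++)
                for (unsigned j = jj; j < jj + TILE; j++)
                    for (unsigned k = kk; k < kk + TILE; k++)
                        c[i][k] += a[i][j] * b[j][k];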
Finally, note that using more threads than the hardware can simultaneously execute is generally not efficient. One problem is that the OS scheduler often places threads on cores poorly and frequently moves them. The usual way to fix that is to bind software threads to hardware threads using OMP_PROC_BIND=TRUE and to set the OMP_PLACES environment variable. Another problem is that the threads are executed using preemptive multitasking with shared resources (e.g. caches).
PS: please note that BLAS libraries (e.g. OpenBLAS, BLIS, Intel MKL, etc.) are much more optimized than this code, as they already include clever optimizations such as manual vectorization for the target hardware, loop unrolling, multithreading, tiling and fast matrix transposition when needed (a call sketch follows below). For a 4096x4096 matrix, they are about 10 times faster.
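For reference, calling such a library instead of the hand-written loops looks roughly like this (a sketch using the standard CBLAS interface, e.g. as shipped by OpenBLAS; link against the BLAS implementation of your choice):

#include <cblas.h>

/* C = 1.0*A*B + 1.0*C for row-major n x n matrices stored contiguously. */
void matmul_blas(const double *a, const double *b, double *c, int n)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, a, n,
                b, n,
                1.0, c, n);
}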
The following division shows an incorrect result. According to the Windows calculator it should have 15 digits after the decimal point, but C# shows only 14 digits after the point.
I fixed it. Writing it as (Decimal)200/(Decimal)30 gives more digits of scale.
If numbers must add up correctly or balance, use decimal. This includes any financial storage or calculations, scores, or other numbers that people might do by hand.
If the exact value of numbers is not important, use double for speed. This includes graphics, physics or other physical sciences computations where there is already a "number of significant digits".
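As a quick illustration of that "number of significant digits" limit of an IEEE-754 double (shown here as a C sketch, since the original C# snippet is not reproduced above; the same binary64 format is used by C#'s double):

#include <stdio.h>

int main(void)
{
    double d = 200.0 / 30.0;   /* binary64 keeps only ~15-16 significant decimal digits */
    printf("%.20f\n", d);      /* digits printed beyond that are rounding noise */
    return 0;
}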
Please refer to the table below for further reference.
C# Type    .NET Framework (System) type    Signed?    Bytes Occupied    Possible Values
sbyte      System.SByte                    Yes        1                 -128 to 127
short      System.Int16                    Yes        2                 -32768 to 32767
int        System.Int32                    Yes        4                 -2147483648 to 2147483647
long       System.Int64                    Yes        8                 -9223372036854775808 to 9223372036854775807
byte       System.Byte                     No         1                 0 to 255
ushort     System.UInt16                   No         2                 0 to 65535
uint       System.UInt32                   No         4                 0 to 4294967295
ulong      System.UInt64                   No         8                 0 to 18446744073709551615
float      System.Single                   Yes        4                 Approximately ±1.5 x 10^-45 to ±3.4 x 10^38 with 7 significant figures
double     System.Double                   Yes        8                 Approximately ±5.0 x 10^-324 to ±1.7 x 10^308 with 15 or 16 significant figures
decimal    System.Decimal                  Yes        16                Approximately ±1.0 x 10^-28 to ±7.9 x 10^28 with 28 or 29 significant figures
char       System.Char                     N/A        2                 Any Unicode character (16 bit)
bool       System.Boolean                  N/A        1 / 2             true or false
I have a computer with 128 GB of RAM, running Linux (3.19.5-200.fc21.x86_64). However, I cannot allocate more than ~30 GB of RAM in a single process. Beyond this, malloc fails:
#include <stdlib.h>
#include <iostream>

int main()
{
    size_t gb_in_bytes = size_t(1) << size_t(30); // 1 GB in bytes (2^30).
    // try to allocate 1 block of 'i' GB.
    for (size_t i = 25; i < 35; ++i) {
        size_t n = i * gb_in_bytes;
        void *p = ::malloc(n);
        std::cout << "allocation of 1 x " << (n / double(gb_in_bytes)) << " GB of data. Ok? "
                  << ((p == 0) ? "nope" : "yes") << std::endl;
        ::free(p);
    }
}
This produces the following output:
/tmp> c++ mem_alloc.cpp && a.out
allocation of 1 x 25 GB of data. Ok? yes
allocation of 1 x 26 GB of data. Ok? yes
allocation of 1 x 27 GB of data. Ok? yes
allocation of 1 x 28 GB of data. Ok? yes
allocation of 1 x 29 GB of data. Ok? yes
allocation of 1 x 30 GB of data. Ok? yes
allocation of 1 x 31 GB of data. Ok? nope
allocation of 1 x 32 GB of data. Ok? nope
allocation of 1 x 33 GB of data. Ok? nope
allocation of 1 x 34 GB of data. Ok? nope
I searched for quite some time, and found that this is related to the maximum virtual memory size:
~> ulimit -all
[...]
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
virtual memory (kbytes, -v) 32505856
[...]
I can increase this limit to ~64 GB via ulimit -v 64000000, but not further. Beyond this, I get operation not permitted errors:
~> ulimit -v 64000000
~> ulimit -v 65000000
bash: ulimit: virtual memory: cannot modify limit: Operation not permitted
~> ulimit -v unlimited
bash: ulimit: virtual memory: cannot modify limit: Operation not permitted
Some more searching revealed that in principle it should be possible to set these limits via the "as" (address space) entry in /etc/security/limits.conf. However, by doing this, I could only reduce the maximum amount of virtual memory, not increase it.
Is there any way to either lift this limit of virtual memory per process completely, or to increase it beyond 64 GB? I would like to use all of the physical memory in a single application.
EDIT:
Following Ingo Leonhardt, I tried ulimit -v unlimited after logging in as root, not as a standard user. Doing this solves the problem for root (the program can then allocate all the physical memory while logged in as root). But this works only for root, not for other users. However, at least this means that in principle the kernel can handle this just fine, and that there is only a configuration problem.
Regarding limits.conf: I tried explicitly adding
hard as unlimited
soft as unlimited
to /etc/security/limits.conf, and rebooting. This had no effect. After login as a standard user, ulimit -v still returns about 32 GB, and ulimit -v 65000000 still says permission denied (while ulimit -v 64000000 works). The rest of limits.conf is commented out, and in /etc/security/limits.d there is only one other, unrelated entry (limiting nproc to 4096 for non-root users). That is, the virtual memory limit must be coming from somewhere other than limits.conf. Any ideas what else could lead to ulimit -v not being "unlimited"?
EDIT/RESOLUTION:
It was caused by my own stupidity. I had a (long forgotten) program in my user setup which used setrlimit to restrict the amount of memory per process to prevent Linux from swapping to death. It was unintentionally copied from a 32 GB machine to the 128 GB machine. Thanks to Paul and Andrew Janke and everyone else for helping to track it down. Sorry everyone :/.
If anyone else encounters this: search for ulimit/setrlimit in the bash and profile settings and in programs potentially calling those (both your own and the system-wide /etc settings), and make sure that /etc/security/limits.conf does not include this limit... (or at least try creating a new user, to see if this happens in your user setup or in the system setup)
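If you want to check (or change) the limit from inside a program, the calls involved are getrlimit/setrlimit on RLIMIT_AS, which is presumably what the forgotten program used. A minimal diagnostic sketch:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("RLIMIT_AS: soft=%llu hard=%llu (RLIM_INFINITY=%llu)\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max,
           (unsigned long long)RLIM_INFINITY);

    /* An unprivileged process may lower its limits or raise the soft limit up to
       the hard limit; raising the hard limit requires root (CAP_SYS_RESOURCE),
       which matches the ulimit behaviour described above. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        perror("setrlimit");
    return 0;
}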
This is a ulimit and system setup problem, not a c++ problem.
I can run your appropriately modified code on an Amazon EC2 instance type r3.4xlarge with no problem. These cost less than $0.20/hour on the spot market, and so I suggest you rent one, and perhaps take a look around in /etc and compare to your own setup... or maybe you need to recompile a Linux kernel to use that much memory... but it is not a C++ or gcc problem.
Ubuntu on the EC2 machine was already set up for unlimited process memory.
$ sudo su
# ulimit -u
--> unlimited
This one has 125GB of ram
# free
total used free shared buffers cached
Mem: 125903992 1371828 124532164 344 22156 502248
-/+ buffers/cache: 847424 125056568
Swap: 0 0 0
I modified the limits on your program to go up to 149GB.
Here's the output. Looks good up to 118GB.
root@ip-10-203-193-204:/home/ubuntu# ./memtest
allocation of 1 x 25 GB of data. Ok? yes
allocation of 1 x 26 GB of data. Ok? yes
allocation of 1 x 27 GB of data. Ok? yes
allocation of 1 x 28 GB of data. Ok? yes
allocation of 1 x 29 GB of data. Ok? yes
allocation of 1 x 30 GB of data. Ok? yes
allocation of 1 x 31 GB of data. Ok? yes
allocation of 1 x 32 GB of data. Ok? yes
allocation of 1 x 33 GB of data. Ok? yes
allocation of 1 x 34 GB of data. Ok? yes
allocation of 1 x 35 GB of data. Ok? yes
allocation of 1 x 36 GB of data. Ok? yes
allocation of 1 x 37 GB of data. Ok? yes
allocation of 1 x 38 GB of data. Ok? yes
allocation of 1 x 39 GB of data. Ok? yes
allocation of 1 x 40 GB of data. Ok? yes
allocation of 1 x 41 GB of data. Ok? yes
allocation of 1 x 42 GB of data. Ok? yes
allocation of 1 x 43 GB of data. Ok? yes
allocation of 1 x 44 GB of data. Ok? yes
allocation of 1 x 45 GB of data. Ok? yes
allocation of 1 x 46 GB of data. Ok? yes
allocation of 1 x 47 GB of data. Ok? yes
allocation of 1 x 48 GB of data. Ok? yes
allocation of 1 x 49 GB of data. Ok? yes
allocation of 1 x 50 GB of data. Ok? yes
allocation of 1 x 51 GB of data. Ok? yes
allocation of 1 x 52 GB of data. Ok? yes
allocation of 1 x 53 GB of data. Ok? yes
allocation of 1 x 54 GB of data. Ok? yes
allocation of 1 x 55 GB of data. Ok? yes
allocation of 1 x 56 GB of data. Ok? yes
allocation of 1 x 57 GB of data. Ok? yes
allocation of 1 x 58 GB of data. Ok? yes
allocation of 1 x 59 GB of data. Ok? yes
allocation of 1 x 60 GB of data. Ok? yes
allocation of 1 x 61 GB of data. Ok? yes
allocation of 1 x 62 GB of data. Ok? yes
allocation of 1 x 63 GB of data. Ok? yes
allocation of 1 x 64 GB of data. Ok? yes
allocation of 1 x 65 GB of data. Ok? yes
allocation of 1 x 66 GB of data. Ok? yes
allocation of 1 x 67 GB of data. Ok? yes
allocation of 1 x 68 GB of data. Ok? yes
allocation of 1 x 69 GB of data. Ok? yes
allocation of 1 x 70 GB of data. Ok? yes
allocation of 1 x 71 GB of data. Ok? yes
allocation of 1 x 72 GB of data. Ok? yes
allocation of 1 x 73 GB of data. Ok? yes
allocation of 1 x 74 GB of data. Ok? yes
allocation of 1 x 75 GB of data. Ok? yes
allocation of 1 x 76 GB of data. Ok? yes
allocation of 1 x 77 GB of data. Ok? yes
allocation of 1 x 78 GB of data. Ok? yes
allocation of 1 x 79 GB of data. Ok? yes
allocation of 1 x 80 GB of data. Ok? yes
allocation of 1 x 81 GB of data. Ok? yes
allocation of 1 x 82 GB of data. Ok? yes
allocation of 1 x 83 GB of data. Ok? yes
allocation of 1 x 84 GB of data. Ok? yes
allocation of 1 x 85 GB of data. Ok? yes
allocation of 1 x 86 GB of data. Ok? yes
allocation of 1 x 87 GB of data. Ok? yes
allocation of 1 x 88 GB of data. Ok? yes
allocation of 1 x 89 GB of data. Ok? yes
allocation of 1 x 90 GB of data. Ok? yes
allocation of 1 x 91 GB of data. Ok? yes
allocation of 1 x 92 GB of data. Ok? yes
allocation of 1 x 93 GB of data. Ok? yes
allocation of 1 x 94 GB of data. Ok? yes
allocation of 1 x 95 GB of data. Ok? yes
allocation of 1 x 96 GB of data. Ok? yes
allocation of 1 x 97 GB of data. Ok? yes
allocation of 1 x 98 GB of data. Ok? yes
allocation of 1 x 99 GB of data. Ok? yes
allocation of 1 x 100 GB of data. Ok? yes
allocation of 1 x 101 GB of data. Ok? yes
allocation of 1 x 102 GB of data. Ok? yes
allocation of 1 x 103 GB of data. Ok? yes
allocation of 1 x 104 GB of data. Ok? yes
allocation of 1 x 105 GB of data. Ok? yes
allocation of 1 x 106 GB of data. Ok? yes
allocation of 1 x 107 GB of data. Ok? yes
allocation of 1 x 108 GB of data. Ok? yes
allocation of 1 x 109 GB of data. Ok? yes
allocation of 1 x 110 GB of data. Ok? yes
allocation of 1 x 111 GB of data. Ok? yes
allocation of 1 x 112 GB of data. Ok? yes
allocation of 1 x 113 GB of data. Ok? yes
allocation of 1 x 114 GB of data. Ok? yes
allocation of 1 x 115 GB of data. Ok? yes
allocation of 1 x 116 GB of data. Ok? yes
allocation of 1 x 117 GB of data. Ok? yes
allocation of 1 x 118 GB of data. Ok? yes
allocation of 1 x 119 GB of data. Ok? nope
allocation of 1 x 120 GB of data. Ok? nope
allocation of 1 x 121 GB of data. Ok? nope
allocation of 1 x 122 GB of data. Ok? nope
allocation of 1 x 123 GB of data. Ok? nope
allocation of 1 x 124 GB of data. Ok? nope
allocation of 1 x 125 GB of data. Ok? nope
allocation of 1 x 126 GB of data. Ok? nope
allocation of 1 x 127 GB of data. Ok? nope
allocation of 1 x 128 GB of data. Ok? nope
allocation of 1 x 129 GB of data. Ok? nope
allocation of 1 x 130 GB of data. Ok? nope
allocation of 1 x 131 GB of data. Ok? nope
allocation of 1 x 132 GB of data. Ok? nope
allocation of 1 x 133 GB of data. Ok? nope
allocation of 1 x 134 GB of data. Ok? nope
allocation of 1 x 135 GB of data. Ok? nope
allocation of 1 x 136 GB of data. Ok? nope
allocation of 1 x 137 GB of data. Ok? nope
allocation of 1 x 138 GB of data. Ok? nope
allocation of 1 x 139 GB of data. Ok? nope
allocation of 1 x 140 GB of data. Ok? nope
allocation of 1 x 141 GB of data. Ok? nope
allocation of 1 x 142 GB of data. Ok? nope
allocation of 1 x 143 GB of data. Ok? nope
allocation of 1 x 144 GB of data. Ok? nope
allocation of 1 x 145 GB of data. Ok? nope
allocation of 1 x 146 GB of data. Ok? nope
allocation of 1 x 147 GB of data. Ok? nope
allocation of 1 x 148 GB of data. Ok? nope
allocation of 1 x 149 GB of data. Ok? nope
Now, about that US$0.17 I spent on this...
What is the net memory space remaining in a MIFARE Classic 1K card considering that keys and access bits take 16 bytes per sector, and the unique id (UID) and manufacturer data takes 16 bytes for each card?
MIFARE Classic 1K consists of 16 sectors. One sector consists of 4 blocks (sector trailer + 3 data blocks). Each block consists of 16 bytes.
This gives 16 Sectors * 4 Blocks * 16 Bytes = 1024 Bytes.
The actually usable data area depends on how you want to use the card:
You use only one key per sector (key A); you use the unused parts of the sector trailers for data storage; you don't use a MIFARE application directory (MAD):
The first block of the first sector is always reserved (UID/manufacturer data) and cannot be used to store user data.
6 bytes of each sector trailer are reserved for key A. 3 bytes of each sector trailer are reserved for the access conditions. The remaining 7 bytes of the sector trailer can be used to store user data.
Thus, you can store 1 Sector * (2 Blocks * 16 Bytes + 1 Block * 7 Bytes) + 15 Sectors * (3 Blocks * 16 Bytes + 1 Block * 7 Bytes) = 864 Bytes.
You use two keys per sector (key A and key B); you use the unused parts of the sector trailers for data storage; you don't use a MIFARE application directory (MAD):
12 bytes of each sector trailer are reserved for keys A and B. 3 bytes of each sector trailer are reserved for the access conditions. The remaining byte of the sector trailer can be used to store user data.
Thus, you can store 1 Sector * (2 Blocks * 16 Bytes + 1 Block * 1 Byte) + 15 Sectors * (3 Blocks * 16 Bytes + 1 Block * 1 Byte) = 768 Bytes.
You use two keys per sector (key A and key B); you don't use the unused parts of the sector trailers for data storage; you don't use a MIFARE application directory (MAD):
Thus, you can store 1 Sector * 2 Blocks * 16 Bytes + 15 Sectors * 3 Blocks * 16 Bytes = 752 Bytes.
You use two keys per sector (key A and key B); you use the unused parts of the sector trailers for data storage; you use a MIFARE application directory (MAD):
The data blocks and the general purpose byte (remaining byte in the sector trailer) of the first sector are reserved for the MAD.
The general purpose byte in the other sectors can be used.
Thus, you can store 15 Sectors * (3 Blocks * 16 Bytes + 1 Block * 1 Byte) = 735 Bytes.
You use two keys per sector (key A and key B); you use NXP's NDEF data mapping to transport an NDEF message:
The MAD is used to assign sectors to the NDEF application.
NDEF data can only be stored in the 3 data blocks of each NDEF sector.
The NDEF message is wrapped in an NDEF TLV structure (1 byte for the tag 0x03, three bytes to indicate a length of more than 254 bytes).
Thus, you can store an NDEF message of up to 15 Sectors * 3 Blocks * 16 Bytes - 4 Bytes = 716 Bytes. Such an NDEF message could have a maximum payload of 716 Bytes - 1 Byte - 1 Byte - 4 Bytes = 710 Bytes (when using an NDEF record with TNF unknown: 1 header byte, 1 type length byte, 4 payload length bytes).
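The following small C sketch just reproduces the arithmetic above, so you can recompute the figures if your layout differs (the constant names are mine):

#include <stdio.h>

int main(void)
{
    const int block_bytes = 16, data_blocks = 3;   /* data blocks per sector           */
    const int trailer_free_key_a   = 7;            /* free trailer bytes, key A only   */
    const int trailer_free_keys_ab = 1;            /* free trailer bytes, keys A and B */

    int raw        = 16 * 4 * block_bytes;                                        /* 1024 */
    int key_a_only = (2 * block_bytes + trailer_free_key_a)
                   + 15 * (data_blocks * block_bytes + trailer_free_key_a);       /*  864 */
    int keys_ab    = (2 * block_bytes + trailer_free_keys_ab)
                   + 15 * (data_blocks * block_bytes + trailer_free_keys_ab);     /*  768 */
    int no_trailer = 2 * block_bytes + 15 * data_blocks * block_bytes;            /*  752 */
    int with_mad   = 15 * (data_blocks * block_bytes + trailer_free_keys_ab);     /*  735 */
    int ndef_msg   = 15 * data_blocks * block_bytes - 4;                          /*  716 */

    printf("%d %d %d %d %d %d\n", raw, key_a_only, keys_ab, no_trailer, with_mad, ndef_msg);
    return 0;
}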
I have a list of files that looks like this:
4 -rw-r--r-- 1 neversaint hgc0746 53 May 1 10:37 SRX016372-SRR037477.est_count
4 -rw-r--r-- 1 neversaint hgc0746 53 May 1 10:34 SRX016372-SRR037478.est_count
4 -rw-r--r-- 1 neversaint hgc0746 53 May 1 10:41 SRX016372-SRR037479.est_count
0 -rw-r--r-- 1 neversaint hgc0746 0 Apr 27 11:16 SRX003838-SRR015096.est_count
0 -rw-r--r-- 1 neversaint hgc0746 0 Apr 27 11:32 SRX004765-SRR016565.est_count
What I want to do is find the files that are exactly 53 bytes in size. But why does this command fail?
$ find . -name "*.est_count" -size 53 -print
It works well, though, if I just want to find files of size 0 with this command:
$ find . -name "*.est_count" -size 0 -print
You need to suffix the size 53 with 'c'. As per find's manpage:
-size n[cwbkMG]
File uses n units of space. The following suffixes can be used:
`b' for 512-byte blocks (this is the default if no suffix is
used)
`c' for bytes
`w' for two-byte words
`k' for Kilobytes (units of 1024 bytes)
`M' for Megabytes (units of 1048576 bytes)
`G' for Gigabytes (units of 1073741824 bytes)
The size does not count indirect blocks, but it does count
blocks in sparse files that are not actually allocated. Bear in
mind that the `%k' and `%b' format specifiers of -printf handle
sparse files differently. The `b' suffix always denotes
512-byte blocks and never 1 Kilobyte blocks, which is different
to the behaviour of -ls.
-size n[ckMGTP]
True if the file's size, rounded up, in 512-byte blocks is n. If
n is followed by a c, then the primary is true if the file's size
is n bytes (characters). Similarly if n is followed by a scale
indicator then the file's size is compared to n scaled as:
k kilobytes (1024 bytes)
M megabytes (1024 kilobytes)
G gigabytes (1024 megabytes)
T terabytes (1024 gigabytes)
P petabytes (1024 terabytes)
You need to use -size 53c, i.e. find . -name "*.est_count" -size 53c -print.
This is what I get on Mac OS X 10.5:
> man find
...
-size n[c]
True if the file's size, rounded up, in 512-byte blocks is n. If n
is followed by a c, then the primary is true if the file's size is n
bytes (characters).