Sensor to read data from defective disks - sensors

I am looking for a sensor that can be used as a reading head for hard disks.Is there a device like that?

The size of the magnetic bit is extremely small. While such tools exist, they are likely incredibly expensive (hundreds of thousands at the very minimum) due to the scale involved. It's cheaper to send the disk to a data recovery company which has these tools for worst-case recovery.

Related

Choices for shared memory system, MPI library, original RDMA or ULP over RDMA?

I am new on High Performance Computing (HPC), but I am going to have a HPC project, so I need some help to solve some fundamental problems.
The application scenario is simple: Several servers connected by the InfiniBand (IB) network, one server for Master, and others for slaves. only the master read/write in-memory data (size of the data ranges from 1KB to several hundred MBs) into slaves, while slaves just passively store the data in their memories ( and dump the in-memory data into disks at the right time). All computation are are performed in the Master, before the writing or after the reading the data to/from the slaves. The requirement of the system is low latency (small regions of data, such as 1KB-16KB) and high throughput (large regions of data, several hundred MBs).
So, My questions are
1. Which concrete way is more suitable for us? MPI, primitive IB/RDMA library or ULPs over RDMA.
As far as I know, existing Message Passing Interface (MPI) library, primitive IB/RDMA library such as libverbs and librdmacm and User Level Protocal (ULPs) over RDMA might be feasible choices, but I am not very sure of their applicable scopes.
2. Should I make some tunings for the OS or the IB network for better performance?
There is a paper [1] from Microsoft announces that
We improved performance
by up to a factor of eight with careful tuning and
changes to the operating system and the NIC drive
For my part, I will try to avoid such performance tuning as I can. However, if the tuning is unavoidable, I will try my best. The IB network of our environment is Mellanox InfiniBand QDR 40Gb/s and I can choose the Linux OS for servers freely.
If you have any ideas, comments and answers are welcome!
Thanks in advance!
[1] FaRM: Fast Remote Memory
If you use MPI, you will have the benefit of an interconnect-independent solution. It doesn't sound like this is going to be something you are going to keep around for 20 years, but software lasts longer than you ever think it will.
Using MPI also gives you the benefit of being able to debug on your (oversusbscribed, possibly) laptop or workstation before rolling it out onto the infiniband machines.
As to your second question about tuning the network, I am sure there is no end of tuning you can do but until you have some real workloads and hard numbers, you're wasting your time. Get things working first, then worry about optimizing the network. Maybe you need to tune for many tiny packages. Perhaps you need to worry about a few large transfers. The tuning will be pretty different depending on the case.

How to make the OS schedule disk accesses optimally?

Suppose that a process needs to access the file system in many (1000+) places, and the order is not important to the program logic. However, the order obviously matters for performance if the file system is stored on a (spinning) hard disk.
How can the application programmer communicate to the OS that it should schedule the accesses optimally? Launching 1000+ threads does not seem practical. Does database management software accomplish this, and if so, then how?
Additional details: I had a large (1TB+) mmapped file where I needed to read 1000+ chunks of about 1KB, each time in new, unpredictable places.
In the early days when parameters like Wikipedia: Hard disk drive performance characteristics → Seek time were very expensive and thus very important, database vendors payed attention to the on-disk data representation and layout as can be seen e.g. in Oracle8i: Designing and Tuning for Performance → Tuning I/O.
The important optimization parameters changed with appearance of Solid-state drives (SSD) where the seek time is 0 (or at least constant) as there is nothing to rotate. Some of the new parameters are addressed by Wikipedia: Solid-state drive (SSD) → optimized file systems.
But even those optimization parameters go away with the use of Wikipedia: In-memory databases. The list of vendors is pretty long, all big players on it.
So how to schedule your access optimally depends a lot on the use case (1000 concurrent hits is not sufficient problem description) and buying some RAM is one of the options and "how can the programmer communicate with the OS" will be one of the last (not first) questions
Files and their transactions are cached in various devices in your computer; RAM and the HD cache are the most usual places. The file system driver may also implement IO transaction queues, defragmentation, and error-correction logic that makes things complicated for the developer who wants to control every aspect of file access. This level of complexity is ultimately designed to provide integrity, security, performance, and coordination of file access across all processes of your system.
Optimization efforts should not interfere with the system's own caching and prediction algorithms, not just for IO but for all caches. Trying to second-guess your system is a waste of your time and your processors' time.
Most probably your IO operations and data will stay on caches and later be committed to your storage devices when your OS sees fit.
That said, there's always options like database suites, mmap, readahead mechanisms, and direct IO to your drive. You will need to invest time benchmarking any of your efforts. I advise against multiple IO threads because cache contention will make things even slower than one thread.
The kernel will already reorder the read/write requests (e.g. to fit the spin of a mechanical disk), if they come from various processes or threads. BTW, most of the reads & writes would go to the kernel file system cache, not to the disk.
You might consider using posix_fadvise(2) & perhaps (in a separate thread) readahead(2). If -instead of read(2)-ing- you use mmap(2) to project some file portion to virtual memory, you might use also madvise(2)
Of course, the file system does not usually guarantee that a sequential portion of a file is physically sequentially located on the disk (and even the disk firmware might reorder sectors). See picture in Ext2 wikipage, also relevant for Ext4. Some file systems might be better in that respect, and you could tune their block size (at mkfs time).
I would not recommend having thousands of threads (only at most a few dozens).
At last, it might worth buying some SSD or some more RAM (for file cache). See http://linuxatemyram.com/
Actual performance would depend a lot on the particular system and hardware.
Perhaps using an indexed file library like GDBM or a database library Sqlite (or a real database like PostGreSQL) might be worthwhile! Perhaps have fewer files but bigger ones could help.
BTW, you are mmap-ing, and reading small chunk of 1K (smaller than page size of 4K). You could use madvise (if possible in advance), but you should try to read larger chunks, since every file access will bring at least a whole page.
You really should benchmark!

Using a hard disk without filesystem for big data

I'm working on a web crawler and have to handle big data (about 160 TB raw data in trillions of data files).
The data should be stored sequencial as one big bz2 file on the magnetic hard disk. A SSD is used to hold the meta data. THe most important operation on the hard disk is a squential read over all of the 4 TB off the disk, which should happen with full maximum speed of 150 MB/s.
I want to not waste the overhead of a file system an instead use the "/dev/file" devices directly. Does this access use the os block buffer? Are the access operations queued or synchronous in a FIFO style?
Is it better to use /dev/file or write your own user level file system?
Has anyone experience with it.
If you don't use any file system but read your disk device (e.g. /dev/sdb) directly, you are losing all the benefit of file system cache. I am not at all sure it is worthwhile.
Remember that you could use syscalls like readahead(2) or posix_fadvise(2) or madvise(2) to give hints to the kernel to improve performance.
Also, you might when making your file system use a larger than usual block size. And don't forget to use big blocks (e.g. of 64 to 256 Kbytes) when read(2)-ing data. You could also use mmap(2) to get the data from disk.
I would recommend against "coding your own file system". Existing file systems are quite tuned (and some are used on petabytes of storage). You may want to chose big blocks when making them (e.g. -b with mke2fs(8)...)
BTW, choosing between filesystem and raw disk data is mostly a configuration issue (you specify a /dev/sdb path if you want raw disk, and /home/somebigfile if you want a file). You could code a webcrawler to be able to do both, then benchmark both approaches. Very likely, performance could depend upon actual system and hardware.
As a case in point, relational database engines used often raw disk partitions in the previous century (e.g. 1990s) but seems to often use big files today.
Remember that the real bottleneck is the hardware (i.e. disk): CPU time used by filesystems is often insignificant and cannot even be measured.
PS. I have not much real recent experience with these issues.

Need Hardware Sizing guidelines for SSAS Tabular server

Are there any good sources of hardware sizing guidelines for servers running SSAS Tabular? Something along the lines of "if your models are estimated to be this big (in terms of memory) and you're expecting and average x-number of simultaneous connections...here's what you should start out with".
obviously they need more memory than most other systems to hold the data...but I'd also expect them to be able to take advantage of more cores and higher clock speeds...once it's in memory, its really just a matter of ripping through it and crunching the numbers. More cores to handle more concurrency, higher clock-speed for faster "crunching".
you're supposed to aim for 40% of the memory size of the original database. 1tb data = 400gb ram for SSAS Tabular.
FWIW, Microsoft just released a whitepaper (Jan 2013) that addresses this material
Hardware Sizing a Tabular Solution (SQL Server Analysis Services)

Multicore processor core communication speeds

I would look to find the speed of communication between the two cores of a computer.
I'm in the very early stages of planning to massively parallelise a sequential program and I need to think about network communication speeds vs. communication between cores on a single processor.
Ubuntu linux probably provides some way of seeing this sort of information? I would have thought speed fluctuates.. I just need some average value. I'm basically needing to write something up at the moment and it would be good to talk about these ratios.
Any ideas?
Thanks.
According to this benchmark: http://www.dragonsteelmods.com/index.php?option=com_content&task=view&id=6120&Itemid=38&limit=1&limitstart=4 (Last image on the page)
On an Intel Q6600, inter-core latency is 32 nanoseconds. Network latency is measured in milliseconds which 1,000,000 milliseconds / nanosecond. "Good" network latency is considered around or under 100ms, so given that, the difference is about the order of 1 million times faster for inter-core latency.
Besides latency though there's also Bandwidth to consider. Again based on the linked bookmark, benchmark for that particular configuration, inter-core bandwidth is about 14GB/sec whereas according to this: http://www.tomshardware.com/reviews/gigabit-ethernet-bandwidth,2321-3.html, real-world test of a Gigabit ethernet connection shows about 35.8MB/sec so the difference there is smaller, only on the order of around 500 times faster in terms of bandwidth as opposed to a 1,000,000 times in latency. Depending on which is more important to your application that might change your numbers.
The network speeds are measured in milliseconds for Ethernet ($5-$100/port), or microseconds for specialized MPI hardware like Dolphin on Myrintet (~ $1k/port). Inter-core speeds are measured in nanoseconds, as the data is copied from one memory area to another, and then some signal is sent from one CPU to another (the data will be protected from simultaneous access by a mutex or a full-bodied queue).
So, using a back'o'the'napkin calculation the ratio is 1:10^6.
Inter-core communication is going to be massively faster. Why ?
the network layer imposes a massive overhead in terms of packets, addressing, handling contention etc.
the physical distances will impose a sizeable impact
How you measure inter-core communication speed would be very difficult. But given the above I think it's a redundant calculation to make.
This is a non-trivial thing to find. The speed of data transfer between two cores depends entirely on the application. It could depend on any (or all) of - the speed of register access, the clock speed of the cores, the system bus speed, the latency of your cache, the latency of your memory, etc., etc., etc. In short, run a benchmark or you'll be guessing in the dark.

Resources