I see that inspecting /proc/self/maps on Linux machines lets me see the pages that have been mapped in. As a result I can write a program to read and parse the pages it has mapped in.
How could one go about doing something similar for Windows? Are there any APIs for the same? If not, do you have any suggestions on how this could be done?
Yes, the possibility exists. First of all You have to access any process memory, or better, make it "accessible". Then You can read memory.
Here are some usefull links ( by the way, You should always look in there, if You come from linux and try to do things on windows, it is the main source ).
https://msdn.microsoft.com/en-us/library/windows/desktop/ms680553%28v=vs.85%29.aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366916%28v=vs.85%29.aspx
There is all documented.
But ther are also undocumented approaches, really crazy stuff, which also deals whith this topic.Like this for example.
http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/Memory%20Management/Virtual%20Memory/NtReadVirtualMemory.html
Related
I'm envisioning a program I will need to write and need some advice on the language. I will need to be doing raw disk access so I can display hex data, scroll or jump around on the disk, and do calculations from the data. I have been using Java the most and it's portability between OSes for my other projects is certainly a benefit, but raw disk access either isn't possible, would require JNI, or may be possible on *nix when you can access disks as "files". I keep reading different things. By the way I can handle this type of work using Files in Java, but in this project I need to be able to access the disk so disk imaging to files beforehand isn't needed.
It would be nice to make it as portable as I could since there is a real benefit to using different OSes, but it may not be worth it and I should just stick with Windows and a native compiling language. Is there any existing JNI code that could help? I have experience in other languages but I haven't used C++ in a long time. Should I forget about Java and tryout C#? Someone told me that Python has libraries available for this type of thing despite it being an interpreted language so what about Python? What would be best for the project? What would be good for me to learn?
Searching around for raw disk access, Java, Python, does not seem to give any useful results. Thanks for any help!
EDIT
It seems like this will be quite involved, learning what I need to know, and then learning that. It's too bad I couldn't use disk images instead because then I'd be able to start working on it immediately in Java, which I'm comfortable with and I know I could make a good product. I've gotten great throughput in other raw data processing projects with Java so that doesn't worry me. Plus it would be truly portable. Hmm might have to consider it more. I'd probably need a big azz storage system to hold all the images though :)
UPDATE
Just a note for anyone that finds this question... I have figured out this works just by specifying the disk for the File using the PhysicalDrive notation (in Windows) like the answer below by hunsricker. However there are some issues. First if you do a "exists" check File.exists(), it says the file does not exist. Also, the file size is zero, and when I get a "java.io.IOException: The drive cannot find the sector requested" is the way I know I'm at the end of the file. And the worst part- I was getting some odd runtime errors doing this when I was reading some bytes and skipping some (64) bytes in a loop. I altered my program a bit to read different amounts and that changed where the error occurred. I was using BufferedInputStream instead of RandomAccessFile like hunsricker below by the way, not sure if it makes a difference. My only answer for this issue is that since I'm doing physical disk access, it doesn't like that I am not reading in even 512 byte sectors or 1K blocks or such. Indeed when I read even 1K, 2K, 512bytes, etc., and don't skip anything, it works fine and runs to the end. The errors I saw were java.io.ioexception "incorrect function" and java.io.ioexception "the parameter is incorrect". There was no rhyme or reason to them. Then I made image files of the same data and ran my program on those and it would do any combination of reading and skipping bytes with no problem. Physical disk access was more picky I guess.
I was looking by myself for a possibility to access raw data of a physical drive. And now as I got it to work, I just want to tell you how. You can access raw disk data directly from within java ... just run the following code with administrator priviliges:
File diskRoot = new File ("\\\\.\\PhysicalDrive0");
RandomAccessFile diskAccess = new RandomAccessFile (diskRoot, "r");
byte[] content = new byte[1024];
diskAccess.readFully (content);
So you will get the first kB of your first physical drive on the system. To access logical drives - as mentioned above - just replace 'PhysicalDrive0' with the drive letter e.g. 'D:'
oh yes ... I tried with Java 1.7 on a Win 7 system ...
RageDs link brougth me to the solution ... thank you :-)
Disk access will depend on the disk's particular drivers. And since this is such a low-level task, I doubt Java/Python would have such support (these languages are generally used for fast, high-level software package development). Since you will probably not be aware of the disks' particular hardware implementations, you will probably have to end up using an operating system API (which is OS-dependent of course). I would recommend looking into C and/or the particular assembly language for the architecture you plan to do this work on. Then, I would recommend continuing your search to find the appropriate API for your target OS.
EDIT
For Windows, a good place to start is here. More specifically, MSDN's CreateFile() is probably a function you would be interested in.
I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with neither Linux nor memcache, so I would really appreciate if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no 'linux filesystem'. However, I believe the relevant code is in linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the linux source about handling vm_area_struct objects in linux/mm/mmap.c, mainly.
As people have mentioned, mmap is a good solution here.
But, one 250k file is very small. You might want to read it in and put it in some sort of memory structure that matches what you want to send back to the user on startup. Ie, if it is a text file an array of lines might be a good choice, etc.
The file should be cached, but make sure the noatime option is set on the mount, otherwise the access time will attempt to be saved to the file, invalidating the cache.
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.
I guess putting that file into ramdisk (tmpfs) may make enough advantage without big modifications. Unless you are really serious about response time in microseconds unit.
I am developing under Linux with pretty tight constraints on disk usage. I'd like to be able to point logging to a fixed-size file. For example, if my application outputs all logs to stdout:
~/bin/myApp > /dev/debug1
and then, to see the last amount of output:
cat /dev/debug1
would write out however many bytes debug1 was setup to save (if at least that many had been written there).
This post suggests using expect or its library, but I was wondering if anyone has seen a "pseudo-tty" device driver-type implementation as I would prefer to not bind any more libraries to my executable.
I realize there are other mechanisms like logrotate, but I'd prefer to have a non-cron solution.
Pointers, suggestions, questions welcome!
Perhaps you could achieve what you want using mkfifo and something that reads the pipe with a suitable buffer. I haven't tried, but less --buffers=XXXXXX might work for this.
My boss asked me to find some algorithms or existed libraries.
coz our application runs on linux, and it need lots of files, maybe over 5G-20G....but we dont need to load the files at one time, but at anytime when the file is needed. btw, we have maybe over 100-1000 files stored in our drive.
However, this application is kinda realtime, at least. Simple and ordinary reading or loading can not meet our needs.
I know in linux and windows, there is mechanism virture memory..in linux we use mmap to realize our swapping demands...
But boss is boss, who said we dont take that into consideration presently..
So, i am here prey for helps..
thanx
Your operating system can handle caching and virtual memory (’n stuff) a lot better than you (or any library) can. Apart from simply keeping all files in memory (I heard RAM was cheap :) there’s not much you can do.
Here's a link to Windows documentation.
Basically I would like to get similar data, but on Linux. If not all is possible, then at least some parts.
If you enable CONFIG\_TASK\_IO\_ACCOUNTING, you will have the information available in /proc/<pid>/io. This is available since kernel 2.6.20, but not normally enabled by default (However, in Ubuntu 8.04 it seems like it is enabled).
You can read about the various data items in Documentation/filesystems/proc.txt in the kernel source tree. Especially section 2.14 should be of interest.
Have a look at /proc/PID/io - it's the current I/O accounting information of the process with PID.
Look at the pseudo-files under /proc/<PID>/. Maybe you can find what you need there.
Look at man 5 proc, or failing that the kernel documentation. However, I don't see much that looks promising. Sorry.
Perhaps you want getrusage()? Not all fields are maintained under linux however. Perhaps enabling the CONFIG_TASK_IO_ACCOUNTING will cause them to be maintained?