IntelliJ IDEA compilation speedup in Linux

I'm working with IntelliJ IDEA on Linux and I recently got 16 GB of RAM, so are there any ways to speed up my project's compilation using this memory?

First of all, in order to speed up IntelliJ IDEA itself, you may find this discussion very useful.
The easiest way to speed up compilation is to move the compilation output to a RAM disk.
RAM disk setup
Open fstab
$ sudo gedit /etc/fstab
(instead of gedit you can use vi or whatever you like)
Set up RAM disk mount point
I'm using RAM disks in several places in my system, and one of them is /tmp, so I'll just put my compile output there:
tmpfs /tmp tmpfs defaults 0 0
In this case the filesystem size is not explicitly bounded (tmpfs defaults to half of your RAM), but that's OK; my /tmp size right now is 73MB. If you're afraid that the RAM disk will grow too big, you can limit its size, e.g.:
tmpfs /tmp tmpfs defaults,size=512M 0 0
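To apply the new fstab entry without rebooting, something like this should work (assuming the /tmp mount point used above):
$ sudo mount -a    # mount everything listed in /etc/fstab that isn't mounted yet
$ df -h /tmp       # verify that /tmp is now a tmpfs of the expected size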
Project setup
In IntelliJ IDEA, open Project Structure (Ctrl+Alt+Shift+S by default), then go to Project - 'Project compiler output' and move it to the RAM disk mount point:
/tmp/projectName/out
(I've added a projectName folder to make it easy to find if I need to go there or if I work with several projects at the same time)
Then go to Modules, and in all of your modules go to Paths and select 'Inherit project compile output path' or, if you want to use a custom compile output path, modify 'Output path' and 'Test output path' the same way you changed the project compiler output above.
That's all, folks!
P.S. A few numbers: compilation times for my current project in different setups (approximate):
HDD: 80s
SSD: 30s
SSD+RAM: 20s
P.P.S. If you use an SSD, then besides the compilation speedup you will reduce write operations on your disk, which will also help your SSD live happily ever after ;)

Yes, you can. There are several ways to do this. First, you can fine-tune the JVM for the amount of memory you have. Take this https://gist.github.com/zafarella/43bc260c3c0cdc34f109 gist as an example.
In addition, depending on which Linux distribution you use, there is a way to create a RAM disk and rsync its content back to the HDD. Basically you place all logs and temp files (including indexes) into RAM, and your IDEA will fly.
Use something like profile-sync-daemon to keep the files synced; it is easy to add IDEA as an app. Alternatively you can use anything-sync-daemon.
You need to change "idea.system.path" and "idea.log.path".
More details on the IDEA settings can be found in their docs. The idea is to move whatever changes often into RAM.
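As a rough sketch, assuming the tmpfs is mounted at /tmp (the directory names below are made up), the relevant idea.properties entries would look something like this:
# idea.properties - point the frequently-written directories at the RAM disk
idea.system.path=/tmp/idea-system
idea.log.path=/tmp/idea-log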
More RAM Disk alternatives https://wiki.debian.org/SSDOptimization#Persistent_RAMDISK
The downside of this solution is that when you run out of RAM, the OS will start paging and everything will slow down.
Hope that helps.

In addition to the RAM disk approach, you can speed up compilation by giving the compiler process more memory (but not too much) and by compiling independent modules in parallel. Both options can be found under Settings | Compiler.

Related

redirect output to other partition linux

So, I have a scientific server with an HDD and an SSD.
For computations involving lots of data reading/writing, a user can use the SSD, but all the home directories are on the HDD.
Is there an automatic way to redirect the output of any program writing to the SSD to the home directory of the user running the program once the SSD is full?
If the best solution is to write my own script, then what is the best way to determine if the SSD runs out of space?
My OS is Ubuntu 18.04 LTS
In short, I do not think such a thing exists, and I believe you should implement a bash script that checks (my tool of choice would simply be df) that there is enough space before actually starting the next computation run. Maybe you should pre-allocate the space you intend to use, if possible, to avoid other concurrent runs crashing or running out of space? Maybe you should have an automated procedure to clean up some space?
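A minimal sketch of such a check (the /mnt/ssd mount point and the 50 GB threshold are assumptions, adjust them to your setup):
#!/bin/bash
# Pick an output directory: the SSD if it has enough free space, the user's home otherwise.
SSD_MOUNT=/mnt/ssd
REQUIRED_KB=$((50 * 1024 * 1024))   # 50 GB, expressed in 1K blocks as df reports them

free_kb=$(df -k --output=avail "$SSD_MOUNT" | tail -n 1 | tr -d ' ')
if [ "$free_kb" -lt "$REQUIRED_KB" ]; then
    echo "Not enough space on $SSD_MOUNT, falling back to $HOME" >&2
    OUTPUT_DIR="$HOME"
else
    OUTPUT_DIR="$SSD_MOUNT/$USER"
fi
echo "Using output directory: $OUTPUT_DIR"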
Obviously, you could have the SSD available on some mount point in /home/, then periodically check with a cron job whether it is full, and then maybe unmount it and send a warning mail. This will sort of do what you want. Sort of. But what happens when the HDD also gets full? Watch out: these kinds of problems can easily cause a server to crash or otherwise experience issues.
This looks like a problem you might partially solve or mitigate by, e.g., using a quota scheme (that is, limiting the amount of space that each user can allocate) or, better yet, by using a dedicated system for queueing jobs and allocating resources.
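If you go the quota route, a rough sketch would look something like this (the /mnt/ssd mount point, the user name alice, and the limits are all assumptions; the filesystem must be mounted with the usrquota option):
# create the quota files and turn quotas on for the SSD filesystem
sudo quotacheck -cum /mnt/ssd
sudo quotaon /mnt/ssd
# give alice a 400 GB soft / 450 GB hard block limit (values are in 1K blocks)
sudo setquota -u alice 419430400 471859200 0 0 /mnt/ssd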

Linux: How to enable Execute in place (XIP) for RAMFS/TMPFS

I'm working on an embedded system where the rootfs is constructed in a tmpfs partition by the init process. After the rootfs is complete, it will do a pivot-root and start spawning processes located in the rootfs.
But it seems like XIP is not working for our tmpfs, and all the applications are therefore loaded into RAM twice (once in the tmpfs and again when loaded for execution).
Can this really be true?
I found an old discussion thread at https://ez.analog.com/thread/45262 which describes the same issue I'm seeing.
How can I achieve XIP for a file-system located in memory?
What you are attempting to do should indeed be possible (though I haven't tried it myself); the problem is simply that you are not going about it the correct way. If you use the block RAM device ("brd"), you can create a block device that is actually RAM presented as a block device. To enable this in your kernel (you do not say which kernel you have, so I will assume 4.14), you need to enable CONFIG_BLK_DEV_RAM as well as CONFIG_BLK_DEV_RAM_DAX in your kernel configuration; they are both under "Device Drivers" -> "Block Devices". Then create such a RAM-backed block device, create, for example, an ext2, ext4, or XFS file system on it, prepare your rootfs in that file system, and pivot-root into it. You will then be executing in a RAM-backed file system that has XIP (now replaced by DAX) functionality, so executing applications should, at least in theory, work correctly without creating a copy of the data, simply running out of the RAM pages of the block RAM device.
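A rough sketch of that sequence (the device size, filesystem choice, and mount point are assumptions; brd exposes devices named /dev/ram0, /dev/ram1, ...):
# kernel configuration (Device Drivers -> Block Devices, plus CONFIG_FS_DAX for the filesystem side)
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_DAX=y
# at runtime: make a filesystem on the RAM-backed block device, mount it with DAX,
# then build the rootfs in it and pivot-root into it
mkfs.ext4 /dev/ram0
mount -o dax /dev/ram0 /newroot
# ... populate /newroot with your rootfs ...
mkdir -p /newroot/oldroot
pivot_root /newroot /newroot/oldroot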
Please be aware that this approach has limitations: for example, kernel modules themselves will still be copied into RAM, get_user_pages() may not work, O_DIRECT may not work, and neither might RDMA, sendfile() or splice().
Some relevant things to look at include:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/Kconfig?h=v3.19#n359
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/Kconfig?h=v3.19#n396
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/blockdev/ramdisk.txt?h=v3.19
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xip.txt?h=v3.19
Note that XIP was replaced by DAX as of the 4.0 kernel, so for that see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt?h=4.14
Also note that support for DAX was removed from the block RAM driver in kernel 4.15, so you will no longer be able to do this once you move to kernel 4.15 or later. See commit 7a862fbbdec665190c5ef298c0c6ec9f3915cf45 for the reasoning behind removing the functionality.
I hope this is enough to set you on the right track, and sorry about the bad news that the functionality has been removed as of kernel 4.15.

Searching through really big files

I need to search through a TB of raw hard disk data and find a couple of things inside it. I tried using sudo cat /dev/sdc | less, but this fails because it puts everything it reads into RAM. I only have 8 GB of RAM and 8 GB of swap, so putting a whole TB of data into RAM will not work.
I was wondering if I could somehow make less forget what it has read after the 1 GB mark, or maybe use another editor.
I accidentally repartitioned my drive and lost some important files. I tried some utilities but none of them worked, so I tried this. I got a few of the files back, but I can't get the rest because the computer freezes and runs out of RAM.
I learned my lesson: I need to make more frequent backups. Any help is greatly appreciated.
The -B option to less is exactly what you are asking for: it allows less to be forgetful. Combine it with -b1048576 to allocate 1 GB of buffer space (the -b unit is KB).
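Put together, and assuming the same device as in the question, the invocation would look something like:
$ sudo cat /dev/sdc | less -B -b1048576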
Or do it the interactive way: run less normally, scroll down until the point where it starts to get a little laggy, then just type -B at the less prompt to activate the option (did you know you can set less options interactively?)
Just don't try to scroll backward very far, or you'll be in forgotten-content land, where weird things happen.
(Side note: I've done this kind of recovery before, and it's easier if you can find the filesystem structures (inode blocks etc.) that point to the data, rather than searching for the data in a big dump. Even if some of the inodes are gone, by first recovering everything you can from the surviving inodes you narrow down the range of unknown blocks where the other files might be.)

How to speed up reading of a fixed set of small files on linux?

I have 100,000 1 kB files, and a program that reads them - it is really slow.
My best idea for improving performance is to put them on a ramdisk.
But this is a fragile solution: every restart requires setting up the ramdisk again.
(and copying the files is slow as well)
My second-best idea is to concatenate the files and work with that. But that is not trivial.
Is there a better solution?
Note: I need to avoid dependencies in the program, even Boost.
You can optimize by storing the files contiguously on disk.
On a disk with ample free room, the easiest way would be to read a tar archive instead.
Other than that, there is (or used to be) a Debian package for 'readahead'.
You can use that tool to
profile a normal run of your software
edit the list of files accessed (detected by readahead)
You can then call readahead with that file list (it will order the files in disk order so that throughput is maximized and seek times are minimized).
Unfortunately, it has been a while since I used these, so I hope you can google your way to the respective packages.
This is what I seem to have found now:
sudo apt-get install readahead-fedora
Good luck
If your files are static, I agree: just tar them up and then place that in a RAM disk. It will probably be faster to read directly out of the tar file, but you can test that.
Edit: instead of tar, you could also try creating a squashfs volume.
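A rough sketch of both variants (the directory and file names are made up):
# pack the small files once, then copy the archive to a tmpfs so reads come from RAM
tar cf files.tar smallfiles/
cp files.tar /dev/shm/
# squashfs variant: build a compressed, read-only image and loop-mount it
mksquashfs smallfiles/ files.squashfs
sudo mount -o loop -t squashfs files.squashfs /mnt/smallfiles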
If you don't want to do that, or still need more performance, then:
put your data on an SSD.
start running some FS performance tests, starting with EXT4, XFS, etc...

How to tell whether two NFS mounts are on the same remote filesystem?

My Linux-based system displays statistics for NFS-mounted filesystems, something like this:
Remote Path                Mounted-on   Stats
server1:/some/path/name    /path1       100 GB free
server2:/other/path/name   /path2       100 GB free
                                        Total: 200 GB free
That works fine. The problem is when the same filesystem on the NFS server has been mounted twice on my client:
Remote Path                Mounted-on   Stats
server1:/some/path/name    /path1       100 GB free
server1:/some/path/name2   /path2       100 GB free
                                        Total: 200 GB free
server1's /some/path/name and /some/path/name2 are actually on the same filesystem, which has 100 GB free, but I erroneously add them up and report 200 GB free.
Is there any way to detect that they're on the same partition?
Approaches that won't work:
"Use statfs()": statfs() returns a struct statfs, which has a "file system ID" field, f_fsid. Unfortunately it's undefined and gets zeroed out over NFS.
"Don't mount the same partion multiple times." This is outside of my control.
"Use a heuristic based on available space." The method has to definitively work. Also, statfs() caches its output so it would be difficult to get this right in the face of large data movement.
If there's no solution, I'll have to generate a config file in every potential mount point on the server side, but it would be a lot nicer if there were some clean way to avoid that.
Thanks!
I guess "stat -c %d /mountpoint" does what you want (I cannot test it right now)?
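A minimal check along those lines, using the mount points from the question (whether the reported device ID is stable across NFS mounts of the same export is exactly what you would need to verify):
$ stat -c %d /path1
$ stat -c %d /path2
# if the two numbers match, the mounts are (probably) backed by the same remote filesystem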
You probably want to read the remote system's shared file systems, using:
showmount -e server
That will give you the real paths that are being shared. When walking the mounts from the remote system, prune them to the common root on the remote system and use that to determine whether the mount points are on the same underlying file system.
This doesn't help you in the case where the file systems are separately shared from the same underlying file system.
You could add a heuristic of checking the overall file system size and space available, and assume that if they're the same and come from the same remote server, it's the same partition, mapped to the shortest common path of the mount devices.
None of these help if you share from a loopback-mounted file system that looks completely different in form from the others.
Nor does it help in the case of a server that can be addressed by different names and addresses.
