I have been asked in my project to profile the memory usage of a C++ application that runs on Linux on an embedded-like device. We need this information to decide how much RAM the device needs.
I have done some research and found many tools or commands to find the max memory usage of a process when it is running.
Here are those:
top
Command: top -p $Pid
ps
Command: ps -o rss= -p $pid
pmap
Command: pmap -x $pid
valgrind (massif)
Command: valgrind --tool=massif --pages-as-heap=yes program
smaps
Used the following link: Script
Linux system monitor app
But each of them reports a different memory usage. I have tried to understand them in depth, but I am still confused about which one is close enough to trust. Could someone with experience share which one they use, and explain why there are so many ways to measure memory that give different results?
The VM, RSS and shared values differ across all of them.
Thanks
You can get the maximum resident set size of the process over its lifetime, in kilobytes, by using the following command:
/usr/bin/time -f %M
Followed by the execution of your C++ binary.
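For reference, here is a minimal Python sketch of roughly what /usr/bin/time does to get that number: it starts the program as a child and reads ru_maxrss from the resource usage that wait4() returns. The function name peak_rss_kib and the demo command are made up for illustration, and this assumes Linux, where ru_maxrss is reported in kilobytes. As far as I know the figure can include the launcher's own footprint at fork time, so treat it as a rough upper estimate for very small programs.

```python
import os
import sys


def peak_rss_kib(argv):
    """Run argv as a child process; return (exit_code, peak RSS in KiB).

    peak_rss_kib is a hypothetical helper name, not part of any library.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: wait4() returns the child's resource usage along with its status.
    _, status, usage = os.wait4(pid, 0)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        exit_code = -os.WTERMSIG(status)
    return exit_code, usage.ru_maxrss  # kilobytes on Linux


if __name__ == "__main__":
    # Demo command only; substitute the path of your own binary.
    code, peak = peak_rss_kib([sys.executable, "-c", "pass"])
    print(f"exit={code} peak_rss={peak} KiB")
```

Because the value is a lifetime maximum, it answers "how much RAM did this run ever need resident at once", which is usually the number wanted for sizing.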
I need to request some AWS resources and in order to do so, I need to identify what are my requirements for:
Number of CPU cores (maybe GPUs as well) (performs parallel processing)
Amount of Memory required
I/O & network read/write time (optional / good to know)
How can I profile my script so that I know if I am using the requested resources to the fullest?
How can I profile the whole system? Something along the following lines:
i. Request large number of compute resources (CPUs and RAM) on AWS
ii. Start the system profiler
iii. Run my program and wait for it to finish
iv. Stop the system profiler and identify the peak #CPUs and RAM used
Context: Unix / Linux
You could use time(1), but there are two variants of it.
the time builtin in most shells is usually not enough for your needs, so...
you need the /usr/bin/time program from the time package
Then you'll run /usr/bin/time -v yourscript
and inside your script you could use the times builtin (see this)
You also should consider using perf(1) and oprofile(1)
Finally, you might eventually code something yourself, or use a tool, that queries the kernel through /proc/ (see proc(5)). Utilities like xosview do that.
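If you do end up querying /proc yourself, a minimal sketch could look like the following. It reads the Vm* lines that proc(5) documents in /proc/<pid>/status: VmSize is the current virtual size, VmRSS the current resident set size, and VmHWM the peak resident set size ("high water mark"). The function name read_status_kib is hypothetical, and the code assumes a Linux /proc.

```python
def read_status_kib(pid="self"):
    """Return the Vm* fields of /proc/<pid>/status as a dict of KiB values.

    read_status_kib is an illustrative name; pid may be a number or "self".
    """
    fields = {}
    with open(f"/proc/{pid}/status") as status_file:
        for line in status_file:
            key, _, rest = line.partition(":")
            if key.startswith("Vm"):
                # Lines look like "VmRSS:      1234 kB"; keep the number.
                fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    info = read_status_kib()
    print("VmSize:", info["VmSize"], "KiB")
    print("VmRSS: ", info["VmRSS"], "KiB")
    print("VmHWM: ", info["VmHWM"], "KiB (peak RSS so far)")
```

To watch another program, pass its PID and poll periodically while it runs; the file disappears once the process has exited, so the last sample before exit is the one to keep.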
I am new to casperjs and planning to use it to accurately simulate anywhere from a few dozen to low hundreds of concurrent sessions accessing a private server on a private network. Unlike typical HTTP load generators (Apache bench, httperf, ...), my purpose is to be able to control each session programmatically (increase delay between requests, have 'smarts' built into each script) and have each session have distinct source IP addresses.
My current thinking is to use OpenVZ containers (openvz.org) to create each 'virtual' client running casperjs (minimal functionality I need is following elements on the UI and taking screenshots). Would love to hear of anyone who has done something similar.
The crux of my question is: what would the 'slimmest' environment for running casperjs be? I'd like to strip down the OS as much as possible to be able to scale multiple clients. Specifically:
any recommended low-footprint UNIX/Linux distributions for CasperJS?
any specific recommendations on stripping down mainstream (CentOS, Debian, ...) distributions?
Thank you all in advance. I look forward to hearing your input on this specific question or similar experiences/tools for what I'm trying to achieve...
Fernando
CasperJS is headless, i.e. it doesn't need X running to function. Any bare-bones Linux distribution will do you well.
any recommended low-footprint UNIX/Linux distributions for CasperJS?
Arch is very lightweight and has an easy-to-follow Beginners Guide. Arch's AUR has a package for CasperJS that's pretty straightforward to set up as well. Just make sure to grab the required base-devel package (pacman -S base-devel) before installing from the AUR, as it's needed for the Arch Build System.
any specific recommendations on stripping down mainstream (CentOS, Debian, ...) distributions?
Not so much stripping down, but CrunchBang is based on the latest Debian release. It may be worth taking a look at. It would be much less of a hassle to set up than Arch, and it uses the same APT package manager as Debian / Ubuntu. It installs with the lightweight OpenBox window manager, but you can remove this and X altogether if you'd like.
With that said, even a lightweight Linux environment won't help much with the amount of memory each CasperJS instance will use. You could probably pull off a few dozen depending on the amount of memory available, but a few hundred may not be feasible. It all depends on how much memory each website uses. Casperjs comes with some configuration options that may help reduce memory (e.g. don't load images, plugins, etc), but that may defeat the purpose of your tests.
The best advice I can give is to try it out for yourself. Write a simple script that opens the pages you are going to use and passes a callback to CasperJS's run() function to keep it alive (i.e. so that Casper doesn't exit). It can be as simple as:
var casper = require('casper').create();
casper.start('http://example.com/site1', function () {});
casper.thenOpen('http://example.com/site2', function () {});
casper.run(function () {
    // Wait 60 seconds before exiting; remove the timeout to never exit.
    setTimeout(function () { casper.exit(); }, 60000);
});
Spin up multiple instances and watch your total memory usage. You can use a CLI tool such as top, or this alias, which totals the memory usage of the current user's processes:
alias memu="ps -u $(whoami) -o pid,rss,command | awk '{print \$0}{sum+=\$2} END {print \"Total\", sum/1024, \"MB\"}'"
From this you should be able to see roughly how much memory each instance takes, and how many you can run at once on one machine.
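If you would rather collect the same total from a script, here is a rough Python equivalent of the alias. It walks /proc, keeps the processes whose effective UID matches the current user, and sums their VmRSS values. The function name user_rss_kib is made up for this sketch, and it assumes a Linux /proc. Keep in mind that RSS counts shared pages once in every process that maps them, so the total overstates real usage when many instances share the same libraries.

```python
import os


def user_rss_kib(uid=None):
    """Return {pid: VmRSS in KiB} for processes owned by uid (default: current user).

    user_rss_kib is a hypothetical helper name used for illustration.
    """
    uid = os.geteuid() if uid is None else uid
    per_pid = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/status") as status_file:
                fields = dict(
                    line.split(":", 1) for line in status_file if ":" in line
                )
        except OSError:
            continue  # process exited or is not readable
        uid_field = fields.get("Uid", "").split()
        # "Uid:" lists real, effective, saved and filesystem UIDs.
        if len(uid_field) < 2 or int(uid_field[1]) != uid:
            continue
        if "VmRSS" in fields:  # kernel threads have no VmRSS line
            per_pid[int(entry)] = int(fields["VmRSS"].split()[0])
    return per_pid


if __name__ == "__main__":
    usage = user_rss_kib()
    for pid, rss in sorted(usage.items()):
        print(pid, rss)
    print("Total", sum(usage.values()) / 1024, "MB")
```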
I am developing a MPI program on a Linux machine where I do not have sudo/su access. As my program currently segfaults, I would like to examine the core dumps via gdb. Unfortunately, as the program is multi-threaded, all the threads write to one core dump. So I would like to be able to append the PID to each separate core dump for every process.
I know there is a way to do it via /proc/sys/kernel/core_pattern, however I do not have access to write to this.
Thanks for any help.
It can be a pain to debug MPI apps on systems that are configured this way when you do not have root access. One option for working around this is to use Valgrind to get stack traces for your segfault(s). This will only be useful provided that your application will fail in a reasonable period of time when slowed down via Valgrind, and that it still segfaults at all in this case.
I usually run MPI apps under Valgrind like this:
% mpiexec -n 5 valgrind -q /path/to/my_app
That will send all of the Valgrind output to standard error. If you want the output separated into different files, you can get a bit fancier:
% mpiexec -n 5 valgrind -q --log-file='vg_out.%q{PMI_RANK}' /path/to/my_app
That's the setup for MPICH2. I think that for Open MPI you'll need to replace PMI_RANK with OMPI_MCA_ns_nds_vpid (recent Open MPI releases export the rank as OMPI_COMM_WORLD_RANK instead), but if that doesn't work for you then you'll need to check with the Open MPI developers on their discussion list. In either case, this will yield N files, where N is the size of MPI_COMM_WORLD, named vg_out.0, vg_out.1, ..., vg_out.$(($N-1)), each corresponding to a rank in MPI_COMM_WORLD.
I am looking for general purpose programming languages that
have an interactive (live coding) prompt
work in 32 KB of RAM by itself or 8 KB when the compiler is hosted on a separate machine
run on a microcontroller with as little as 8-32 KB RAM total (without an MMU).
Below is my list so far, what am I missing?
Python: The PyMite VM needs 64K flash, 8K RAM. Targets LPC, SAM7 and ATmegas with 8K or more. Hosted.
Lua: The eLua FAQ recommends 256K flash, 64K RAM.
FORTH: amforth needs 8K flash, 150 bytes RAM, 30 bytes EEPROM on an ATmega.
Scheme: armpit Scheme. The smallest target is the LPC2103 with 32K flash, 4K SRAM.
C: Interactive C runs on 68HC11 with no flash and 32K SRAM. Hosted.
C: picoc, an open source, cross-compiling, interactive C system. When compiled for AVR, it takes 63K flash, 8K RAM. The RAM could be reduced with effort to keep tables in flash.
C++: AngelScript, an open source, byte-code based, C/C++-like scripting language with easy native calls.
Tcl: TinyTCL runs on DOS, 60K binary. Looks easy to port.
BASIC: TinyBasic: Initializes with a 64K heap, might be adjustable.
Lisp
PostScript: (I haven't found a FOSS implementation for low memory yet)
Shell: bitlash: An interactive command shell for Arduino (ATmega). See also AVRSH.
A homebrew Forth runtime can be implemented in very little memory indeed. I know someone who made one on a Cosmac in the 1970s. The core runtime was just 30 bytes.
I hear that CHIP-8, XPL0, PicoC, and Objective Caml have been ported to graphing calculators.
The Wikipedia "Lego Mindstorms" article lists a bunch of programming languages that allegedly run on the Lego RCX or Lego NXT platform.
Do any of them meet your "live coding" criteria?
You might want to check out the other microcontroller Forths at the Forth wiki. It lists at least 4 Forths for the Atmel AVR: amforth (which you already mention), PFAVR, avrforth, and ByteForth.
(Links to those interpreters, as well as this StackOverflow question, are included in the "Embedded Systems" wikibook).
I would recommend Lua (or eLua, http://www.eluaproject.net/ ). I "ported" Lua to a Cortex-M3 a while back. Off the top of my head, it had a flash size of 60~100 KB and needed about 20 KB of RAM to run. I did strip it down to the bare essentials, but depending on your application, that might be enough. There's still room for optimization, especially regarding RAM requirements, but I doubt you can run it comfortably in 8 KB.
Some AVR interpreters/VMs:
http://www.cqham.ru/tbcgroup/index_eng.htm
http://www.jcwolfram.de/projekte/avr/chipbasic2/main.php
http://www.jcwolfram.de/projekte/avr/chipbasic8/main.php
http://www.jcwolfram.de/projekte/avr/main.php
http://code.google.com/p/python-on-a-chip/
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=688&item_type=project
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=626&item_type=project
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=460&item_type=project
http://www.harbaum.org/till/nanovm/index.shtml
Wren fits your criteria -- by default it's configured to use just 4k of RAM. AFAIK it hasn't seen any actual use, since the guy I wrote it for decided he didn't need an interpreter running wholly on the target system after all.
The language is influenced most obviously by ML and Forth.
Have you considered a port in C of Tiny Basic? Or, perhaps rewriting the UCSD Pascal p-machine to your architecture from Z-80?
Seriously, though, JavaScript would make a good embedded scripting language, but I've no clue what the minimum memory requirements are for the VM + GC, nor how difficult to remove OS dependencies. I played with NJS a while back, which could possibly fit your needs. This one is interesting in that the compiler is written in JavaScript (self hosting).
You can take a look at very powerful AvrCo Multitasking Pascal for AVR. You can try it at http://www.e-lab.de. MEGA8/88 version is free. There are tons of drivers and simulator with JTAG debugger and nice live or simulated visualizations of all standard devices (LCDCHAR, LCDGRAPH, 7SEG, 14SEG, LEDDOT, KEYBOARD, RC5, SERVO, STEPPER...).
You're missing EmbedVM, homepage here, svn repo here. Remember to check out both [1,2] videos on the front page ;)
From the homepage:
EmbedVM is a small embeddable virtual machine for microcontrollers
with a C-like language frontend. It has been tested with GCC and AVR
microcontrollers. But as the Virtual machine is rather simple it
should be easy to port it to other architectures.
The VM simulates a 16bit CPU that can access up to 64kB of memory. It
can only operate on 16bit values and arrays of 16bit and 8bit values.
There is no support for complex data structures (struct, objects,
etc.). A function can have a maximum of 32 local variables and 32
arguments.
Besides the memory for the VM, a small structure holding the VM state
and the reasonable amount of memory the EmbedVM functions need on the
stack there are no additional memory requirements for the VM.
Especially the VM does not depend on any dynamic memory management.
EmbedVM is optimized for size and simplicity, not execution speed. The
VM itself takes up about 3kB of program memory on an AVR
microcontroller. On an AVR ATmega168 running at 16MHz the VM can
execute about 75 VM instructions per millisecond.
All memory accesses done by the VM are performed using user callback
functions. So it is possible to have some or all of the VM memory on
external memory devices, flash memory, etc. or "memory-map" hardware
functions to the VM.
The compiler is a UNIX/Linux commandline tool that reads in a *.evm
file and generates bytecode in various formats (binary file, intel hex,
C array initializers and a special debug output format). It also
generates a symbol file that can be used to access data in the VM
memory from the host application.
The C-like language looks like this: http://svn.clifford.at/embedvm/trunk/examples/numberquizz/vmcode.evm
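To put the quoted throughput figure in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the two numbers from the quote (16 MHz clock, about 75 VM instructions per millisecond); the variable names are mine. Since most AVR instructions execute in one or two clock cycles, the result is also a rough count of native instructions spent per VM instruction.

```python
CLOCK_HZ = 16_000_000   # ATmega168 clock from the quote above
VM_INSTR_PER_MS = 75    # EmbedVM throughput from the quote above

# Clock cycles available in one millisecond.
cycles_per_ms = CLOCK_HZ // 1000

# Average clock cycles spent on each VM instruction.
cycles_per_vm_instr = cycles_per_ms / VM_INSTR_PER_MS

print(f"{cycles_per_ms} cycles/ms, about {round(cycles_per_vm_instr)} cycles per VM instruction")
```

That works out to roughly 213 clock cycles per VM instruction, which fits the statement that the VM is optimized for size and simplicity rather than speed.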
I would recommend MY-BASIC. It runs in as little as 8 KB of RAM and is easy to port.
There's also JavaScript, via Espruino.
This is built specifically for Microcontrollers and there are builds for various different chips (mainly STM32s) that fit a full system into as little as 8kB RAM.
Have you considered simply using the /bin/sh supplied by busybox? Or one of the smaller scripting languages they recommend?
Prolog - http://www.gprolog.org/
According to a google search "prolog small" the size of the executable can be made quite small by avoiding linking the built-in predicates.
None of the languages in the list in the question or in the answers proved satisfactory for the requirement of super easy compilation and integration into an existing micro controller project (disclosure: I didn't actually try every single one of the suggestions).
I found instead tinyscript which is a single .c+.h file that compiled with the rest of the source files on my project with the only additional configuration required being to provide a void outchar(int c) which can be empty if you don't require output from the scripts.
For me speed of execution is far less important than ease of build and integration and interop with C, as my use case is mainly just calling some C functions in order.
At my previous job I used busybox on a BlackFin.
We compiled Perl and PHP for it, and after changing s/fork/vfork/g it worked pretty well... more or less. Not having an MMU is not a good idea: memory fragmentation will kill the server pretty easily. All I did was:
for i in `seq 1 100`; do wget http://black-fin-ip/test.php; done
It died while I was walking to my boss and telling him that the server is going to die in production :)
I would suggest using Python. The only problem then is the memory overhead, right? So here is an idea for people who get stuck on this problem later on.
First things first: write a bf interpreter (or just get the source code from somewhere). The interpreter will be really small, and bf is a Turing-complete language. Then write your code in Python and transpile it to bf using bfpy ( https://github.com/felko/bfpy/blob/master/README.md ). This is the solution with the least overhead I can think of, and I am pretty sure a bf interpreter will easily stay under 10 KB of RAM usage.
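To show how little logic such an interpreter needs, here is a minimal bf interpreter sketched in Python. It is only an illustration of the technique, not the code you would flash onto a microcontroller, and it says nothing about the RAM figure above. The function name run_bf and the 30000-cell default tape (the conventional size, which you would shrink on a small device) are my own choices.

```python
def run_bf(program, input_bytes=b"", tape_len=30000):
    """Interpret a bf program and return its output as bytes.

    run_bf is an illustrative name; tape_len is the number of 8-bit cells.
    """
    # Pre-compute the matching position for every bracket.
    jump, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start], jump[pos] = pos, start

    tape = bytearray(tape_len)
    out = bytearray()
    inp = iter(input_bytes)
    ptr = pc = 0
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) & 0xFF  # cells wrap at 8 bits
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) & 0xFF
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # 0 on end of input
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # go back to the loop start
        pc += 1
    return bytes(out)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the ASCII code for "A".
    print(run_bf("++++++++[>++++++++<-]>+."))
```

The whole machine state is one tape, one data pointer and one program counter, which is why the interpreter itself stays tiny; the cost shows up in how long and slow the generated bf programs are.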
Erlang - http://erlang.org/
it can fit in 2MB
http://www.experts123.com/q/is-erlang-small-enough-for-embedded-systems.html