performance metric of a software library in WinCE - windows-ce

Is there a way to calculate how many mips consumed by a module (libraray) running on wince? We are developing an audio library and want to give playback in wince (Renesas SH-4A core). How to calculate the number of mips (millions of instructions per second) consumed by this library? If not, what is the equivalent metric for measuring the performance of a library on wince. - Thank you, VT.

Closest you would come is using a high resolution timer:
http://msdn.microsoft.com/en-us/library/aa964692(VS.80).aspx
Also check out this msdn article:
http://msdn.microsoft.com/en-us/library/ms404355.aspx

Related

Using libMPSSE with FT4232H in Linux

I don't have much experience with the FT4232H.
It is quad a port device.
I want to configure some of the ports as SPI, other as UART and GPIO
I have started experimenting with the SPI .
In the official documentation it says that if ftd2xx and libMPSSE are used
one have to remove the standard kernel modules ftdi_sio and usbserial so I did it.
My question:
In Linux can I have all 4 port configured arbitrary as SPI/UART/I2C/GPIO
as each port uses possibly different driver in the PC?
As far as I know the driver is loaded based on the PID/VID.
It is unique for the FT4232H which has 4 ports, so how this can be done?
Any thought about this is welcome.
I also want to share my current (one day) experience with the libMPSSE.
Unfortunately kind of negative.
I have downloaded the source code for the latest libMPSSE-rev0.6 from the official FTDI web site.
I was surprised to see that no provisioning for compilation under Linux.
There is no Makefile for the libMPSSE. I have made a simple one which is not a big deal but then I found out it does not compile out of the box. I got "undefined type byte" at one location.
After building the library I have done a simple test application which behaved strange.
I was sending 8 bytes on the SPI but with the oscilloscope I was observing 7 8bit clock packets then 100us pause and finally the last 8 bit packet. The MOSI also was incorrect in the last packet.(I was sending all 0 but got two bits high at the end)
Luckily I found https://www.mathworks.com/matlabcentral/answers/518039-ftdi-libmpsse-0-6-spi_readwrite-weird-behaviour-loadlibrary-calllib .
Fixing that made my transfer looks OK.
It looks to me FTDI have not done even a basic check of libMPSSE
I can not understand how FTDI so popular chips can have that low quality software library.
Probably ftd2xx is OK and issues are only in the higher layer libMPSSE?
Anyone using libMPSSE? Should I expect more issues?
Any thoughts about FTDI solution stability is welcome.
Thanks
Dimitar
First of all:
In my understanding, large part of your "question" is not really a question but a "discussion starting point" which is not the scope of the "stack-..." ecosystem.
To the driver question part:
To the best of my knowledge, the driver is loaded once per device and not per port. However, the ftd2xx is able to address each port individually and set them to different modes (basically it simple gives the specific orders to the MPSSE engine which does not really care about the real meaning of the given sequences). Only thing you lose is the convinent VCP accessability.
The different libMPSSE modes have their limitations and quirks. A website search will yield more than enough as a starting point for more narrowed down follow-up questions. If you are only interested in one mode, an IC dedicated to only this mode might be an alternative to the relatively "generic" FTx232x serie.

Can a high frequency hardware input be accurately read under a windows or windows CE environment?

This question straddles the blurry line between software and hardware so I hope the community feels it belongs on stackoverflow otherwise I would appreciate some advice as to where to move it.
My team is currently in the process of developing a robotic controller and although most of our I/O requirements are pretty tame we have one specific input that occasionally triggers at a 2 microsecond interval (Either a digital rise or fall). We need to be able to accurately timestamp that input and record it. We do not need to act on it in any sort of immediate way. We feel as if we should be able to take advantage of a high speed USB or PCI-E bus in order to facilitate this kind of high speed input but we lack expertise in this area and do not know what a solution to this problem would look like or if it even can be achieved without expensive/very invovled proprietary solutions. In addition there is a desire in our group to move away from Windows CE to a Windows Embedded (probably XP) x86 based platform however if this is something that could only be achieved on a CE system that would be important to know.
In summary:
Is there a way to read and timestamp accurately a 2 microsecond (500 KHz) input under a windows or windows CE environment?
Bonus points for implementation ideas/visions/methods!
Thank you!
I wouldnt try to do it directly in windows? What GPIO do you have anyway on your windows machine? For around or under $20 there are a long list of solutions that are either serial or usb based (of the shelf microcontroller boards) which you could program to have their only job to be to watch that I/O and time stamp it and then casually send this data over usb or serial (or other) to the host.
I am thinking along the lines of the msp430 launchpad, or stellaris lauchpad, or stm32f0 discovery or stm32f4 discovery, the family of arduinos, mbed, leaf labs maple mini, and on and on.
To do it properly under windows you would need to add some hardware anyway...

Linux CPU Usage Tools

Background
I've written a tool to capture CPU usage on a per/thread basis. The output of the tools is a binary file, that I can pump into my parsing utility that I wrote. And the output of the parsing utility is a CSV file that I can import into Excel to chart pretty graphs of process/thread CPU usage.
This CPU usage capture tool is running on an embedded ARM platform running a Linux kernel based on 2.6.35.3. That being said, I was concerned about making the tool light weight. I didn't want it to store directly to a CSV file, in order to minimize the processing time and the file size of the captured data.
Question
The tool works, but I'm wondering if I took the long way around the problem? Is there already a tool out there that does this (or something like it)?
You're probably wondering why I care if I already made a tool that works. Well, it's not as light weight as I'd like. It's taking up about 10% of CPU usage. As a benchmark, top only takes up about 1% (max).
Update
I've decided to continue using my tool for now. At least until a better solution becomes available. I was able to shave off a couple percentage points by using open() instead of fopen() on /proc/stat. I'm also using read() instead of fgets().
IBM has a tool called nmon which does the same(for AIX & Linux): According to IBM's documentation, it takes ~2% CPU. You may want to look at that.
Comparing nmon with your tool could give you a fair idea about your program's performance and how you may improve your csv capture.
This might be a bit of a steep learning curve, but you might want look into SystemTap: http://sourceware.org/systemtap/

How can I count instructions executed on Red Hat Enterprise Linux (x86-64)?

I want to find out how many x86-64 instructions are executed during a given run of a program running on Red Hat Enterprise Linux. I know I can get this information from valgrind but the slowdown is considerable. I also know that we are using Intel Core 2 Quad CPUs (model Q6700) which have hardware performance counters built in. But I don't know of any way to get access to the total number of instructions executed from within a C program.
Performance Application Programming Interface (PAPI) appears to be along the lines of what you are looking for.
From the website:
PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.
Vince Weaver, a Post Doctoral Research Associate with the Innovative Computing Laboratory at the University of Tennessee, did some PAPI-related work. The research listed on his web page at UTK looks like it may provide some additional information.
libpapi is the library you are looking for.
AMD and Intel chips provide the insn counts.
The program below access to cycles counter register from C (sorry non portable code, but works fine with gcc). This one is for counting cycles, that is not the same thing as instructions. Modern processors can both use several cycles on the same instruction, or execute several instructions at once. Cycles is usually more interresting that number of instructions, but it depends of your actual purpose.
Other performances counter can certainly be accessed the same ways (actually I don't even know if there is others), but I will have to look for the actual instruction code to use.
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long int x;
__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
return x;
}
There are a couple of ways you could go about it, depending on exactly what you need. If you just want to find out the total number of potential arguments you could just run objdump on the binary, which will give you the assembly. If you want more detailed information about the actual instructions being hit on a given run-through of the program, you may want to look into DynamoRIO which provides that functionality. It is similar to valgrind, but I believe it has a smaller affect on performance. I was able to throw together a basic instruction counter with it back in September relatively quickly and easily.
If that's no good, you could try checking out PAPI, which is an API that should let you get at the performance counters on your processors. I've never used it, so I can't speak for it, but a friend of mine used it in a project about 6 months ago and said he found it very helpful.

Fast Audio Input/Output

Here's what I want to do:
I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.
I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.
My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..
Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!
I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it, just use the DirectX APIs. Here's an example of doing kind of what you want to do using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx
You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.
Theoretically there should be no delay, but there is.
Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.
A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.
If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.
I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.
If low latency is your goal, you can't beat C.
libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.
It's possible with JavaSound to get end-to-end latency in the ballpark of 100-150ms.
The primary cause of latency is the buffer sizes of the capture and playback lines. The bufferSize is set when opening the lines:
capture: TargetDataLine#open(AudioFormat format, int bufferSize)
playback: SourceDataLine#open(AudioFormat format, int bufferSize)
If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance for your applications needs and your computing power.
The default buffer size can be checked with DataLine#getBufferSize when calling #open(AudioFormat format). The default size will vary based on the AudioFormat and seems to be geared for high latency, stutter free playback applications (e.g. internet streaming). If you're developing a low latency application, the default buffer size is much too large and should be changed.
In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.
The second and often overlooked cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to console can introduce 10's of ms of latency. Turn it off.

Resources