ESP8266 Fails to add char to a very long String (>8000 chars)

After correctly getting the payload from an HTTPS request, appending the client's chars to a String stops after about 8000 characters, then resumes and stops again a few times.
Here's a snippet of my code:
long streamSize = 0;
Serial.println("Now reading payload...");
while (stream.connected()) {
    while (stream.available() > 0) {
        char ch = (char)stream.read();
        Serial.println((String)"Reading [" + ++streamSize + "] " + ch);
        ret += ch;
        Serial.println(ret.length());
    }
}
Which works fine, until:
Reading [8685] t
8685
Reading [8686] r
8686
Reading [8687] u
8687
Reading [8688] m
8687
Reading [8689]
8687
Reading [8690] e
8687
[Resumes correctly appending chars]
Reading [9226] i
8748
Reading [9227] p
8749
Reading [9228] t
8750
Reading [9229] i
8751
Reading [9230] o
8751
Reading [9231] n
8751
And so on for quite a few times.
Heap size doesn't seem to be the issue, as system_get_free_heap_size() reports 14128 free bytes after appending everything.
I'm using a Wemos D1 R1, and this is the file I'm trying to fully read, for testing, using the Github API

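I've found out that Arduino can fail to concatenate strings when free memory is low. Moreover, the Arduino String class does not seem to have an error handler, so it can fail silently when, for example, memory is too fragmented. Free heap is a total, not the size of the largest contiguous block, so about 14 KB free can still be too fragmented to grow a String of almost 9 KB.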
See here: from Arduino forum
and here: from a discussion in Stack Overflow
In many cases the suggestion is to pre-allocate the buffer with a String reserve(int) call.
You may not know in advance how big your string will grow, but you can work around that. For example, you could request your HTTPS target twice: the first time just to learn how big the response will be (so that you can allocate the exact amount of memory), and the second time to actually read it.
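As a rough illustration of "reserve first, then check every append", here is a minimal sketch in plain C++ so it can run off-device. LimitedString and copyChecked are hypothetical names invented for this example, not the Arduino String API; the limit only imitates an allocation that cannot grow any further. On the Arduino side, reserve() and concat() return a success flag that can be checked in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a heap-limited string: growth beyond `limit`
// fails, the way a realloc() can fail on a fragmented heap.
class LimitedString {
public:
    explicit LimitedString(std::size_t limit) : limit_(limit) {}

    // Pre-allocate; returns false if the request cannot be satisfied.
    bool reserve(std::size_t n) {
        if (n > limit_) return false;
        data_.reserve(n);
        return true;
    }

    // Append one char; returns false and leaves the string unchanged
    // instead of failing silently.
    bool append(char c) {
        if (data_.size() >= limit_) return false;
        data_.push_back(c);
        return true;
    }

    std::size_t length() const { return data_.size(); }

private:
    std::size_t limit_;
    std::string data_;
};

// Copies `payload` into `out` and counts every char that could not be
// appended, so dropped data is reported rather than lost unnoticed.
std::size_t copyChecked(const std::string& payload, LimitedString& out) {
    std::size_t dropped = 0;
    for (char c : payload) {
        if (!out.append(c)) ++dropped;
    }
    return dropped;
}
```

With a check like this, the log in the question would show exactly which characters were dropped instead of a length that quietly stops growing.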

Related

Why does only the first character need to be copied to print a string with interrupt-driven I/O?

Almost all materials I found online reference the code below from Tanenbaum's OS book. However, I don't really understand why this would print the whole string instead of only the first character.
Is it because the interrupts will be generated recursively? But wouldn't that cost a lot of resources? Or did I miss something?
I'm really confused. Any help would be appreciated.
Code executed when print system call is made:
copy_from_user (buffer, p, count);
enable_interrupts ();
while (*printer_status_reg !=READY);
*printer_data_register = p[0];
scheduler ();
Interrupt handler:
if (count == 0) {
unblock_user ();
} else {
*printer_data_register = p[i];
count = count - 1;
i++;
}
acknowledge_interrupt ();
return_from_interrupt ();
You write the first character to the data register, which starts the transmission.
When that transmission completes, a Tx_Complete interrupt is generated.
The interrupt handler then checks whether there are any more bytes to transfer (the else branch). If there are, it writes the next byte to the data register, decrements the number of bytes left to transmit, and increments the buffer index.
This repeats for every byte. The interrupts are not recursive: each one fires only after the previous byte has been sent, and the handler returns in between. When the number of bytes left reaches zero, the handler does not start another transfer, so the interrupts stop.
By transferring the first byte you start the process, and the remaining bytes are transferred by the interrupt handler. You have to make sure that count is correct.
You can guess what happens if count is too small or too large!
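To make the sequence concrete, here is a small software model of it in C++. PrinterModel and runUntilIdle are hypothetical names for this sketch; there is no real hardware or scheduler involved, and a "completion interrupt" is just a function call made once per byte sent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical software model of a one-byte printer; not a real driver.
struct PrinterModel {
    std::string printed;       // everything the "device" has output so far
    const char* p = nullptr;   // buffer being printed
    std::size_t count = 0;     // bytes still to hand to the device
    std::size_t i = 0;         // index of the next byte to send
    bool userBlocked = false;  // true while the caller waits for completion

    void writeDataRegister(char c) { printed.push_back(c); }

    // System-call side: send only the first byte, then block the caller.
    void print(const char* buf, std::size_t n) {
        if (n == 0) return;
        p = buf;
        i = 1;
        count = n - 1;
        userBlocked = true;
        writeDataRegister(p[0]);
    }

    // Interrupt side: runs once per "byte done" interrupt.
    // Returns true if another byte was sent, so another interrupt will follow.
    bool onInterrupt() {
        if (count == 0) {
            userBlocked = false;  // nothing left: wake the caller
            return false;
        }
        writeDataRegister(p[i]);
        --count;
        ++i;
        return true;
    }
};

// Drives the model: every byte sent triggers exactly one later interrupt.
// Returns how many interrupts were handled.
std::size_t runUntilIdle(PrinterModel& m) {
    if (!m.userBlocked) return 0;
    std::size_t interrupts = 0;
    do {
        ++interrupts;
    } while (m.onInterrupt());
    return interrupts;
}
```

Printing a 5-byte string in this model takes 5 sequential interrupts: four that each send the next byte and a final one that finds count at zero and unblocks the caller.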

Alsalib mmap direct write

I am just messing around with ALSA library and can't really figure out how to do playback with a direct write.
I am using SND_PCM_ACCESS_MMAP_INTERLEAVED.
I am trying to write a square wave.
I created a buffer of shorts to hold the square wave. I have tested it with snd_pcm_writei and it works.
I then call snd_pcm_mmap_begin and use the pointers given in areas to write to the device:
while(1)
{
int msg;
frames_available = snd_pcm_avail_update(handle);
snd_pcm_mmap_begin(handle,&areas,&offset,&limit_frames);
frames_to_write = frames; //frames is the size of the buffer in frames
if (frames_to_write > limit_frames)
frames_to_write = 0;
int offset_frames = (areas[0].first + offset*areas[0].step)/16;
short* write_ptr = (short*)areas[0].addr + offset_frames;
// fill the buffer with stuff
for(int i =0; i < frames_to_write;i++)
{
write_ptr[i] = buffer[i];
}
msg = snd_pcm_mmap_commit(handle,offset,frames_to_write);
}
The sound produced is choppy and gets cut off soon after. It gets cut off because limit_frames reaches 0. I notice that limit_frames stays at 0 even when frames_available is non-zero.
EDIT:
I used memcpy() instead of a for loop and that solved the choppiness. It still gets cut off, though. Now I'm curious why memcpy() solves the choppiness. Shouldn't the for loop and memcpy() both copy over the memory contiguously?
Using mmap does not make sense if all you're doing is copying the samples from another buffer; that is exactly what snd_pcm_writei() does.
Anyway, before calling snd_pcm_mmap_begin(), you must set its last parameter to the number of frames you intend to write, and when it returns a smaller number, you should write that many frames instead of 0.
When you have more than one channel, a frame is larger than one sample, so a loop that copies one short per frame moves only part of the data.
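The two points (write as many frames as were granted, and count in frames rather than samples) can be sketched without ALSA itself. AreaModel below is a hypothetical stand-in that mirrors the addr, first and step fields of snd_pcm_channel_area_t, with first and step measured in bits; framePtr and copyFrames are names made up for this example, and 16-bit interleaved samples are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for snd_pcm_channel_area_t (first/step are in bits).
struct AreaModel {
    void* addr;
    unsigned int first;
    unsigned int step;
};

// Pointer to the first 16-bit sample of frame `offset` in an interleaved area.
std::int16_t* framePtr(const AreaModel& area, std::size_t offset) {
    unsigned char* base = static_cast<unsigned char*>(area.addr);
    return reinterpret_cast<std::int16_t*>(
        base + (area.first + offset * area.step) / 8);
}

// Copies min(srcFrames, granted) frames, where `granted` is the frame count
// handed back by the mmap-begin step. Returns the number of frames copied,
// which is the value to pass on to the commit step.
std::size_t copyFrames(const AreaModel& area, std::size_t offset,
                       std::size_t granted, const std::int16_t* src,
                       std::size_t srcFrames, unsigned channels) {
    // Write what was granted instead of falling back to 0.
    std::size_t n = srcFrames < granted ? srcFrames : granted;
    // One frame holds `channels` samples.
    std::memcpy(framePtr(area, offset), src,
                n * channels * sizeof(std::int16_t));
    return n;
}
```

The frames that did not fit in this round would be written on the next pass through the loop, starting from where this copy stopped.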

Why is parsing Google protocol buffer messages slower with multiple threads?

I am trying to parse many Google protocol buffer messages from a binary file generated by calling SerializeToString. I first load all the bytes into heap memory allocated with new. I also have two arrays: one stores the starting offset of each message in that memory, and the other stores each message's byte count.
Then I parse each message by calling ParseFromString. I want to speed this up by using multiple threads.
Each thread gets the start index and end index of its slice of the offset array and the byte count array.
In the parent, the main code is:
struct ParsePara
{
char* str_buffer;
size_t* buffer_offset;
size_t* binary_string_length_array;
size_t start_idx;
size_t end_idx;
Flight_Ticket_Info* ticket_info_buffer_array;
};
//Flight_Ticket_Info is class of message
//offset_size is the count of message
ticket_array = new Flight_Ticket_Info[offset_size];
const int max_thread_count = 6;
pthread_t pthread_id_vec[max_thread_count];
CTimer thread_cost;
thread_cost.start();
vector<ParsePara*> para_vec;
const size_t each_count = ceil(float(offset_size) / max_thread_count);
for (size_t k = 0;k < max_thread_count;k++)
{
size_t start_idx = each_count * k;
size_t end_idx = each_count * (k+1);
if (start_idx >= offset_size)
break;
if (end_idx >= offset_size)
end_idx = offset_size;
ParsePara* cand_para_ptr = new ParsePara();
if (!cand_para_ptr)
{
_ERROR_EXIT(0,"[Malloc memory fail.]");
}
cand_para_ptr->str_buffer = m_valdata;//heap memory for storing Bytes of message
cand_para_ptr->buffer_offset = offset_array;//begin address of each message
cand_para_ptr->start_idx = start_idx;
cand_para_ptr->end_idx = end_idx;
cand_para_ptr->ticket_info_buffer_array = ticket_array;//array to store message
cand_para_ptr->binary_string_length_array = binary_length_array;//Bytes count of each message
para_vec.push_back(cand_para_ptr);
}
for(size_t k = 0 ;k < para_vec.size();k++)
{
int ret = pthread_create(&pthread_id_vec[k],NULL,parserFlightTicketForMultiThread,para_vec[k]);
if (0 != ret)
{
_ERROR_EXIT(0,"[Error] [create thread fail]");
}
}
for (size_t k = 0;k < para_vec.size();k++)
{
pthread_join(pthread_id_vec[k],NULL);
}
In each thread the thread function is:
void* parserFlightTicketForMultiThread(void* void_para_ptr)
{
ParsePara* para_ptr = (ParsePara*) void_para_ptr;
parserFlightTicketForMany(para_ptr->str_buffer,para_ptr->ticket_info_buffer_array,para_ptr->buffer_offset,
para_ptr->start_idx,para_ptr->end_idx,para_ptr->binary_string_length_array);
return NULL; // a function returning void* must return a value
}
void parserFlightTicketForMany(const char* str_buffer,Flight_Ticket_Info* ticket_info_buffer_array,
size_t* buffer_offset,const size_t start_idx,const size_t end_idx,size_t* binary_string_length_array)
{
printf("start_idx:%zu,end_idx:%zu\n",start_idx,end_idx); // %zu is the size_t format
for (size_t k = start_idx;k < end_idx;k++)
{
if (k % 100000 == 0)
cout << k << endl;
size_t cand_offset = buffer_offset[k];
size_t binary_length = binary_string_length_array[k];
ticket_info_buffer_array[k].ParseFromString(string(&str_buffer[cand_offset],binary_length-1));
}
printf("done %zu %zu\n",start_idx,end_idx);
}
But the multi-threaded run takes longer than the single-threaded one.
The single-threaded cost is: 40455623ms
My computer has 8 cores, and the cost with six threads is: 131586865ms
Can anyone help me? Thank you!
Some possible problems -- you'll have to experiment to determine which:
Protobuf parsing speed is often limited by memory bandwidth rather than CPU time, especially with a large input data set. In that case, more threads won't help, since all the cores are sharing bandwidth to main memory. Indeed, having multiple cores fighting over memory bandwidth could make the overall operation slower. Note that the biggest consumer of memory is not the input bytes but rather the parsed data objects -- that is, the output of parsing -- which are many times larger than the encoded data. To improve this, consider writing the parsing loop so that it fully processes each message immediately after parsing, before moving on to the next message. That way, instead of allocating k protobuf objects, you only need to allocate one protobuf object per thread, and repeatedly reuse the same object for parsing. This way the object will (probably) stay in the core's private L1 cache and avoid consuming memory bandwidth; only the input bytes will be read over the main bus.
How are you loading data into RAM? Did you read() into a large array or did you mmap()? In the latter case the data is read from disk lazily -- it won't happen until you actually attempt to parse it. Even in the read() case, it could be that the data has been swapped out, creating similar effects. Either way, your threads are now not just fighting for memory bandwidth, but disk bandwidth, which is of course much slower. Having six threads reading separate parts of a big file will definitely be slower overall than having one thread read the whole file, because the operating system optimizes for sequential access.
Protobuf allocates memory during parsing. Many memory allocators take a lock while allocating new memory. Since all your threads are allocating tons and tons of objects in a tight loop, they will contend for this lock. Make sure you are using a thread-friendly memory allocator, such as Google's tcmalloc. Note that repeatedly reusing the same protobuf object in a parse-consume loop rather than allocating lots of different objects will also help immensely here, because the protobuf object will automatically reuse memory for sub-objects.
There may be a bug in your code and it might not be doing what you expect at all when multithreaded. For example, a bug might be causing all the threads to process the same data, rather than different data, and it could be that the data they're choosing happens to be bigger. Make sure you are testing that the results of your code are exactly the same when you run single-threaded vs. multi-threaded.
In short, if you want multiple cores to make your code faster, you have to think about not just what each core is doing, but what data is going in and out of each core, and how much the cores have to talk to each other. Ideally you want each core to operate all on its own without talking to anyone or anything; then you get maximum parallelism. That's not usually possible, of course, but the closer you can get to that, the better.
BTW, a random optimization for you:
ParseFromString(string(&str_buffer[cand_offset],binary_length-1))
Replace that with:
ParseFromArray(&str_buffer[cand_offset],binary_length-1)
Creating a std::string makes a copy of the data, which wastes time (and memory bandwidth). (This doesn't explain why threading is slow, though.)
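Here is a rough sketch of the parse-consume loop suggested above, written without protobuf so that it is self-contained. TicketModel is a hypothetical stand-in for a generated message class and is not the protobuf API; its parse() only copies bytes, and "consuming" a record here just means summing them. The point is the shape of the loop: one reusable object per worker, each record fully processed before the next is parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a generated message class; NOT the protobuf API.
// parse() overwrites the fields in place and reuses the vector's capacity.
struct TicketModel {
    std::vector<std::uint8_t> fields;

    bool parse(const std::uint8_t* data, std::size_t len) {
        fields.assign(data, data + len);  // no reallocation once capacity suffices
        return true;
    }
};

// Parse-consume loop over records [start, end): a single object is reused
// for every record instead of keeping one parsed object per record.
std::uint64_t consumeRange(const std::uint8_t* buf,
                           const std::vector<std::size_t>& offset,
                           const std::vector<std::size_t>& length,
                           std::size_t start, std::size_t end) {
    TicketModel msg;  // one object per worker
    std::uint64_t total = 0;
    for (std::size_t k = start; k < end; ++k) {
        if (!msg.parse(buf + offset[k], length[k])) continue;
        for (std::uint8_t f : msg.fields) total += f;  // "consume" the record
    }
    return total;
}
```

Running consumeRange once over the whole index range and once per slice, then comparing the combined results, is also a cheap way to check that the single-threaded and multi-threaded paths process exactly the same data.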

Controlling TI OMAP l138 frequency leads to "Division by zero in kernel"

My team is trying to control the frequency of a Texas Instruments OMAP l138. The default frequency is 300 MHz and we want to set it to 372 MHz in a "complete" way: we would like not only to change the default value to the desired one (or at least configure it at startup), but also to be able to change the value at run time.
Searching on the web about how to do this, we found an article which tells that one of the ways to do this is by an "echo" command:
echo 372000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
We did some tests with this command and it runs fine with one problem: sometimes the first call to this echo command leads to an error message of "Division by zero in kernel":
In my own tests, this error always appeared on the first call to the echo command. All later calls worked without error. If I then reset the processor and call the command again, the same thing happens: the first call leads to this error and later calls work without problems.
So my questions are: what is causing this problem? And how could I solve it? (Obviously the answer "always type it twice" doesn't count!)
(Feel free to mention other ways of controlling the OMAP l138's frequency at real time as well!)
Looks to me like you have a division by zero in the davinci_spi_cpufreq_transition() function. Somewhere in this function (or in a function it calls) there is a division whose divisor has the value 0 in your case. This is obviously an error case that should be handled properly in the code, but it isn't.
It's hard to tell exactly which code leads to this, because I don't know which kernel you are using. It would be much easier if you could provide a link to your kernel repository. Although I couldn't find davinci_spi_cpufreq_transition in the upstream kernel, I found it here.
davinci_spi_cpufreq_transition() function appears to be in drivers/spi/davinci_spi.c. It calls davinci_spi_calc_clk_div() function. There are 2 division operations there. First is:
prescale = ((clk_rate / hz) - 1);
And second is:
if (hz < (clk_rate / (prescale + 1)))
One of them is probably causing the "division by zero" error. I suggest tracing which one it is by modifying the davinci_spi_calc_clk_div() function as follows (just add the lines marked with "+"):
static void davinci_spi_calc_clk_div(struct davinci_spi *davinci_spi)
{
struct davinci_spi_platform_data *pdata;
unsigned long clk_rate;
u32 hz, cs_num, prescale;
pdata = davinci_spi->pdata;
cs_num = davinci_spi->cs_num;
hz = davinci_spi->speed;
clk_rate = clk_get_rate(davinci_spi->clk);
+ printk(KERN_ERR "### hz = %u\n", hz);
prescale = ((clk_rate / hz) - 1);
if (prescale > 0xff)
prescale = 0xff;
+ printk(KERN_ERR "### prescale + 1 = %u\n", prescale + 1);
if (hz < (clk_rate / (prescale + 1)))
prescale++;
if (prescale < 2) {
pr_info("davinci SPI controller min. prescale value is 2\n");
prescale = 2;
}
clear_fmt_bits(davinci_spi->base, 0x0000ff00, cs_num);
set_fmt_bits(davinci_spi->base, prescale << 8, cs_num);
}
My guess is that the "hz" variable is 0 in your case. If so, you may also want to add the following debug line to the davinci_spi_setup_transfer() function:
if (!hz)
hz = spi->max_speed_hz;
+ printk(KERN_ERR "### setup_transfer: setting speed to %u\n", hz);
davinci_spi->speed = hz;
davinci_spi->cs_num = spi->chip_select;
With all those modifications made, rebuild your kernel and you will probably get a clue as to why you have that "div by zero" error. Just look for lines starting with "###" in your kernel boot log. If you don't know what to do next, attach those debug lines and I will try to help you.
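If hz does turn out to be 0, the arithmetic needs a guard. Below is a userspace model of the prescale calculation quoted above with such a guard added; calcPrescale is a hypothetical helper written for illustration, not the kernel function, and how the driver should react to hz == 0 (skip the update, fall back to a default) is left open. Note that after the clamp to 0xff, prescale + 1 is always between 1 and 256, so in this model only the first division can ever divide by zero.

```cpp
#include <cassert>
#include <cstdint>

// Userspace model of the prescale arithmetic with a zero guard added.
// Hypothetical helper for illustration; not the kernel code itself.
// Returns false (leaving `out` untouched) when hz is 0.
bool calcPrescale(unsigned long clkRate, std::uint32_t hz, std::uint32_t& out) {
    if (hz == 0) return false;  // the unguarded code would compute clk_rate / 0

    // Same arithmetic as the quoted function; wraps around if clkRate < hz.
    std::uint32_t prescale = static_cast<std::uint32_t>(clkRate / hz) - 1;
    if (prescale > 0xff) prescale = 0xff;

    // prescale + 1 is now in the range 1..256, so this cannot divide by zero.
    if (hz < clkRate / (prescale + 1)) prescale++;
    if (prescale < 2) prescale = 2;

    out = prescale;
    return true;
}
```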

CString::Format() causes debug assertion

CString::Format causes a debug assertion in Visual Studio 2008 at vsprintf.c line 244 with "buffer too small".
//inside the function.
somefile.Open (//open for mode read) //somefile is CFile.
char* buff = new char[somefile.GetLength()];
somefile.Read ((void*)buff, somefile.GetLength());
CString cbuff;
cbuff.Format ("%s",buff); //this line causes the debug assertion.
//and so on
Any idea why CString::Format() causes the "buffer too small" error? The debug assertion does not always appear.
An alternate solution is:
somefile.Open (//open for mode read) //somefile is CFile.
UINT buflen = (UINT)somefile.GetLength();
CString cbuff;
UINT nread = somefile.Read ((void*)cbuff.GetBuffer(buflen), buflen);
cbuff.ReleaseBuffer(nread); // terminate after the bytes actually read
It reads directly into a string buffer instead of the intermediate variable. The CString::GetBuffer() function automatically adds the extra byte to the string which you forgot to do when you allocated the "new char[]".
A C string ends with '\0', so a buffer of exactly the file length has no room for the terminator and is one byte too small.
The problem is that CFile::Read() does not guarantee that it reads as much data as you ask for. Sometimes it's reading less and leaving your buffer without a null terminator. You have to assume that you might only get one byte on each read call. This will also crash sometimes, when an un-readable memory block immediately follows your buffer.
You need to keep reading the file until you get to the end. Also, the null terminator is generally not written to the file at all, so you shouldn't assume that it will be read in but rather ensure that your buffer is always null-terminated no matter what is read.
In addition, you shouldn't use the file size as the buffer size; there's no reason to think you can read it all in at once, and the file size might be huge, or zero.
You should also avoid manual memory management, and instead of new[]/delete[], use a vector, which will ensure that you don't forget to free the buffer or use delete instead of delete[], and that the memory is released even in case of an exception. (I wouldn't recommend using CString or CFile either, for that matter, but that's another topic...)
// read from the current file position to the end of
// the file, appending whatever is read to the string
void ReadFile(CFile& somefile, CString& result)
{
    std::vector<char> buffer(1024 + 1);
    for (;;)
    {
        // Read() returns the number of bytes actually read (0 at end of file)
        UINT read = somefile.Read(&buffer[0], (UINT)(buffer.size() - 1));
        if (read > 0)
        {
            // force a null right after whatever was read
            buffer[read] = '\0';
            // add whatever was read to the result
            result += &buffer[0];
        }
        else
        {
            break;
        }
    }
}
Note that there's no error handling in this example.
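For comparison, here is the same read-until-the-end loop in standard C++, swapping CFile and CString for std::istream and std::string so that the sketch runs without MFC (that swap is my assumption, not something from the question). readAll is a name made up for this example. Because it appends by the byte count actually read, it needs no null terminator and copes with short reads.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads from the current position to the end of the stream, appending to
// `result`. Returns the number of bytes appended.
std::size_t readAll(std::istream& in, std::string& result) {
    char chunk[1024];
    std::size_t total = 0;
    for (;;) {
        in.read(chunk, sizeof chunk);
        std::streamsize got = in.gcount();  // may be less than requested
        if (got <= 0) break;                // nothing more to read
        result.append(chunk, static_cast<std::size_t>(got));
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

As in the MFC version, there is no error handling beyond stopping when nothing more can be read.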
