SystemC: channels vs port value update - multithreading

While working on a SystemC project, I discovered that I probably have some confused ideas about signals and ports. Let's say I have something like this:
//cell.hpp
#include <systemc.h>

SC_MODULE(Cell)
{
    sc_in<sc_uint<16> > datain;
    sc_in<sc_uint<1> > addr_en;
    sc_in<sc_uint<1> > enable;
    sc_out<sc_uint<16> > dataout;
    SC_CTOR(Cell)
    {
        SC_THREAD(memory_cell);
        sensitive << enable << datain << addr_en;
    }
private:
    void memory_cell();
};

//cell.cpp
#include "cell.hpp"

void Cell::memory_cell()
{
    unsigned short data_cell=11;
    while(true)
    {
        //wait for some input
        wait();
        if (enable->read()==1 && addr_en->read()==1)
        {
            data_cell=datain->read();
        }
        else
        {
            if(enable->read()==0 && addr_en->read()==1)
            {
                dataout->write(data_cell);
            }
        }
    }
}

//test.cpp
#include "cell.hpp"

SC_MODULE(TestBench)
{
    sc_signal<sc_uint<1> > address_en_s;
    sc_signal<sc_uint<16> > datain_s;
    sc_signal<sc_uint<1> > enable_s;
    sc_signal<sc_uint<16> > dataout_s;
    Cell cella;
    SC_CTOR(TestBench) : cella("cella")
    {
        // Binding
        cella.addr_en(address_en_s);
        cella.datain(datain_s);
        cella.enable(enable_s);
        cella.dataout(dataout_s);
        SC_THREAD(stimulus_thread);
    }
private:
    void stimulus_thread() {
        //write a value:
        datain_s.write(81);
        address_en_s.write(1);
        enable_s.write(1);
        wait(SC_ZERO_TIME);
        //read what we have written:
        enable_s.write(0);
        address_en_s.write(1);
        wait(SC_ZERO_TIME);
        cout << "Output value: " << dataout_s.read() << endl;
        //let's cycle the memory again:
        address_en_s.write(0);
        wait(SC_ZERO_TIME);
        cout << "Output value: " << dataout_s.read() << endl;
    }
};
I've tried running these modules and I've noticed something weird (at least, weird to me): when the stimulus writes a value (81), after the wait(SC_ZERO_TIME) the memory thread finds its datain, enable and addr_en values already updated. This is what I expected to happen. The same happens when the stimulus changes the enable_s value in order to run another cycle in the memory thread and copy the data_cell value to the memory cell's dataout port. What I don't understand is why, after the memory module writes to its dataout port and goes back to the wait() statement at the beginning of the while loop, the stimulus module still sees the old value (0) on its dataout_s channel instead of the new value (81) that the memory module has just copied there. Then, if I run another cycle of the memory loop (for example by changing some values on the stimulus channels), the dataout channel finally updates.
In other words, it looks like when I write to the stimulus channels and then switch to the memory thread, the memory finds the values updated. But when the memory thread writes to its ports and I then switch to the stimulus thread, the stimulus still sees the old values on its channels (which are bound to the memory ports).

The example above does not work as I expected because of wrong delta-cycle synchronization.
Generally speaking, let's suppose we have two threads running in two modules, A and B, connected through a channel. If thread A writes something during delta cycle 1, it only becomes visible to thread B in delta cycle 2, because sc_signal applies the new value in the update phase at the end of the delta cycle. And if thread B writes something during delta cycle 2, thread A has to wait until delta cycle 3 to read it.
Knowing this, the stimulus thread needs two consecutive wait(SC_ZERO_TIME) statements before reading the memory output: the first one reaches the delta cycle in which the memory thread runs and writes dataout, and the second one reaches the next delta cycle, in which that write has actually been applied.
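The evaluate/update behaviour described above can be illustrated without the SystemC library. The sketch below is a hypothetical toy model (ToySignal and read_after_waits are made-up names, not SystemC API): write() only records a pending value, and update() stands in for the kernel's update phase at the end of each delta cycle. It replays the testbench scenario one delta cycle at a time.

```cpp
// Toy model of SystemC's evaluate/update semantics (not the SystemC API).
template <typename T>
class ToySignal {
public:
    explicit ToySignal(T init) : cur_(init), next_(init) {}
    T read() const { return cur_; }   // value from the last update phase
    void write(T v) { next_ = v; }    // only records a pending value
    void update() { cur_ = next_; }   // stands in for the kernel's update phase
private:
    T cur_;
    T next_;
};

// Replays the Cell/TestBench scenario and returns what the stimulus reads from
// dataout after `zero_waits` (1 or 2) calls to wait(SC_ZERO_TIME) following
// the "enable = 0" write. Within one delta cycle the order in which the two
// threads run does not matter, because reads never see pending writes.
unsigned read_after_waits(int zero_waits) {
    ToySignal<unsigned> datain(0), addr_en(0), enable(0), dataout(0);
    unsigned data_cell = 11;

    // Delta 0, evaluate: the stimulus writes the value to store.
    datain.write(81);
    addr_en.write(1);
    enable.write(1);
    // Delta 0, update: the new values become visible; memory_cell() wakes up.
    datain.update();
    addr_en.update();
    enable.update();

    // Delta 1, evaluate: memory_cell() latches datain, and the stimulus
    // (back from its first wait) requests a read.
    if (enable.read() == 1 && addr_en.read() == 1) data_cell = datain.read();
    enable.write(0);
    addr_en.write(1);
    // Delta 1, update.
    enable.update();
    addr_en.update();

    // Delta 2, evaluate: memory_cell() drives dataout, while the stimulus,
    // after one wait(SC_ZERO_TIME), reads dataout in the same delta cycle.
    if (enable.read() == 0 && addr_en.read() == 1) dataout.write(data_cell);
    unsigned seen = dataout.read();   // still the old value
    // Delta 2, update: the memory's write is applied only now.
    dataout.update();

    // Delta 3: reached only if the stimulus calls wait(SC_ZERO_TIME) again.
    if (zero_waits >= 2) seen = dataout.read();
    return seen;
}
```

In this model read_after_waits(1) returns 0 and read_after_waits(2) returns 81, which matches the behaviour observed in the question.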

Related

Memory not be freed on Mac when vector push_back string

With the code below, I found that in a Mac demo app the memory is not freed after I push_back a string into a vector. I thought the stack variable would be freed when it goes out of function scope; am I wrong? Thanks for any tips.
In model.h:
#pragma once
#include <cstdint>
namespace NS {
    const uint8_t kModel[8779041] = {4,0,188,250,....};
}
In ViewController.mm:
- (void)start {
    std::vector<std::string> params = {};
    std::string strModel(reinterpret_cast<const char *>(NS::kModel), sizeof(NS::kModel));
    params.push_back(strModel);
}
The answer to your question depends on your understanding of "free" memory. The behaviour you are observing can be reproduced with just a couple of lines of code:
#include <cstdint>
#include <iostream>

void myFunc() {
    const auto *ptr = new uint8_t[8779041]{};
    delete[] ptr;
}
Let's run this function and see how the memory consumption graph changes:
int main() {
    myFunc(); // 1 MB
    std::cout << "Check point" << std::endl; // 9.4 MB
    return 0;
}
If you put one breakpoint right at the line with the myFunc() invocation and another one at the line with the "Check point" console output, you will see the memory consumption of the process jump by about 8 MB (on my system and machine configuration, Xcode shows a sudden jump from 1 MB to 9.4 MB). But wait, isn't it supposed to go back to 1 MB after the function, since the allocated memory is freed at the end of the function? Well, not exactly. The system doesn't reclaim this memory right away, because that is not a cheap operation, and if your process requests the same amount of memory a moment later, it would be redundant work. Thus, the system usually doesn't bother shrinking the memory dedicated to a process until that memory is needed by another process or the system runs out of available resources (it can also be some kind of fixed timer, but overall I would say this is implementation-defined). Another common reason the memory appears not to be freed is that you often observe it in debug mode, where freed memory remains dedicated to the process to track tricky scenarios (like NSZombie objects, whose addresses need to remain accessible to the process in order to report use-after-free errors).
The most important point here is that, internally, the process distinguishes between "deleted" and "occupied" memory pages, so it can reuse memory that has already been deleted. As a result, no matter how many times you call the same function, the memory dedicated to the process remains the same:
int main() {
    myFunc(); // 1 MB
    std::cout << "Check point" << std::endl; // 9.4 MB
    for (int i = 0; i < 10000; ++i) {
        myFunc();
    }
    std::cout << "Another point" << std::endl; // 9.4 MB
    return 0;
}
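The difference between memory the program has released and memory the process still holds can be checked directly. Below is a minimal sketch, assuming it is acceptable to replace the global operator new/delete in a test build; g_live_bytes, start_like, Measurement and measure are hypothetical names used only for illustration. The counter tracks live heap bytes requested through operator new, and start_like mirrors the question's start(): the vector object itself lives on the stack, but the string's characters live on the heap.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Live heap bytes currently handed out by the global operator new
// (an illustrative counter, not a macOS or Xcode facility).
static std::atomic<long long> g_live_bytes{0};

// Each allocation is prefixed with a header that records its size so that
// operator delete can subtract it again. Using alignof(std::max_align_t) as
// the header size keeps the returned pointer suitably aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold the size");

void* operator new(std::size_t n) {
    void* raw = std::malloc(n + kHeader);
    if (raw == nullptr) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = n;
    g_live_bytes += static_cast<long long>(n);
    return static_cast<char*>(raw) + kHeader;
}

void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    void* raw = static_cast<char*>(p) - kHeader;
    g_live_bytes -= static_cast<long long>(*static_cast<std::size_t*>(raw));
    std::free(raw);
}

// Sized deallocation (used by many standard containers) forwards to the
// unsized version above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// Mirrors the question's start(). Returns the live heap bytes measured while
// params is still in scope.
long long start_like(std::size_t bytes) {
    const long long before = g_live_bytes.load();
    std::vector<std::string> params;
    params.push_back(std::string(bytes, 'x'));
    return g_live_bytes.load() - before;
}   // params and the string it owns are destroyed here

struct Measurement {
    long long during;  // heap bytes in use inside start_like()
    long long after;   // heap bytes still in use once it has returned
};

Measurement measure(std::size_t bytes) {
    const long long before = g_live_bytes.load();
    const long long during = start_like(bytes);
    return {during, g_live_bytes.load() - before};
}
```

Calling measure(8779041) should report `during` of at least 8779041 bytes and `after` equal to 0: the program does give the memory back when params goes out of scope, even though, as explained above, the process footprint shown by Xcode may not shrink.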

How to suspend all other threads inside a separate class function c++

I am working on a final project for a class. The project mimics multiple ATMs, and the program already runs. Inside my main.cpp I create the threads (for now just two, later maybe more). They call a member of the class Begin that uses rand() to decide whether each customer makes a deposit or a withdrawal, then uses rand() again to pick the amount, and does this 5 times.
#include "ATM.h"
int main()
{
    Begin test1;
    test1.manager();
    thread first(&Begin::atm, test1);
    thread second(&Begin::atm, test1);
    first.join();
    second.join();
    delete resbox::cashbox;
    system("pause");
    return 0;
}
I cannot figure out how to suspend the threads created in main.cpp from inside my observe() function:
void watcher::observe()
{
    float cash;
    if (resbox::cashbox->gettotal() >= resbox::cashbox->getmax())
    {
        //suspend all other threads
        cout << "Please empty cash box it is full! with $"<< resbox::cashbox->gettotal() << endl;
        cout << "How much would like to withdraw?" << endl;
        cin >> cash;
        resbox::cashbox->cashwd(cash);
        cout << "This is the amount in the reserve box now is $" << resbox::cashbox->gettotal() << endl;
        //resume all other threads
    }
    if (resbox::cashbox->gettotal() <= 500)
    {
        //suspend all other threads
        cout << "Please fill cashbox it is low, has $" << resbox::cashbox->gettotal() << endl;
        cout << "How much would like to add?" << endl;
        cin >> cash;
        resbox::cashbox->cashdp(cash);
        cout << "This is the amount in the reserve box now $" << resbox::cashbox->gettotal() << endl;
        //resume all other threads
    }
}
Whenever the condition of one of the if statements is met, I need to suspend all other threads except the current thread that met the condition. Then, once the input has been handled and before leaving the if statement and the observe() function, all other threads should resume.
I read about the possibility of using SuspendThread and ResumeThread (see the question "how to suspend thread"). Yet I am having a hard time passing the threads created in main.cpp to the observe() function so that I could call those functions. I figured out how to create threads from cplusplus.com, and I also noticed I could potentially use mutex locking, as discussed in "What is the best solution to pause and resume pthreads?"
I am using C++ under Microsoft Visual Studio 2015 Community.
This is my first time dealing with threads. For my use, which is better: passing the created threads to the observe() function, or is there another way to pause/suspend and then resume them, and how would I do so? Thank you for any advice/help provided.
Currently, if I run my program and one of the conditions is met by a thread, the other thread also meets the same condition, so I have to enter the amount to deposit/withdraw twice before the threads continue, until each thread has dealt with 5 customers, for a total of 10 customers.
I finally figured out what I needed and what to use, thanks to:
Class RWLock
I added this class to my project and created a global instance of it.
Then I added the reader and writer locks and unlocks where they work best in my code, like so:
void Begin::atm() //The main function that makes it easier for threads to call and run the program.
{
    ATM atm;
    int choice, amount;
    LARGE_INTEGER cicles;
    QueryPerformanceCounter(&cicles);
    srand(static_cast<unsigned>(cicles.QuadPart));
    for (int i = 0; i < imax; i++) //mimics a total of 5 customers
    {
        rw.ReadLock(); //Have to place the read lock here.
        choice = rand() % 2; //Randomizes the choice of depositing or withdrawing.
        amount = rand() % 5000 + 1; //Randomizes the amount of cash that the customers use.
        rw.ReadUnlock(); //Read unlock must happen here, otherwise it blocks the writers.
        rw.WriteLock(); //Must happen here!
        if (choice == 0)
        {
            atm.cashdp(amount);
            cout << "\tCustomer depositing $" << amount << endl;
        }
        else if (choice == 1)
        {
            atm.cashwd(amount);
            cout << "\tCustomer withdrawing $" << amount << endl;
        }
        else
            //error checker against the randomizer for the choice of depositing or withdrawing.
            cout << "error rand creating wrong number" << endl;
        rw.WriteUnlock(); //Must happen here!
        Sleep(5000); // Sleeps the program between customer usage to mimic actual use.
    }
}
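For comparison, the same idea can be sketched in standard C++11 with std::mutex instead of the RWLock class or the Win32 SuspendThread/ResumeThread functions (Microsoft's documentation describes SuspendThread as intended for debuggers, not for thread synchronization). The sketch uses hypothetical names (CashBox, withdraw, run_atms), not the types in ATM.h: the low-cash check and the refill, standing in for the operator prompt in observe(), run while the lock is held, so every other ATM thread blocks until the refill is finished, and the next thread sees the refilled total instead of meeting the same condition again.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared cash box (illustrative names, not the ones in ATM.h).
struct CashBox {
    std::mutex m;            // held for a whole transaction, refill included
    double total = 1000.0;
    int refills = 0;
};

// One customer withdrawal. The low-cash check and the refill happen while the
// mutex is held, so the other ATM threads block until the refill is finished.
void withdraw(CashBox& box, double amount) {
    std::lock_guard<std::mutex> lock(box.m);
    box.total -= amount;
    if (box.total <= 500.0) {   // stands in for the prompt in observe()
        box.total += 5000.0;
        ++box.refills;
    }
}

// Runs n_threads ATMs, each serving customers_each withdrawals of `amount`,
// and returns how many refills were needed.
int run_atms(int n_threads, int customers_each, double amount) {
    CashBox box;
    std::vector<std::thread> atms;
    for (int t = 0; t < n_threads; ++t)
        atms.emplace_back([&box, customers_each, amount] {
            for (int i = 0; i < customers_each; ++i) withdraw(box, amount);
        });
    for (auto& atm : atms) atm.join();
    return box.refills;
}
```

For example, run_atms(2, 5, 100.0) returns 1: ten withdrawals of $100 from $1000 trigger exactly one refill, however the two threads interleave, which addresses the "I have to enter the amount twice" symptom from the question.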

Parallel ray tracing in 16x16 chunks

My ray tracer is currently multithreaded: I divide the image into as many chunks as the system has hardware threads and render them in parallel. However, not all chunks take the same time to render, so for about half of the run time CPU usage is only 50%.
Code
std::shared_ptr<bitmap_image> image = std::make_shared<bitmap_image>(WIDTH, HEIGHT);
auto nThreads = std::thread::hardware_concurrency();
std::cout << "Resolution: " << WIDTH << "x" << HEIGHT << std::endl;
std::cout << "Supersampling: " << SUPERSAMPLING << std::endl;
std::cout << "Ray depth: " << DEPTH << std::endl;
std::cout << "Threads: " << nThreads << std::endl;
std::vector<RenderThread> renderThreads(nThreads);
std::vector<std::thread> tt;
auto size = WIDTH*HEIGHT;
auto chunk = size / nThreads;
auto rem = size % nThreads;
//launch threads
for (unsigned i = 0; i < nThreads - 1; i++)
{
    tt.emplace_back(std::thread(&RenderThread::LaunchThread, &renderThreads[i], i * chunk, (i + 1) * chunk, image));
}
tt.emplace_back(std::thread(&RenderThread::LaunchThread, &renderThreads[nThreads-1], (nThreads - 1)*chunk, nThreads*chunk + rem, image));
for (auto& t : tt)
    t.join();
I would like to divide the image into 16x16 chunks or something similar and render them in parallel, so that after each chunk is rendered, the thread switches to the next one, and so on. This should greatly improve CPU utilization and reduce the run time.
How do I set up my ray tracer render these 16x16 chunks in a multithreaded manner?
I assume the question is "How to distribute the blocks to the various threads?"
In your current solution, you're figuring out the regions ahead of time and assigning them to the threads. The trick is to turn this idea on its head. Make the threads ask for what to do next whenever they finish a chunk of work.
Here's an outline of what the threads will do:
void WorkerThread(Manager *manager) {
    while (auto task = manager->GetTask()) {
        task->Execute();
    }
}
So you create a Manager object that returns a chunk of work (in the form of a Task) each time a thread calls its GetTask method. Since that method will be called from multiple threads, you have to be sure it uses appropriate synchronization.
std::unique_ptr<Task> Manager::GetTask() {
    std::lock_guard guard(mutex);
    std::unique_ptr<Task> t;
    if (next_row < HEIGHT) {
        t = std::make_unique<Task>(next_row);
        ++next_row;
    }
    return t;
}
In this example, the manager creates a new task to ray trace the next row. (You could use 16x16 blocks instead of rows if you like.) When all the tasks have been issued, it just returns an empty pointer, which essentially tells the calling thread that there's nothing left to do, and the calling thread will then exit.
If you made all the Tasks in advance and had the manager dole them out as they are requested, this would be a typical "work queue" solution. (General work queues also allow new Tasks to be added on the fly, but you don't need that feature for this particular problem.)
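The Manager idea can be made concrete for 16x16 blocks. The sketch below is illustrative, not the answerer's code: Tile, TileQueue and render_tiles are hypothetical names, `shade` stands in for the real per-pixel ray tracing, and an atomic counter replaces the mutex in Manager::GetTask (a common alternative when handing out a task is just claiming the next index).

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// A 16x16 block of the image (smaller at the right and bottom edges).
struct Tile { int x0, y0, x1, y1; };

// Hands out tiles on demand; each fetch_add claims a distinct tile index.
class TileQueue {
public:
    TileQueue(int width, int height, int tile)
        : w_(width), h_(height), t_(tile),
          cols_((width + tile - 1) / tile), rows_((height + tile - 1) / tile) {}

    bool next(Tile& out) {
        int i = next_.fetch_add(1);
        if (i >= cols_ * rows_) return false;   // nothing left to do
        int cx = i % cols_, cy = i / cols_;
        out = {cx * t_, cy * t_, std::min(w_, (cx + 1) * t_), std::min(h_, (cy + 1) * t_)};
        return true;
    }

private:
    int w_, h_, t_, cols_, rows_;
    std::atomic<int> next_{0};
};

// Renders width x height pixels with n_threads workers (pass at least 1).
// Each worker keeps asking for the next tile until the queue is empty.
template <typename Shade>
void render_tiles(std::vector<int>& pixels, int width, int height,
                  unsigned n_threads, Shade shade) {
    TileQueue queue(width, height, 16);
    auto worker = [&] {
        Tile t;
        while (queue.next(t))
            for (int y = t.y0; y < t.y1; ++y)
                for (int x = t.x0; x < t.x1; ++x)
                    pixels[y * width + x] = shade(x, y);   // tiles never overlap
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n_threads; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

With n_threads taken from std::thread::hardware_concurrency() (use at least 1, since it may return 0), a thread that finishes a cheap tile simply claims the next one, so no thread sits idle while others still have expensive regions left.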
I do this a bit differently:
1. Obtain the number of CPUs and/or cores
You did not specify the OS, so you need to use your OS API for this; search for "system affinity mask". (Portably, std::thread::hardware_concurrency(), which your code already uses, returns a hint of the number of hardware threads.)
2. Divide the screen between threads
I divide the screen by lines instead of 16x16 blocks, so I do not need a queue or anything similar. Simply create one thread per CPU/core; each thread traces only the rays of its own horizontal lines. Each thread gets an ID counting from zero, so with n CPUs/cores the lines belonging to each thread are:
y = ID + i*n
where i = 0, 1, 2, 3, ...; stop once y is greater than or equal to the screen height. This type of access has its advantages: for example, accessing the screen buffer via ScanLines will not conflict between threads, as each thread accesses only its own lines.
I also set the affinity mask of each thread so it uses only its own CPU/core. It gave me a small boost because there is less thread switching (but that was on older OS versions; it is hard to say what it does now).
3. Synchronize threads
Basically, you should wait until all threads are finished, and then render the result on screen. Your threads can either stop, in which case you create new ones for the next frame, or wait in Sleep loops until rendering is requested again.
I use the latter approach so I do not need to create and configure the threads over and over again, but beware that Sleep(1) can sleep a lot longer than just 1 ms.
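The line interleaving y = ID + i*n described above can be sketched in standard C++ as follows. This is a hypothetical helper, not the answerer's code: shade_row stands in for tracing all rays of one horizontal line, and setting the affinity mask is left out because it needs an OS-specific API. Joining the threads is the "wait until all threads are finished" step; this version creates new threads per frame rather than parking them in a Sleep loop.

```cpp
#include <thread>
#include <vector>

// Interleaved scanline split: thread `id` of `n` (n >= 1) handles rows
// id, id + n, id + 2n, ... until the screen height is reached.
template <typename ShadeRow>
void render_interleaved(int height, unsigned n, ShadeRow shade_row) {
    std::vector<std::thread> pool;
    for (unsigned id = 0; id < n; ++id)
        pool.emplace_back([=] {
            for (int y = static_cast<int>(id); y < height; y += static_cast<int>(n))
                shade_row(y);   // y = ID + i*n for i = 0, 1, 2, ...
        });
    for (auto& t : pool) t.join();   // wait for every thread before displaying
}
```

Because every row belongs to exactly one thread, writes into a per-row buffer need no locking.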

std::map insert thread safe in c++11?

I have very simple code in which multiple threads insert data into a std::map, and as I understand it this should lead to a crash because it is a data race:
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>
#include <thread>
#include <utility>

std::map<long long,long long> k1map;
void Ktask()
{
    for(int i=0;i<1000;i++)
    {
        long long random_variable = (std::rand())%1000;
        std::cout << "Thread ID -> " << std::this_thread::get_id() << " with looping index " << i << std::endl;
        k1map.insert(std::make_pair(random_variable, random_variable));
    }
}
int main()
{
    std::srand((int)std::time(0)); // use current time as seed for random generator
    for (int i = 0; i < 1000; ++i)
    {
        std::thread t(Ktask);
        std::cout << "Thread created " << t.get_id() << std::endl;
        t.detach();
    }
    return 0;
}
However, I ran it multiple times and the application never crashed, whereas the same code written with pthreads and C++03 does crash. So I am wondering: is there some change in C++11 that makes map insert thread-safe?
No, std::map::insert is not thread-safe.
There are many reasons why your example may not crash. Your threads may be running in a serial fashion due to the system scheduler, or because they finish very quickly (1000 iterations isn't that much). Your map will fill up quickly (it can only hold 1000 distinct keys), so later insertions won't actually modify the structure, reducing the possibility of crashes. Or perhaps the implementation you're using IS thread-safe.
For most standard library types, the only thread safety guarantee you get is that it is safe to use separate object instances in separate threads. That's it.
And std::map is not one of the exceptions to that rule. An implementation might offer you more of a guarantee, or you could just be getting lucky.
And when it comes to fixing threading bugs, there's only one kind of luck.
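A minimal sketch of the usual fix, assuming a single std::mutex guards every access to the map (insert_range and fill_map are illustrative names, not from the question). It also joins the threads instead of detaching them: in the question, main returns right after detaching, so many threads may not even finish before the process exits, which is one more way the race can go unnoticed.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Shared map plus the mutex that guards every access to it.
std::map<long long, long long> k1map;
std::mutex k1map_mutex;

// Each thread inserts its own range of keys; the lock makes each insert
// exclusive, removing the data race in the question's Ktask().
void insert_range(long long first, long long count) {
    for (long long k = first; k < first + count; ++k) {
        std::lock_guard<std::mutex> lock(k1map_mutex);
        k1map.insert(std::make_pair(k, k));
    }
}

std::size_t fill_map(int n_threads, long long per_thread) {
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back(insert_range, t * per_thread, per_thread);
    for (auto& th : threads) th.join();   // join, not detach
    std::lock_guard<std::mutex> lock(k1map_mutex);
    return k1map.size();
}
```

Starting from an empty map, fill_map(8, 1000) returns 8000, since every key range is distinct and every insert is serialized.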

How to parallelize "while" loop by the using of PPL

I need to parallelize a "while" loop using PPL. I have the following code in Visual C++ in MS VS 2013.
int WordCount::CountWordsInTextFiles(basic_string<char> p_FolderPath, vector<basic_string<char>>& p_TextFilesNames)
{
    // Word counter in all files (direct-initialized: std::atomic is not copyable).
    atomic<unsigned> wordsInFilesTotally(0);
    // Critical section that serializes console output.
    critical_section cs;
    // Set specified folder as current folder.
    ::SetCurrentDirectory(p_FolderPath.c_str());
    // Concurrent iteration through p_TextFilesNames vector.
    parallel_for(size_t(0), p_TextFilesNames.size(), [&](size_t i)
    {
        // Create a stream to read from file.
        ifstream fileStream(p_TextFilesNames[i]);
        // Check if the file is opened.
        if (fileStream.is_open())
        {
            // Word counter in a particular file.
            unsigned wordsInFile = 0;
            // Read from file; testing the extraction itself avoids counting
            // a failed read at end of file.
            string word;
            while (fileStream >> word)
            {
                // Count total number of words in all files.
                wordsInFilesTotally++;
                // Count total number of words in a particular file.
                wordsInFile++;
            }
            // Verify the values (one thread prints at a time).
            critical_section::scoped_lock lock(cs);
            cout << endl << "In file " << p_TextFilesNames[i] << " there are " << wordsInFile << " words" << endl;
        }
    });
    // cs is destroyed automatically when the function returns; calling
    // ~critical_section() explicitly would destroy it twice.
    // Return total number of words in all files in the folder.
    return wordsInFilesTotally;
}
This code iterates in parallel through a std::vector in the outer loop. The parallelism is provided by the concurrency::parallel_for() algorithm. But this code also has a nested "while" loop that reads from the file, and I need to parallelize this nested loop as well. How can this nested "while" loop be parallelized using PPL? Please help.
As user High Performance Mark hints in his comment, parallel reads from the same ifstream instance will cause undefined and incorrect behavior. (For some more discussion, see question "Is std::ifstream thread-safe & lock-free?".) You're basically at the parallelization limit here with this particular algorithm.
As a side note, even reading multiple different file streams in parallel will not really speed things up if they are all being read from the same physical volume. The disk hardware can only actually support so many parallel requests (typically not more than one at a time, queuing up any requests that come in while it is busy). For some more background, you might want to check out Mark Friedman's Top Six FAQs on Windows 2000 Disk Performance; the performance counters are Windows-specific, but most of the information is of general use.
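Separately from parallelization, the shape of the read loop affects the count. A small standard C++ check (count_words_good and count_words are illustrative names): a loop driven by good() runs once more after the last word when the input ends with whitespace, because the final, failed extraction is still counted, whereas a loop that tests the extraction itself counts only words that were actually read.

```cpp
#include <sstream>
#include <string>

// Loop shaped like `while (fileStream.good())`: the final, failed extraction
// at end of input is still counted.
unsigned count_words_good(const std::string& text) {
    std::istringstream in(text);
    unsigned n = 0;
    while (in.good()) {
        std::string word;
        in >> word;
        ++n;
    }
    return n;
}

// Testing the extraction itself counts only words that were actually read.
unsigned count_words(const std::string& text) {
    std::istringstream in(text);
    unsigned n = 0;
    std::string word;
    while (in >> word) ++n;
    return n;
}
```

For the text "a b c\n", count_words_good returns 4 and count_words returns 3; without the trailing newline both return 3.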
