Big size array in Vivado_HLS?

Is it possible to have a big array like arr[200000] as an output of the top function in Vivado HLS?

Yes, but:
What type are the elements of the array? int? char? a single bit?
What kind of interface do you want to use? If you want to pass all the elements at the same time, the operation may be impossible because you may not have enough space on the FPGA. If you use a streaming or serial interface, you can do this.
Normally you don't have this kind of limitation, but you should evaluate, case by case, what the best solution is for the hardware you have.

Large arrays are generally implemented in BRAM, while LUTs are used for small arrays, so you have to consider whether you have enough resources available.
If your application allows using a FIFO, AXI-Stream, or AXI full with bursts, then you may consider using one of them to transfer the data without holding the whole array in the PL; a buffer that holds a small chunk of your array is usually enough.
So it depends on both your board and your algorithm.
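For illustration, here is a minimal sketch of that chunked, streaming approach in Vivado HLS C++. The function name, the CHUNK size, and the placeholder processing step are assumptions, not part of the question; the point is only that a small on-chip buffer stands in for the full 200000-element array.

#include <hls_stream.h>

#define TOTAL 200000
#define CHUNK 500   // small on-chip buffer, divides TOTAL evenly, maps comfortably to BRAM

void top(hls::stream<int> &in, hls::stream<int> &out) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    int buf[CHUNK];
    for (int base = 0; base < TOTAL; base += CHUNK) {
        // read one chunk into the on-chip buffer
        for (int i = 0; i < CHUNK; i++) {
#pragma HLS PIPELINE II=1
            buf[i] = in.read();
        }
        // ... process buf[] here (placeholder) ...
        // write the chunk back out
        for (int i = 0; i < CHUNK; i++) {
#pragma HLS PIPELINE II=1
            out.write(buf[i]);
        }
    }
}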

Turning unique filepath into unique integer

I often use filepaths to provide some sort of unique id for some software system. Is there any way to take a filepath and turn it into a unique integer in a relatively quick (computationally) way?
I am ok with larger integers. This would have to be a pretty nifty algorithm as far as I can tell, but would be very useful in some cases.
Anybody know if such a thing exists?
You could try the inode number:
fs.statSync(filename).ino
#djones's suggestion of the inode number is good if the program is only running on one machine and you don't care about a new file duplicating the id of an old, deleted one. Inode numbers are re-used.
Another simple approach is hashing the path to a big integer space. E.g. using a 128 bit murmurhash (in Java I'd use the Guava Hashing class; there are several js ports), the chance of a collision among a billion paths is still 1/2^96. If you're really paranoid, keep a set of the hash values you've already used and rehash on collision.
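If it helps, here's a rough C++ sketch of that hash-and-rehash-on-collision idea. It uses 64-bit FNV-1a as a simple stand-in for the 128-bit MurmurHash mentioned above, and the seed-with-previous-value retry is just one way to "rehash on collision"; treat the names and the scheme as illustrative.

#include <cstdint>
#include <string>
#include <unordered_set>

// 64-bit FNV-1a; the optional seed lets us "rehash" on collision.
uint64_t fnv1a(const std::string &s, uint64_t seed = 14695981039346656037ULL) {
    uint64_t h = seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;   // FNV-1a prime
    }
    return h;
}

// Hash the path; if the value is already taken, rehash until it isn't.
uint64_t pathToId(const std::string &path, std::unordered_set<uint64_t> &used) {
    uint64_t h = fnv1a(path);
    while (!used.insert(h).second)   // collision with an id we've already handed out
        h = fnv1a(path, h);          // reseed and try again
    return h;
}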
This is just my comment turned to an answer.
If you run it in memory, you can use one of the standard hashmaps in your language of choice, and not just for file names but for any similar situation. Normally, hashmaps in different programming languages resolve collisions with buckets, so the hash value together with the corresponding bucket number will provide a unique id.
Btw, it is not hard to write your own hashmap, so that you have control over the underlying structure (e.g. to retrieve the number, etc.).

When designing a hash_table, how many aspects should be paid attention to?

I have some candidate aspects:
The hash function is important; the hash codes should be as unique as possible.
The backend data structure is important; the search, insert, and delete operations should all have time complexity O(1).
Memory management is important; the memory overhead of every hash_table entry should be as small as possible. When the hash_table is expanding, the memory should grow efficiently, and when the hash_table is shrinking, the memory should be reclaimed efficiently. And with these memory operations, aspect 2 should still be fulfilled.
If the hash_table will be used by multiple threads, it should be thread safe and also efficient.
My questions are:
Are there any more aspects worth paying attention to?
How should the hash_table be designed to fulfill these aspects?
Are there any resources I can refer to?
Many thanks!
After reading some material, I've updated my questions. :)
In a book explaining the source code of the SGI STL, I found some useful information:
The backend data structure is a bucket array of linked lists. To search, insert, or delete an element in the hash_table:
Use a hash function to calculate the corresponding position in the bucket array; the elements are stored in the linked list hanging off that position.
When the number of elements grows larger than the number of buckets, the bucket array needs to be resized: expand it to roughly twice the old size (the number of buckets should be prime), then copy the old buckets and elements into the new array.
I didn't find any logic for shrinking when the number of elements becomes much smaller than the number of buckets, but I think this should be considered for workloads with many inserts at first and then many deletes later.
Other data structures, such as open-addressed arrays with linear probing or quadratic probing, are not as good as linked lists.
A good hash function can avoid clustering, and double hashing can help resolve clusters.
The question about multi_threads is still open. :D
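To make the bucket-of-linked-lists description above concrete, here is a toy C++ sketch: a vector of singly linked buckets that roughly doubles when the element count exceeds the bucket count. It is a sketch under those assumptions only (no erase, no prime-sized tables, no thread safety), not the actual SGI STL code.

#include <cstddef>
#include <functional>
#include <vector>

template <typename K, typename V>
class ChainedMap {
    struct Node { K key; V val; Node *next; };
    std::vector<Node*> buckets_;
    std::size_t size_ = 0;

    std::size_t slot(const K &k, std::size_t nbuckets) const {
        return std::hash<K>{}(k) % nbuckets;
    }

    void rehash(std::size_t n) {            // move every node into a bigger table
        std::vector<Node*> fresh(n, nullptr);
        for (Node *head : buckets_) {
            while (head) {
                Node *next = head->next;
                std::size_t s = slot(head->key, n);
                head->next = fresh[s];
                fresh[s] = head;
                head = next;
            }
        }
        buckets_.swap(fresh);
    }

public:
    ChainedMap() : buckets_(7, nullptr) {}

    V *find(const K &k) {
        for (Node *n = buckets_[slot(k, buckets_.size())]; n; n = n->next)
            if (n->key == k) return &n->val;
        return nullptr;
    }

    void insert(const K &k, const V &v) {
        if (V *p = find(k)) { *p = v; return; }      // key exists: overwrite
        if (size_ + 1 > buckets_.size())             // load factor would exceed 1
            rehash(buckets_.size() * 2 + 1);         // roughly double (SGI picks primes)
        std::size_t s = slot(k, buckets_.size());
        buckets_[s] = new Node{k, v, buckets_[s]};   // push onto the front of the bucket's list
        ++size_;
    }
    // erase and destructor omitted to keep the sketch short
};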
There are two (slightly) orthogonal concerns.
While the hash function is obviously important, in general you separate the design of the backend from the design of the hash function:
the hash function depends on the data to be stored
the backend depends on the requirements of the storage
For hash functions, I would suggest reading about CityHash or MurmurHash (with an explanation on SO).
For the back-end, there are various concerns, as you noted. Some remarks:
Are we talking average or worst case complexity? Without perfect hashing, achieving O(1) is nigh impossible as far as I know, though the worst-case frequency and complexity can be considerably dampened.
Are we talking amortized complexity? Amortized complexity in general offers better throughput at the cost of "spikes". Linear rehashing, at the cost of a slightly lower throughput, will give you a smoother curve.
With regard to multi-threading, note that the Read/Write pattern may impact the solution, considering extreme cases, 1 producer and 99 readers is very different from 99 producers and 1 reader. In general writes are harder to parallelize, because they may require modifying the structure. At worst, they might require serialization.
Garbage Collection is pretty trivial in the amortized case, with linear-rehashing it's a bit more complicated, but probably the least challenging portion.
You never talked about the amount of data you're about to use. Writers can update different buckets without interfering with one another, so if you have a lot of data, you can try to spread them around to avoid contention.
References:
The hash table article on Wikipedia surveys lots of different implementations; it's always good to peek at the variety.
This Google Talk by Dr. Cliff Click (Azul Systems) shows a hash table designed for heavily multi-threaded systems, in Java.
I suggest you read http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
The link points to a blog by Cliff Click which has an entry on hash functions. Some of his conclusions are:
To go from hash to index, use binary AND instead of modulo a prime. This is many times faster. Your table size must be a power of two.
For hash collisions don't use a linked list, store the values in the table to improve cache performance.
By using a state machine you can get a very fast multi-thread implementation. In his blog entry he lists the states in the state machine, but due to license problems he does not provide source code.
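A tiny illustration of the first point, assuming the table size is kept at a power of two (indexFor is a made-up helper, not from the blog):

#include <cstddef>
#include <cstdint>

// With a power-of-two table size, a bitwise AND replaces the slower modulo.
std::size_t indexFor(std::uint64_t hash, std::size_t tableSize /* power of two */) {
    return static_cast<std::size_t>(hash) & (tableSize - 1);
}

For example, indexFor(h, 1024) gives the same bucket as h % 1024, without the division.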

Performance of Long IDs

I've been wondering about this for some time. In CouchDB we have some fairly long IDs, e.g.:
"000ab56cb24aef9b817ac98d55695c6a"
Now if we're searching for this item and going through the tree structure created by the view, it seems a simple integer as an id would be much faster. If we used 64-bit integers it would be a simple CMP followed by a JMP (assuming that the Erlang code was using JIT, but you get my point).
For strings, I assume we generate a hash of the ID or something, but at some point we have to do a character compare on all 32 characters... won't that affect performance?
The short answer is, yes, of course it will affect performance, because the key length will directly impact the time it takes to walk down the tree.
It also affects storage, as longer keys take more space, space takes time.
However, the nuance you are missing is that while Couch CAN (and does) allocate new IDs for you, it is not required to. It will be more than happy to accept your own IDs rather than generate its own. So, if the key length bothers you, you are free to use shorter keys.
However, given the "json" nature of Couch, it's pretty much a "text" based database. There isn't a lot of binary data stored in a normal Couch instance (attachments notwithstanding, but even those I think are stored as Base64; I may be wrong).
So, while, yes, a 64-bit key would be the most efficient, the simple fact is that Couch is designed to work for any key, and "any key" is most readily expressed in text.
Finally, truth be told, the cost of the key compare is dwarfed by the disk I/O fetch times, and the JSON marshaling of data (especially on writes). Any real gain achieved by converting to such a system would likely have no "real world" impact on overall performance.
If you want to really speed up the Couch key system, code the key routine to block the key into 64-bit longs and compare those (like you said). 8 bytes of text is the same as a 64-bit "long int". That would give you, in theory, an 8x performance boost on key compares. Whether Erlang can create such code, I can't say.
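As a rough illustration only (this is not CouchDB or Erlang code), a comparator that walks a 32-character key eight bytes at a time might look like the following. Note the caveat: loading bytes into native integers gives a consistent ordering for tree lookups and equality, but not the usual lexicographic order on little-endian machines unless you byte-swap.

#include <cstdint>
#include <cstring>

// Compare two 32-character keys as four 64-bit words instead of 32 chars.
int compareKeys32(const char *a, const char *b) {
    for (int i = 0; i < 32; i += 8) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, 8);   // memcpy avoids unaligned-access issues
        std::memcpy(&wb, b + i, 8);
        if (wa != wb) return wa < wb ? -1 : 1;   // consistent, not lexicographic
    }
    return 0;   // all four words equal
}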
From the CouchDB: The definitive guide book:
I need to draw a picture of this at some point, but the reason is if you think of the idealized btree, when you use UUID's you might be hitting any number of root nodes in that tree, so with the append-only nature you have to write each of those nodes and everything above it in the tree. But if you use monotonically increasing id's then you're invalidating the same path down the right hand side of the tree, thus minimizing the number of nodes that need to be rewritten. It would be just the same for monotonically decreasing as well. And it should technically work if your updates can be guaranteed to hit one or two nodes in the inside of the tree, though that's much harder to prove.
So sequential IDs offer a performance benefit, however, you must remember this isn't maintainable when you have more than one database, as the IDs will collide.

How to avoid mutable state (when multithreading)

Multithreading is hard. The only thing you can do is program very carefully and follow good advice. One great piece of advice I got from the answers on this forum is to avoid mutable state. I understand this is even enforced in the Erlang language. However, I fail to see how this can be done without a severe performance hit and huge amounts of caching.
For example: you have a big list of objects, each containing quite a lot of properties; in other words, a large data structure. Suppose you have a bunch of threads and they all need to access and modify the list. How can this be done without shared memory and without having to cache the whole data structure in each of the threads?
Update: After reading the reactions so far, I would like to put some more emphasis on performance. Don't you think that copying the same data around will make the program slower than with shared memory?
Not every algorithm can be parallelized successfully.
If your program doesn't exhibit any "parallel structure", then you're pretty much doomed to use locking and shared, mutable structures.
If your algorithm does exhibit structure, then you can express your computation in terms of some pattern or formalism (for example, a macro dataflow graph) that makes the choice of an immutable data structure trivial.
So: think in terms of the structure of the algorithm, not just in terms of the properties of the data structure to use.
You can get a great start in thinking about immutable collections, where they are applicable, how they can actually work without requiring lots of copying, etc. by looking through Eric Lippert's articles tagged with immutability:
http://blogs.msdn.com/ericlippert/archive/tags/Immutability/default.aspx
I guess the first question is: why do they need to modify the list? Would it be possible for them to return their changes as a list of modifications rather than actually modifying the shared list? Could they work with a list which looks like it's a mutable version of the original list, but is actually only locally mutable? Are you changing which elements are in the list, or just the properties of those elements?
These are just questions rather than answers, but I'm trying to encourage you to think about your problem in a different way. Look at the bigger picture as the task you want to achieve instead of thinking about the way you'd tackle it in a normal imperative, mutable fashion. Changing the way you think about problems is very difficult, but you may find you get some great "aha!" moments :)
There are many pitfalls when working with multiple threads and large sets of data. The advice to avoid mutable state is meant to try to make life easier for you if you can manage to follow the guideline (i.e. if you have no mutable state then multi-threading will be much easier).
If you have a large amount of data that does need to be modified, then perhaps you cannot avoid mutable state entirely. An alternative, though, would be to partition the data into blocks, each of which is passed to a thread for manipulation. Each block can be processed and then passed back, and the controller can then perform the updates where necessary. In this scenario you have moved the mutable state out of the threads.
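A small C++ sketch of that partitioning idea, under the assumption that updates are independent per element (Item, processBlock, and the doubling transformation are placeholders): workers only ever see private copies, and the controller writes the results back single-threaded.

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Item { int value; };

// Pure function: takes a private copy of a block, returns the new values.
std::vector<Item> processBlock(const std::vector<Item> &block) {
    std::vector<Item> out;
    out.reserve(block.size());
    for (const Item &it : block)
        out.push_back(Item{it.value * 2});   // placeholder transformation
    return out;
}

void processAll(std::vector<Item> &data, std::size_t nthreads) {
    std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
    std::vector<std::vector<Item>> results(nthreads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        std::size_t lo = t * chunk;
        if (lo >= data.size()) break;                       // fewer blocks than threads
        std::size_t hi = std::min(data.size(), lo + chunk);
        std::vector<Item> block(data.begin() + lo, data.begin() + hi);
        workers.emplace_back([block, &results, t] {         // block is a private copy
            results[t] = processBlock(block);
        });
    }
    for (auto &w : workers) w.join();
    for (std::size_t t = 0; t < nthreads; ++t)              // controller applies updates
        for (std::size_t i = 0; i < results[t].size(); ++i)
            data[t * chunk + i] = results[t][i];
}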
If this cannot be done and each thread needs update access to the full list (i.e. it could update any item on the list at any time) then you are going to have a lot of fun trying to make sure you have got your locking strategies and concurrency issues sorted. I'm sure there are scenarios where this is required, and the design pattern of avoiding mutable state may not apply.
Just using immutable data-objects is a big help.
Modifying lists sounds like a constructed argument, but consider granular methods that are unaware of lists.
If you really need to update the structure, one way to do this is to have a single worker thread which picks up update requests from a fixed area protected by a mutex.
If you are clever you can update the structure in place without affecting any "reading" threads (e.g. if you are adding to the end of an array, do all the work to add the new entry, but only increment the NoOfMembers count as the very last instruction -- the reading threads should not see the new entry until you do that -- or arrange your data as an array of references to structures: when you want to update a structure, copy the current member, update the copy, then as the last operation replace the reference in the array).
The other threads then only need to check a single simple "update in progress" mutex, and only when they actively want to update.
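Here is a rough sketch of the second variant (copy the member, then swap the reference in last) using C++ atomics. Record, table, and the functions are invented for the example, and reclaiming the old object is deliberately left out; doing that safely (hazard pointers, RCU, shared_ptr, etc.) is the genuinely hard part.

#include <atomic>
#include <cstddef>

struct Record { int a; int b; };

// Array of references to structures; zero-initialized at static storage.
std::atomic<Record*> table[1024];

// Readers follow the pointer lock-free and never see a half-written Record.
Record snapshotRead(std::size_t i) {
    Record *p = table[i].load(std::memory_order_acquire);
    return p ? *p : Record{0, 0};
}

// Single writer: copy the current member, update the copy, publish it last.
void writerUpdate(std::size_t i, int newA) {
    Record *oldRec = table[i].load(std::memory_order_relaxed);
    Record *newRec = new Record(oldRec ? *oldRec : Record{0, 0});
    newRec->a = newA;
    table[i].store(newRec, std::memory_order_release);   // the very last instruction
    // NOTE: oldRec is leaked here; real code needs a safe reclamation scheme.
}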

How do I sort Lucene results by field value using a HitCollector?

I'm using the following code to execute a query in Lucene.Net
var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;
How do I sort these search results based on a field?
Update
Thanks for your answer. I had tried using TopFieldDocCollector but I got an error saying, "value is too small or too large", when I passed 5000 as the numHits argument value. Please suggest a valid value to pass.
The search.Searcher.search method will accept a search.Sort parameter, which can be constructed as simply as:
new Sort("my_sort_field")
However, there are some limitations on which fields can be sorted on: they need to be indexed but not tokenized, and the values must be convertible to Strings, Floats, or Integers.
Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.
What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.
Comment on this if you need more info. I'll be happy to help.
In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.
Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue), something that imposes an unreasonably small limit on the result size. If so, you might want to look at the source code for TopFieldDocCollector and implement your own similar hit collector using a better heap implementation.
I have to ask, however, why are you trying to collect 5000 results? No user in an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 just as a factor of safety. Depending on the application, limiting the result size can hamper malicious screen scrapers and mitigate denial-of-service attacks too.
The constructor for Sort that accepts only the string field name has been deprecated. Now you have to create a Sort object and pass it in as the last parameter of searcher.Search().
/* sorting by a field of type long called "size" from greatest -> smallest
   (signified by passing true for the last isReversed parameter) */
Sort sorter = new Sort(new SortField("size", SortField.Type.LONG, true));
searcher.Search(myQuery, collector, sorter);
