Implementation of realtime break iterator - icu

I'm interested in modifying the break iterator data (zh) while my program is running, as the user adds new words. This means that the data cannot be packaged up front and must be generated as I go. Can I use something like udata_setAppData or udata_setCommonData to achieve this? I expect the .dat for the break iterator to change 2-3 times a day, so loading time should not be the critical issue.
Here's the pseudo code:
1. Start program
2. Generate .dat-like data from database for break iterators
3. Load into icu as zh break iterator
If the user makes a change to the database
4. Drop current .dat for zh break iterator
5. Regenerate .dat-like data
6. Reload
Is this possible? I think it is almost possible if I have a way of replacing U_ICUDAT_BRKITR on the fly.
Update: it seems that to pull this off, I must use code from gencmn to generate the new .dat file.

There is no API to customize the dictionary.
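For anyone who wants to experiment anyway, below is a minimal ICU4C sketch of what steps 3 and 6 of the pseudo code would look like around udata_setAppData. The package name "brkitr_patch" and the idea that a gencmn-built image registered under it would ever be consulted for the zh dictionary are assumptions on my part, not anything ICU documents; since the dictionary has no customization API, treat this purely as the mechanics of registering replacement data and re-opening the iterator.

    /* Sketch only: "brkitr_patch" and the regenerated image are assumptions.
     * udata_setAppData() registers an application data package; whether the
     * zh dictionary-based break engine would ever consult such a package is
     * exactly what the answer above rules out, so this shows the mechanics
     * of registering replacement data and re-opening the iterator, nothing
     * more.  The memory behind "regenerated" must stay valid for as long as
     * ICU may read from it. */
    #include <unicode/udata.h>
    #include <unicode/ubrk.h>
    #include <stdio.h>

    static UBreakIterator *reload_zh_word_iterator(const void *regenerated) {
        UErrorCode status = U_ZERO_ERROR;

        /* Steps 3/6 of the pseudo code: hand the freshly generated
         * .dat-like image to ICU under an application package name. */
        udata_setAppData("brkitr_patch", regenerated, &status);
        if (U_FAILURE(status)) {
            fprintf(stderr, "udata_setAppData: %s\n", u_errorName(status));
            return NULL;
        }

        /* Re-open the word break iterator for zh; iterators opened before
         * the data changed keep using the old data and must be closed. */
        UBreakIterator *bi = ubrk_open(UBRK_WORD, "zh", NULL, 0, &status);
        if (U_FAILURE(status)) {
            fprintf(stderr, "ubrk_open: %s\n", u_errorName(status));
            return NULL;
        }
        return bi;
    }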

Related

AnyLogic: Look ahead simulation

Is it possible to perform look ahead simulation in AnyLogic?
Specifically:
Simulate till time T.
Using 2 values of a variable, simulate for both values till T+t in parallel.
Evaluate the system state at T+t, choose the value of variable which leads to better performance.
Continue simulating from T using the selected value for the variable.
This is the basic functionality I am trying to implement. The variable values can be taken from a decision tree, which should not affect the implementation.
Please let me know if someone has done something like this.
Yes, it is possible with some Java code. You may:
Pause parent experiment, save snapshot at time T;
Create two new experiments from parent experiment;
Load snapshots in two new experiments;
Continue execution of both experiments till time T + t;
Send notification to parent experiment, compare the results, assign the best value and continue simulation.
Some of these steps can be done manually with UI controls or by code, some by code only.
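Setting the AnyLogic specifics aside, the control flow those steps describe reduces to something like the sketch below. The State struct, step() and score() are made-up placeholders for the model (not AnyLogic APIs), the pass-by-value copy plays the role of the saved snapshot, and the two rollouts are independent, so they could run in parallel experiments rather than one after the other.

    /* Toy look-ahead: snapshot the state at time T, roll each candidate
     * value forward to T + t, keep whichever scores better, then continue
     * from T with that value.  State, step() and score() are made-up
     * placeholders, not AnyLogic APIs. */
    #include <stdio.h>

    typedef struct {
        double time;
        double level;   /* some model quantity we care about */
        double knob;    /* the decision variable being tested */
    } State;

    static void step(State *s, double dt) {
        s->level += (1.0 - s->knob) * dt;   /* placeholder dynamics */
        s->time  += dt;
    }

    static double score(const State *s) {
        return -s->level;                   /* smaller level is better here */
    }

    /* The pass-by-value copy of the state plays the role of the snapshot. */
    static double rollout(State snapshot, double knob, double horizon, double dt) {
        snapshot.knob = knob;
        for (double t = 0.0; t < horizon; t += dt)
            step(&snapshot, dt);
        return score(&snapshot);
    }

    int main(void) {
        State parent = { 0.0, 10.0, 0.0 };        /* ...simulated up to T */
        const double candidates[2] = { 0.2, 0.8 };

        double s0 = rollout(parent, candidates[0], 5.0, 0.1);  /* T -> T+t */
        double s1 = rollout(parent, candidates[1], 5.0, 0.1);

        /* Continue the parent run from T with the better value. */
        parent.knob = (s0 >= s1) ? candidates[0] : candidates[1];
        printf("selected value: %.1f\n", parent.knob);
        return 0;
    }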

Async usage for Loading/Saving a list of objects to files

Let's say I have a list of same-class objects packed into a single file which I save to/load from at application startup.
What I'd like to do is use the power of async processing to speed up load-all time & save-all time - let's also assume that the files themselves are efficiently packed (using Protocol Buffers or the like).
What would be the best way to go about this? Would async processing actually help in this scenario?
One method I thought of is to "pre-determine" the amount of chunking by picking a number greater than 1, dividing the list up by that number, then saving/loading using that number as the number of tasks. However, this seems somewhat arbitrary, & I was curious if there are some libraries out there that might just make the decision for me based on some conditions.
I.e. I might call my "chunkable list" something like:
Chunkable<List<SomeObject>>
.. and then the program would just divide up the list correctly to read/save in an efficient way - e.g. save 10 files like "List_01", "List_XX" - then read from the chunks when performing a load-all.
The final ordering of the list, when saving or loading, is not important - just having the objects available as a single list.
For posterity, one conceptual answer here is to use a Partitioner in the Task Parallel Library.
For saving, I can have the Partitioner serialize pieces of the list and write out files, named in some non-repeating format, as the tasks complete.
For loading, I can get a count/list of the existing chunks in a given location on disk, then have the TPL load up & deserialize all the chunks & recombine them in whatever order they complete (using some Interlocked var to make sure each file is only read once).
I will paste code in here once I have tested some.
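In the meantime (and this is not the TPL code promised above, just an illustration of the shape), the load path can be sketched in plain C with pthreads: a small pool of workers claims chunk files through an atomic counter, which plays the role of the Interlocked variable, so each file is read exactly once no matter in what order the workers finish. The file names, chunk count and the deserialize() placeholder are assumptions for illustration.

    /* Sketch: NUM_CHUNKS chunk files loaded by a small worker pool.  A C11
     * atomic counter hands out file indices (the "Interlocked var" role),
     * so each chunk is read exactly once regardless of completion order.
     * deserialize() is a placeholder for the real Protocol Buffers parse. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_CHUNKS  10
    #define NUM_WORKERS 4

    static atomic_int next_chunk;              /* next unclaimed chunk index */
    static char *chunk_data[NUM_CHUNKS];       /* one result slot per chunk  */

    static char *deserialize(const char *path) {    /* placeholder parser    */
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;
        fseek(f, 0, SEEK_END);
        long n = ftell(f);
        rewind(f);
        char *buf = malloc((size_t)n + 1);
        if (buf && fread(buf, 1, (size_t)n, f) == (size_t)n) {
            buf[n] = '\0';
        } else {
            free(buf);
            buf = NULL;
        }
        fclose(f);
        return buf;
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            int i = atomic_fetch_add(&next_chunk, 1);
            if (i >= NUM_CHUNKS)
                break;                         /* every chunk already claimed */
            char path[32];
            snprintf(path, sizeof path, "List_%02d", i + 1);
            chunk_data[i] = deserialize(path); /* completion order irrelevant */
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NUM_WORKERS];
        for (int w = 0; w < NUM_WORKERS; w++)
            pthread_create(&tid[w], NULL, worker, NULL);
        for (int w = 0; w < NUM_WORKERS; w++)
            pthread_join(tid[w], NULL);
        /* chunk_data[0..NUM_CHUNKS-1] now holds every chunk; merge them into
         * the single in-memory list in whatever order is convenient. */
        return 0;
    }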

What is better: generate random IDs at runtime or keep them handy beforehand?

I am writing an app and need to do something functionally similar to what URL-shortening websites do. I will be generating 6-character (case-insensitive alphanumeric) random strings which would identify the longer versions of the links. This gives 2,176,782,336 possibilities ((10+26)^6). While assigning these strings, there are two approaches I can think of.
Approach 1: the system generates a random string at runtime and checks for its uniqueness in the system; if it is not unique, it tries again until it finds a unique string. But this might create issues if the user is "unlucky".
Approach 2: I generate a pool of possible values beforehand and assign them as soon as they are needed. This would make sure the user is always allocated a unique string almost instantly, but it also means I would have to do plenty of computation in cron jobs beforehand, and that work would grow over time.
I already have the code to generate such values; what I'm after is advice on the approach, since I want a highly responsive app experience. I could not find any comparative study on this.
Cheers!
What I do in similar situations is keep N values queued up so that I can assign them instantly, and then, when the queue's size falls below a certain threshold (say, 0.2 * N), have a background task add another N items to the queue. It probably makes sense to start this background task as soon as your program starts (as opposed to generating the first N values offline and loading them at startup), on the assumption that there will be some delay between startup and the first requests for values from the queue.
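A minimal sketch of that queue-plus-refill scheme with pthreads is below. N, the 0.2 * N threshold and the background task mirror the answer; gen_id() is a stand-in that does not check the database for uniqueness, which a real implementation still has to do while filling the queue.

    /* Sketch of "keep N ready, refill when below 0.2 * N" with pthreads.
     * gen_id() is a stand-in: a real implementation would generate the
     * string however the app already does and check it for uniqueness
     * against the database before queueing it. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define N         1000                 /* target queue size */
    #define LOW_WATER (N / 5)              /* 0.2 * N           */
    #define ID_LEN    6

    static char queue[N][ID_LEN + 1];
    static int  count;
    static pthread_mutex_t lock        = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  need_refill = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_empty   = PTHREAD_COND_INITIALIZER;

    static void gen_id(char out[ID_LEN + 1]) {
        static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
        for (int i = 0; i < ID_LEN; i++)
            out[i] = alphabet[rand() % (int)(sizeof alphabet - 1)];
        out[ID_LEN] = '\0';
    }

    /* Background task: top the queue back up to N whenever it runs low. */
    static void *refiller(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
            while (count > LOW_WATER)
                pthread_cond_wait(&need_refill, &lock);
            while (count < N)
                gen_id(queue[count++]);    /* plus a uniqueness check, really */
            pthread_cond_broadcast(&not_empty);
        }
        return NULL;
    }

    /* Request path: pops a ready-made ID, instantly in the common case,
     * blocking only if the queue has been completely drained. */
    static void take_id(char out[ID_LEN + 1]) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        memcpy(out, queue[--count], ID_LEN + 1);
        if (count <= LOW_WATER)
            pthread_cond_signal(&need_refill);
        pthread_mutex_unlock(&lock);
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, refiller, NULL);  /* start at program start */

        char id[ID_LEN + 1];
        take_id(id);
        printf("assigned %s\n", id);
        return 0;
    }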

Thread-safe search-and-add

I need to be able to do the following:
search a linked list.
add a new node to the list in case it's not found.
be thread safe and use rwlock since it's a read-mostly list.
The issue I'm having is that when I promote from read_lock to write_lock, I need to search the list again just to make sure some other thread wasn't waiting on a write_lock (to insert the same node) while I was doing the list search holding the read_lock.
Is there a different way to achieve the above without doing a double list search (perhaps a seq_lock of some sort)?
Convert the linked list to a sorted linked list. When it's time to add a new node, you can check whether another writer has added an equivalent node while you were acquiring the lock by inspecting only two nodes around the insertion point, instead of searching the entire list. You will spend a little more time on each node insertion because you need to determine the sorted position of the new node, but you will save time by not having to search the entire list. Overall you will probably save a lot of time.
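Here is a userspace sketch of that idea using pthread_rwlock_t (the question reads like kernel-style rwlock/seq_lock code, so this is only meant to show the control flow). Because the list is kept sorted, both the read-side search and the re-check after taking the write lock stop as soon as they pass the key, rather than walking the whole list.

    /* Sorted singly-linked list: search under the read lock; on a miss,
     * drop it, take the write lock and re-check only up to the insertion
     * point (the sort order bounds the second walk), then splice in. */
    #include <pthread.h>
    #include <stdlib.h>

    struct node {
        int          key;
        struct node *next;
    };

    static struct node     *head;
    static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Returns 1 if key was already present, 0 if this call inserted it,
     * -1 on allocation failure. */
    int find_or_add(int key) {
        struct node *n;

        /* Fast path: readers only.  The walk stops at the first key >= ours. */
        pthread_rwlock_rdlock(&list_lock);
        for (n = head; n && n->key < key; n = n->next)
            ;
        if (n && n->key == key) {
            pthread_rwlock_unlock(&list_lock);
            return 1;
        }
        pthread_rwlock_unlock(&list_lock);

        /* Slow path: another writer may have inserted the key between the
         * unlock above and the wrlock below, so check again, but only as
         * far as the insertion point. */
        struct node *new_node = malloc(sizeof *new_node);
        if (!new_node)
            return -1;
        new_node->key = key;

        pthread_rwlock_wrlock(&list_lock);
        struct node **link = &head;
        while (*link && (*link)->key < key)
            link = &(*link)->next;
        if (*link && (*link)->key == key) {     /* lost the race; it's there */
            pthread_rwlock_unlock(&list_lock);
            free(new_node);
            return 1;
        }
        new_node->next = *link;                 /* splice in, list stays sorted */
        *link = new_node;
        pthread_rwlock_unlock(&list_lock);
        return 0;
    }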

Is reading data in one thread while it is written to in another dangerous for the OS?

There is nothing in the way the program uses this data which will cause the program to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I have yet to see any problems if it does. The program is developed on Linux using pthreads.
I am not interested in being told how to use mutexes/semaphores/locks/etc. Edit: making my program read only the new values is not what I'm asking about.
No, the OS should not have any problem. The typical problem is that you don't want to read old values or a value that is halfway updated and thus not valid (which may crash your app, or, if the next value depends on the previous one, leave you with a corrupted value that keeps generating wrong values all the time), but if you don't care about that, the OS won't either.
Are the kernel/drivers reading that data for any reason (e.g. it contains structures passed in to kernel APIs)? If not, then there isn't any issue with it, since the OS will never look at your hot memory.
Your own reads must ensure they are consistent, so you don't read half of a value pre-update and half post-update and end up with a value that is neither the pre-update nor the post-update one.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine your data consists of a set (structure) of values which cannot be updated in a single atomic operation. The reading thread is bound to read inconsistent data at some point (a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data and accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, half of its members have been set).
If, on the other hand, the data can be read and written with a single atomic operation, you will be fine. For example, imagine a global variable that contains a count: one thread increments it on some condition, and another reads it and takes some action. In this case, there is really no intermediate inconsistent state; the reader either gets the new value or the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.
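To make that last distinction concrete, here is a small pthreads sketch: a single word updated through C11 atomics is always seen as either the old or the new value, while a two-field struct updated non-atomically can be caught halfway. The struct read is a deliberate data race, kept only to illustrate the problem the answers describe; the field names are made up.

    /* One writer, one reader, no locks.  The atomic counter is always seen
     * as either its old or its new value.  The two-field struct is updated
     * non-atomically, so the reader can catch it halfway (x already new,
     * y still old); that is the data-integrity risk described above, and it
     * is deliberately left as a data race to illustrate it (a compiler is
     * also free to "optimize" racy code in surprising ways).  The OS is
     * unaffected in both cases. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long counter;                /* single word, atomic        */
    struct pair { long x, y; };                /* intended invariant: x == y */
    static struct pair pair_state;

    static void *writer(void *arg) {
        (void)arg;
        for (long i = 1; i <= 100000; i++) {
            atomic_store(&counter, i);         /* old or new, nothing else   */
            pair_state.x = i;                  /* a reader scheduled between */
            pair_state.y = i;                  /* these stores sees a mix    */
        }
        return NULL;
    }

    static void *reader(void *arg) {
        (void)arg;
        long mixed = 0;
        for (int i = 0; i < 100000; i++) {
            long c = atomic_load(&counter);    /* always self-consistent     */
            long x = pair_state.x;             /* racy reads, illustration   */
            long y = pair_state.y;
            if (x != y)
                mixed++;
            (void)c;
        }
        printf("half-updated struct snapshots seen: %ld\n", mixed);
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }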
