How to create a big list of different UUIDs, efficiently? - node.js

I want to generate tickets for an event. I need to generate a lot of them and I decided to have the ticket number as UUIDs. The question is how to generate a big list of UUIDs and it to be different.
I know the easy way to just check every new UUID generated in the already existing list, but this is not very performance friendly. :)
I am using NodeJS with UUID v4.
Thank you!

You could use home-grown UUID function, which is guaranteed to be unique pseudo-random integer in the range [0...2128). Below is one based on Linear Contguential Generator. Constants are taken from here or here. You only need to keep previous number/UUID at hands to generate next one, no need to check because it will be repeated only after full period of 2128.
Code relies on BigInt, tested with node v12
const a = 199967246047888932297834045878657099405n; // should satisfy a % 8n = 5n
const c = 1n; // should be odd
const m = (1n << 128n);
const mask = m - 1n;
function LCG128(state) {
return (BigInt(state) * a + c) & mask; // same as % m
}
q = 7654321n; // seed
q = LCG128(q);
q.toString(16); // first UUID
q = LCG128(q);
q.toString(16); // second UUID
q = LCG128(q);
q.toString(16); // third UUID
UPDATE
Just to be a more philosophical on the issue at hands:
You could consider UUID4 to be black box and trust it - this is what #ChrisWhite proposed
You could consider UUID4 to be black box and distrust it - that is whatyou proposed to check in the list or answer by #KevinPastor
Make your own transparent box which produces numbers in the proper range and be unique - that is my proposal
Beauty of LCG approach is that, given good multiplier and carry, it uniquely and reversable maps range [0...2128) into itself (it could do that for 64bit numbers, with different a, c, or 32bit numbers and so on and so forth). You could even use counter as input starting with 0 up to 2128-1 and it will produce non-repeatable numbers in the same range filling whole [0...2128). So you know that if you either chain it with previous uuid, or use counter, there is 0 chances of collision.

You can create an empty object and every time you generate a UUID, you add an attribute to that object where the key is the generated UUID. When you will generate another UUID, you just have to check if the object attribute is undefined or not to see if it's already used.
const uuids = [];
let uuidUsed = {};
const size = 10;
for (let i = 0; i < size; i++) {
let uuid = uuidv4();
while (uuidUsed[uuid] !== undefined) {
uuid = uuidv4();
}
uuidUsed[uuid] = true;
}

here’s a list of 446,538 IPs formatted in the following way: | id | date | name | uuid | ip |
https://minerlock.com/lock/F50f4b8d8e27e

Related

Pick unique random numbers in node/express js

Have to pick unique random numbers from a given array of numbers. We have a list of numbers from that system has to pick unique random numbers.
Should have an option to input count, like if we give 3 system should pick 3 numbers from the given list.
Is there any available library?
const crypto = require('crypto');
const yourNumbers = [1,2,5,3,4,5,67,34,5345,6623,4234];
let selectedNumbers = [];
let i = 0;
const numbersRequired = 4;
while (i < numbersRequired){
const pos = crypto.randomBytes(4).readUInt32BE()%yourNumbers.length;
if (selectedNumbers.includes(yourNumbers[pos])) {
continue;
}
i++;
selectedNumbers.push(yourNumbers[pos]);
}
console.log(selectedNumbers)
I won't describe the code as it's self explanatory. Also handle the scenario where you cannot pick the required amount of numbers when the yourNumbers is too short. Also change the byte size if your number list is huge
const count = 8
pickedNumbers = [];
randomNumberList = [5,7,22,45,77,33,7,33,11,22,53,46,86,57,88];
for (i =0; i< count ; i++){
const randArrayIndex = Math.floor(Math.random() * randomNumberList.length);
pickedNumbers.push(randomNumberList[randArrayIndex]);
}
console.log(pickedNumbers);
It's a simple calculation. no need for an external library. You can use the random function of the Math object to generate a random number in js. It will return a number between 0 and 1. multiply it with the length of your array list. then it will give you a random index value from the range 0 - your array length. you can use push method of array object to push the numbers you are generating into an empty array.
control how many times you want this to happen with a loop setting the count value by users' input.

Javascript ES6 - Node Js - Associative Array - Objects use - wrong default order

I have a problem trying to use associative arrays/objects in node, I have the following code and I want to get the same order I use when i inserted elements.
var aa = []
aa[0] = 1
aa['second'] = 'pep'
aa['third'] = 'rob'
aa[4] = 2
for (var pos in aa) console.log (aa[pos])
But internally node put/sort first the numbers.
Look real execution :
1
2
pep
rob
I have created a parallel dynamic structure to solve this problem, but I'd like to ask for another solution or idea to get this target.
Regards.
Ricardo.
First of all, I'd recommend to use dictionary but not array, for dynamic keys:
var aa = {};
Elements are listed as its default order. You could check its default order with:
var keys = Object.keys(aa);
for (var i = 0; i < keys.length; i++) {
console.log(keys[i]);
}
If the default order is needed as the same order when inserting elements, try this to save the inserting order in another array:
var aa = {};
var keys = [];
aa[0] = 1
keys.push(0);
aa['second'] = 'pep'
keys.push('second');
aa['third'] = 'rob'
keys.push('third');
aa[4] = 2
keys.push(4);
for (var i = 0; i < keys.length; i++) {
console.log(aa[keys[i]]);
}
You may also want to give some ES6 features a try. Given you want to store data in a hash-like data structure that preserves the order I would recommend to give Map a try:
var map = new Map();
map.set(0, 1);
map.set('second', 'pep');
map.set('third', 'rob');
map.set(4, 2);
for (var [key, value] of map) {
console.log(key, value);
}
map.forEach(function (value, key) {
console.log(key, value);
});
nodejs arrays are just objects with numbers as keys and some added functions to their prototype. Also assoc. arrays are usually two arrays with values hinged together via a common index. for example:
let a = [1,2,3];
let b = ["one","two","three"];
also try to get away from looking through arrays with for loops. use the plethora of functions available to you via the prototype. for loops also iterate through enumerable properties on the objects prototype, so a for loop would incorrectly also iterate through that.

Traverse a graph in parallel

I'm revising for an exam (still) and have come across a question (posted below) that has me stumped. I think, in summary, the question is asking "Think of any_old_process that has to traverse a graph and do some work on the objects it finds, including adding more work.". My question is, what data structure can be parallelised to achieve the goals set out in the question?
The role of a garbage collector (GC) is to reclaim unused memory.
Tracing collectors must identify all live objects by traversing graphs
of objects induced by aggregation relationships. In brief, the GC has
some work-list of tasks to perform. It repeatedly (a) acquires a task
(e.g. an object to inspect), (b) performs the task (e.g. marks the
object unless it is already marked), and (c) generates further tasks
(e.g. adds the children of an unmarked task to the work-list). It is
desirable to parallelise this operation.
In a single-threaded
environment, the work-list is usually a single LIFO stack. What would
you have to do to make this safe for a parallel GC? Would this be a
sensible design for a parallel GC? Discuss designs of data structure
to support a parallel GC that would scale better. Explain why you
would expect them to scale better.
The natural data structure for a graph is, well, a graph, i.e. a set of graph elements (nodes) which can refer other elements. Though, for the better cache reuse, the elements can be placed/allocated in an array or arrays (generally, vectors) in order to put neighbor elements as close in memory as possible. Generally, each element or a group of elements should have a mutex (spin_mutex) to protect access to it, the contention means that some other thread is busy working on it, so no need to wait. Though, if possible, an atomic operation over the flag/state fields is preferable to mark the element as visited without a lock. For example, the simplest data structure can be the following:
struct object {
vector<object*> references;
atomic<bool> is_visited; // for simplicity, or epoch counter
// if nothing resets it to false
void inspect(); // processing method
};
vector<object> objects; // also for simplicity, if it can be for real
// things like `parallel_for` would be perfect here
Given this data structure and the way how GC work is described, it perfectly fits for a recursive parallelism like divide-and-conquer pattern:
void object::inspect() {
if( ! is_visited.exchange(true) ) {
for( object* o : objects ) // alternatively it can be `parallel_for` in some variants
cilk_spawn o->inspect(); // for Cilk or `task_group::run` for TBB or PPL
// further processing of the object
}
}
If the data structure in the question is how the tasks are organized. I'd recommend a work-stealing scheduler (like tbb or cilk. There are tons of papers on this subject. To put it simple, each worker thread has its own but shared deque of tasks, and when the deque is empty, a thread steals tasks from others deques.
The scalability comes from the property that each task can add some other tasks which can work in prarallel..
Your questions:
Think of any_old_process that has to traverse a graph and do some work on the objects it finds, including adding more work.
... what data structure can be parallelised to achieve the goals set out in the question?
Quoted questions:
Some stuff about garbage collection.
Since you are specifically interested in parallelizing graph algorithms, I'll give an example of one kind of graph traversal that can be parallelized well.
Executive Summary
Finding local minima ("basins") or maxima ("peaks") are useful operations in digital image processing. A concrete example is geological watershed analysis. One approach to the problem treats each pixel or small group of pixels in the image as a node and finds non-overlapping minimum spanning trees (MST) with the local minima as the tree roots.
Gory details
Below is a simplistic example. It's a web interview question from Palantir Technologies brought to Programming Puzzles & Code Golf by AnkitSablok. It's simplified by two assumptions (bolded below):
That a pixel/cell only has 4 neighbors instead of the usual eight.
That a cell has all uphill neighbors (it's the local minima) or has a unique downhill neighbor. I.e., plains aren't allowed.
Below that is some JavaScript that solves this problem. It violates every reasonable coding standard against use of side-effects, but illustrates where some of the opportunities for parallelization exist.
In the "Create list of sinks (i.e. roots)" loop, note that each cell can be evaluated completely independently for elevation with respect to it's neighbors as long as the elevation data is static. In a sequential program, one thread of execution examines each cell. In a parallel program, the cells are divvied up so that one, and only one, thread reads and writes the local minima state information (sink[] in the program below). If generating the list of minima/roots in parallel, the queuing operations for the stack would have to be synchronized. For a discussion how to do that for stacks and other queues, see "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms", Michael & Scott, 1996. For modern updates, follow the citation tree on Google Scholar (no mutex required :).
In the "Each root explores it's basin" loop, note that each basin could explored/enumerated/flooded in parallel.
If you want dive deeper into parallelizing MSTs, see "Scalable Parallel Minimum Spanning Forest Computation", Nobari, Cao, arras, Bressan, 2012. The first two pages contain a clear and concise survey of the field.
Simplified example
A group of farmers has some elevation data, and we’re going to help them understand how rainfall flows over their farmland. We’ll represent the land as a two-dimensional array of altitudes and use the following model, based on the idea that water flows downhill:
If a cell’s four neighboring cells all have higher altitudes, we call this cell a sink; water collects in sinks. Otherwise, water will flow to the neighboring cell with the lowest altitude. If a cell is not a sink, you may assume it has a unique lowest neighbor and that this neighbor will be lower than the cell.
Cells that drain into the same sink – directly or indirectly – are said to be part of the same basin.
Your challenge is to partition the map into basins. In particular, given a map of elevations, your code should partition the map into basins and output the sizes of the basins, in descending order.
Assume the elevation maps are square. Input will begin with a line with one integer, S, the height (and width) of the map. The next S lines will each contain a row of the map, each with S integers – the elevations of the S cells in the row. Some farmers have small land plots such as the examples below, while some have larger plots. However, in no case will a farmer have a plot of land larger than S = 5000.
Your code should output a space-separated list of the basin sizes, in descending order. (Trailing spaces are ignored.)
Here's an example:
Input:
5
1 0 2 5 8
2 3 4 7 9
3 5 7 8 9
1 2 5 4 2
3 3 5 2 1
Output: 11 7 7
The basins, labeled with A’s, B’s, and C’s, are:
A A A A A
A A A A A
B B A C C
B B B C C
B B C C C
// lm.js - find the local minima
// Globalization of variables.
/*
The map is a 2 dimensional array. Indices for the elements map as:
[0,0] ... [0,n]
...
[n,0] ... [n,n]
Each element of the array is a structure. The structure for each element is:
Item Purpose Range Comment
---- ------- ----- -------
h Height of cell integers
s Is it a sink? boolean
x X of downhill cell (0..maxIndex) if s is true, x&y point to self
y Y of downhill cell (0..maxIndex)
b Basin name ('A'..'A'+# of basins)
Use a separate array-of-arrays for each structure item. The index range is
0..maxIndex.
*/
var height = [];
var sink = [];
var downhillX = [];
var downhillY = [];
var basin = [];
var maxIndex;
// A list of sinks in the map. Each element is an array of [ x, y ], where
// both x & y are in the range 0..maxIndex.
var basinList = [];
// An unordered list of basin sizes.
var basinSize = [];
// Functions.
function isSink(x,y) {
var myHeight = height[x][y];
var imaSink = true;
var bestDownhillHeight = myHeight;
var bestDownhillX = x;
var bestDownhillY = y;
/*
Visit the neighbors. If this cell is the lowest, then it's the
sink. If not, find the steepest downhill direction.
*/
function visit(deltaX,deltaY) {
var neighborX = x+deltaX;
var neighborY = y+deltaY;
if (myHeight > height[neighborX][neighborY]) {
imaSink = false;
if (bestDownhillHeight > height[neighborX][neighborY]) {
bestDownhillHeight = height[neighborX][neighborY];
bestDownhillX = neighborX;
bestDownhillY = neighborY;
}
}
}
if (x !== 0) {
// upwards neighbor exists
visit(-1,0);
}
if (x !== maxIndex) {
// downwards neighbor exists
visit(1,0);
}
if (y !== 0) {
// left-hand neighbor exists
visit(0,-1);
}
if (y !== maxIndex) {
// right-hand neighbor exists
visit(0,1);
}
downhillX[x][y] = bestDownhillX;
downhillY[x][y] = bestDownhillY;
return imaSink;
}
function exploreBasin(x,y,currentSize,basinName) {
// This cell is in the basin.
basin[x][y] = basinName;
currentSize++;
/*
Visit all neighbors that have this cell as the best downhill
path and add them to the basin.
*/
function visit(x,deltaX,y,deltaY) {
if ((downhillX[x+deltaX][y+deltaY] === x) && (downhillY[x+deltaX][y+deltaY] === y)) {
currentSize = exploreBasin(x+deltaX,y+deltaY,currentSize,basinName);
}
return 0;
}
if (x !== 0) {
// upwards neighbor exists
visit(x,-1,y,0);
}
if (x !== maxIndex) {
// downwards neighbor exists
visit(x,1,y,0);
}
if (y !== 0) {
// left-hand neighbor exists
visit(x,0,y,-1);
}
if (y !== maxIndex) {
// right-hand neighbor exists
visit(x,0,y,1);
}
return currentSize;
}
// Read map from file (1st argument).
var lines = $EXEC('cat "' + $ARG[0] + '"').split('\n');
maxIndex = lines.shift() - 1;
for (var i = 0; i<=maxIndex; i++) {
height[i] = lines.shift().split(' ');
// Create all other 2D arrays.
sink[i] = [];
downhillX[i] = [];
downhillY[i] = [];
basin[i] = [];
}
for (var i = 0; i<=maxIndex; i++) { print(height[i]); }
// Everyone decides if they are a sink. Create list of sinks (i.e. roots).
for (var x=0; x<=maxIndex; x++) {
for (var y=0; y<=maxIndex; y++) a
if (sink[x][y] = isSink(x,y)) {
// This node is a root (AKA sink).
basinList.push([x,y]);
}
}
}
//for (var i = 0; i<=maxIndex; i++) { print(sink[i]); }
// Each root explores it's basin.
var basinName = 'A';
for (var i=basinList.length-1; i>=0; --i) { // i-- makes Closure Compiler sad
var x = basinList[i][0];
var y = basinList[i][5];
basinSize.push(exploreBasin(x,y,0,basinName));
basinName = String.fromCharCode(basinName.charCodeAt() + 1);
}
for (var i = 0; i<=maxIndex; i++) { print(basin[i]); }
// Done.
print(basinSize.sort(function(a, b){return b-a}).join(' '));

NDepend rule to warn if objects of a given type are compared using ==

as the title says: I need a NDepend rule (CQLinq) for C#/.net code, that fires whenever instances of a given type are compared using == (reference comparison). In other words, I want to force the programmer to use .Equals.
Note that the type in question has no overloaded equality operator.
Is this possible? If so, how? :)
Thanks, cheers,
Tim
With the following code with see that for value type, == translate to the IL instruction: ceq. This kind of usage cannot be detected with NDepend.
int i = 2;
int j = 3;
Debug.Assert(i == j);
var s1 = "2";
var s2 = "3";
Debug.Assert(s1 == s2);
However for reference types we can see that a operator method named op_Equality is called.
L_001d: call bool [mscorlib]System.String::op_Equality(string, string)
Hence we just need a CQLinq query that first match all method named op_Equality, and then list all callers of these methods. This can look like:
let equalityOps = Methods.WithSimpleName("op_Equality")
from m in Application.Methods.UsingAny(equalityOps)
select new { m,
typesWhereEqualityOpCalled = m.MethodsCalled.Intersect(equalityOps).Select(m1 => m1.ParentType) }
This seems to work pretty well :)

Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?
Long version:
I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.
Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).
The format is such that:
978-0-01-######-?
Gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being a checksum.
Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?
After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:
public int GetStableHash(string s)
{
uint hash = 0;
// if you care this can be done much faster with unsafe
// using fixed char* reinterpreted as a byte*
foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
{
hash += b;
hash += (hash << 10);
hash ^= (hash >> 6);
}
// final avalanche
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
// helpfully we only want positive integer < MUST_BE_LESS_THAN
// so simple truncate cast is ok if not perfect
return (int)(hash % MUST_BE_LESS_THAN);
}
public int CalculateChecksumDigit(ulong n)
{
string sTemp = n.ToString();
int iSum = 0;
int iDigit = 0;
// Calculate the checksum digit here.
for (int i = sTemp.Length; i >= 1; i--)
{
iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
// This appears to be backwards but the
// EAN-13 checksum must be calculated
// this way to be compatible with UPC-A.
if (i % 2 == 0)
{ // odd
iSum += iDigit * 3;
}
else
{ // even
iSum += iDigit * 1;
}
}
return (10 - (iSum % 10)) % 10;
}
private void generateISBN()
{
string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
string fakeisbn = "978001" + titlehash;
string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();
SixDigitID.Text = fakeisbn + check;
}
The 6 digits allow for about 10M possible values, which should be enough for most internal uses.
I would have used a sequence instead in this case, because a 6 digit checksum has relatively high chances of collisions.
So you can insert all strings to a hash, and use the index numbers as the ISBN, either after sorting or without it.
This should make collisions almost impossible, but it requires keeping a number of "allocated" ISBNs to avoid collisions in the future, and keeping the list of titles that are already in store, but it's information that you would most probably want to keep anyway.
Another option is to break the ISBN standard and use hexadecimal/uuencoded barcodes, that may increase the possible range to a point where it may work with a cryptographic hash truncated to fit.
I would suggest that since you are handling old book titles, which may have several editions capitalized and punctuated differently, I would strip punctuation, duplicated whitespaces and convert everything to lowercase before the comparison to minimize the chance of a technical duplicate even though the string is different (Unless you want different editions to have different ISBNs, in that case, you can ignore this paragraph).

Resources