Short user-friendly ID for mongo

Short user-friendly ID for mongo - node.js

I am creating a real time stock trading system and would like to provider the user with a human readible, user friendly way to refer to their orders. For example the ID should be like 8 characters long and only contain upper case characters e.g. Z9CFL8BA. For obvious reasons the id needs to be unique in the system.
I am using MongoDB as the backend database and have evaluated the following projects which do not meet my requirements.
hashids.org - this looks good but it generates ids which are too long:
var mongoId = '507f191e810c19729de860ea';
var id = hashids.encodeHex(mongoId);
console.log(id)
which results in: 1E6Y3Y4D7RGYHQ7Z3XVM4NNM
github.com/dylang/shortid - this requires that you specify a 64 character alphabet, and as mentioned I only want to use uppercase characters.
I understand that the only way to achieve what I am looking for may well be by generating random codes that meet my requirements and then checking the database for collisions. If this is the case, what would be the most efficient way to do this in a nodejs / mongodb environment?

You're attempting to convert a base-16 (hexadecimal) to base-36 (26 characters in alphabet plus 10 numbers). A simple way might be to simply use parseInt's radix parameter to parse the hexadecimal id, and then call .toString(36) to convert that into base-36. Which would turn "507f191e810c19729de860ea" into "VDFGUZEA49X1V50356", reducing the length from 24 to 18 characters.
function toBase36(id) {
var half = Math.floor(id.length / 2);
var first = id.slice(0, half);
var second = id.slice(half);
return parseInt(first, 16).toString(36).toUpperCase()
+ parseInt(second, 16).toString(36).toUpperCase();
}
function toBase36(id) {
var half = Math.floor(id.length / 2);
var first = id.slice(0, half);
var second = id.slice(half);
return parseInt(first, 16).toString(36).toUpperCase()
+ parseInt(second, 16).toString(36).toUpperCase();
}
// Ignore everything below (for demo only)
function convert(e){ if (e.target.value.length % 2 === 0) base36.value = toBase36(e.target.value) }
var base36 = document.getElementById('base36');
var hex = document.getElementById('hex');
document.getElementById('hex').addEventListener('input', convert, false);
convert({ target: { value: hex.value } });
input { font-family: monospace; width: 15em; }
<input id="hex" value="507f191e810c19729de860ea">
<input id="base36" readonly>

I understand that the only way to achieve what I am looking for may well be by generating random codes that meet my requirements and then checking the database for collisions. If this is the case, what would be the most efficient way to do this in a nodejs / mongodb environment?
Given your description, you use 8 chars in the range [0-9A-Z] as "id". That is 36⁸ combinations (≈ 2.8211099E12). Assuming your trading system does not gain insanely huge popularity in the short to mid term, the chances of collision are rather low.
So you can take an optimistic approach, generating a random id with something along the lines of the code below (as noticed by #idbehold in a comment, be warn that Math.random is probably not random enough and so might increase the chances of collision -- if you go that way, maybe you should investigate a better random generator [1])
> rid = Math.floor(Math.random()*Math.pow(36, 8))
> rid.toString(36).toUpperCase()
30W13SW
Then, using a proper unique index on that field, you only have to loop, regenerating a new random ID until there is no collision when trying to insert a new transaction. As the chances of collision are relatively small, this should terminate. And most of the time this will insert the new document on first iteration as there was no collision.
If I'm not too wrong, assuming 10 billion of transactions, you still have only 0.3% chance of collision on first turn, and a little bit more than 0.001% on the second turn
[1] On node, you might prefer using crypto.pseudoRandomBytes to generate your random id. You might build something around that, maybe:
> b = crypto.pseudoRandomBytes(6)
<SlowBuffer d3 9a 19 fe 08 e2>
> rid = b.readUInt32BE(0)*65536 + b.readUInt16BE(4)
232658814503138
> rid.toString(36).substr(0,8).toUpperCase()
'2AGXZF2Z'

Related

How to power a windowed virtual list with cursor based pagination?

Take a windowed virtual list with the capability of loading an arbitrary range of rows at any point in the list, such as in this following example.
The virtual list provides a callback that is called anytime the user scrolls to some rows that have not been fetched from the backend yet, and provides the start and stop indexes, so that, in an offset based pagination endpoint, I can fetch the required items without fetching any unnecessary data.
const loadMoreItems = (startIndex, stopIndex) => {
fetch(`/items?offset=${startIndex}&limit=${stopIndex - startIndex}`);
}
I'd like to replace my offset based pagination with a cursor based one, but I can't figure out how to reproduce the above logic with it.
The main issue is that I feel like I will need to download all the items before startIndex in order to receive the cursor needed to fetch the items between startIndex and stopIndex.
What's the correct way to approach this?

After some investigation I found what seems to be the way MongoDB approaches the problem:
https://docs.mongodb.com/manual/reference/method/cursor.skip/#mongodb-method-cursor.skip
Obviously he same approach can be adopted by any other backend implementation.
They provide a skip method that allows to skip an arbitrary amount of items after the provided cursor.
This means my sample endpoint would look like the following:
/items?cursor=${cursor}&skip=${skip}&limit=${stopIndex - startIndex}
I then need to figure out the cursor and the skip values.
The following code could work to find the closest available cursor, given I store them together with the items:
// Limit our search only to items before startIndex
const fragment = items.slice(0, startIndex);
// Find the closest cursor index
const cursorIndex = fragment.length - 1 - fragment.reverse().findIndex(item => item.cursor != null);
// Get the cursor
const cursor = items[cursorIndex];
And of course, I also have a way to know the skip value:
const skip = items.length - 1 - cursorIndex;

Node.js - Multimap

I have the following data(example) -
1 - "Value1A"
1 - "Value1B"
1 - "Value1C"
2 - "Value2A"
2 - "Value2B"
I'm using Multimaps for the above data, such that the key 1, has 3 values(Value1A, Value1B, Value1C) and key 2 has 2 values(Value2A, Value2B).
When I try to retrieve all the values for a given key using the get function, it works. But I want to get the key given the value. i.e. if I have "Value1C", I want to use this to get its key 1, from the Multimap. Is this possible, if so how and if not what other than Multimap can I use to achieve this result.
Thanks for the help
https://www.npmjs.com/package/multimap

It is not possible to do this with a single operation, You will need to choose beetween use some extra memory or consume CPU resource.
Use more memory
In this case you need to store the data in a reverse mapping. So you will have another map to store as "Value1C" -> 1. This solution can cause consistency issues, since all the operations will need to be updated in both map. The original one and the reverse one.
The example for this code is basic:
//insert
map.set(1, "Value1C");
reverseMap.set("Value1C", 1);
//search
console.log(map.get(reverseMap.get("Value1C")));
Use more CPU
In this cause you will need to do a search throught all the values, this will be an O(n) complexity. It is not good if your list is too big, even worst in a single thread environment like Node.js.
Check the code example below:
function findValueInMultiMap(map, value, callback){
map.forEachEntry(function (entry, key) {
for(var e in entry){
if(entry[e]==value){
callback(map.get(key));
}
}
});
}
findValueInMultiMao(map, 'Value1C', function(values){
console.log(values);
});

Paginating a mongoose mapReduce, for a ranking algorithm

I'm using a MongoDB mapReduce to code a ranking feed algorithm, it almost works but the latest thing to implement is the pagination. The map reduce supports the results limitation but how could I implement the offset (skipping) based e.g. on the latest viewed _id of the results, knowing that I'm using mongoose?
This is the procedure I wrote:
o = {};
o.map = function() {
//log10(likes+comments) / elapsed hours from the post creation
emit(Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1), this);
};
o.reduce = function(key, values) {
//sort the values, when they have the same score
values.sort(function(a, b) {
a.createdAt - b.createdAt;
});
//serialize the values, because mongoose does not support multiple returned values
return JSON.stringify(values);
};
o.scope = {now: new Date()};
o.limit = 15;
Posts.mapReduce(o, function(err, results) {
if (err) return console.log(err);
console.log(results);
});
Also, if the mapReduce it's not the way to go, do you suggest other on how to implement something like this?

What you need is a page delimiter which is not the id of the latest viewed as you say, but your sorting property. In this case, it seems to be the formula Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1).
So, in your mapReduce query needs to hold a where value of that formula above. Or specifically, 'formula >= . And also it needs to hold the value of createdAt at the last page, since you don't sort by that. (Assuming createdAt is unique). So yourqueryof mapReduce would saywhere: theFormulaExpression, createdAt: { $lt: lastCreatedAt }`
If you do allow multiple identical createdAt values, you have to play a little outside of the database itself.
So you just search by formula.
Ideally, that gives you one element with exactly that value, and the next ones sorted after that. So in reply to the module caller, remove this first element off the array (and make sure you actually ask for more results then you need because of this).
Now, since you allow for multiple similar values, you need another identifying prop, say, object id or created_at. Your consumer (caller of this module) will have to provide both (last value of the score, createdAt of the last object). Say you have a page split exactly in the middle - one or more objects is on the previous page, another set on the next
. You'd have to not simply remove the top value (because that same score is already served on the previous page), but possibly several of them from the top.
Then it goes really crazy, because potentially your whole page was already served - compare the _ids, look for the first one after the one your module caller has provided you with. Or look into the data and determine how many matching values like that are there, try to get at least as many more values from mapReduce then you have on your actual page size.
Aside from that, I would do this with aggregation instead, it should be much more preformant.

How to generate short unique names for uploaded files in nodejs

I need to name uploaded files by short unique identifier like nYrnfYEv a4vhAoFG hwX6aOr7. How could I ensure uniqueness of files?

Update: shortid is deprecated. Use Nano ID instead. The answer below applies to Nano ID as well.
(Posting my comments as answer, with responses to your concerns)
You may want to check out the shortid NPM module, which generates short ids (shockingly, I know :) ) similar to the ones you were posting as example. The result is configurable, but by default it's a string between 7 and 14 characters (length is random too), all URL-friendly (A-Za-z0-9\_\- in a regex).
To answer your (and other posters') concerns:
Unless your server has a true random number generator (highly unlikely), every solution will use a PRNG (Pseudo-Random Number Generator). shortid uses Node.js crypto module to generate PRNG numbers, however, which is a much better generator than Math.random()
shortid's are not sequential, which makes it even harder to guess them
While shortid's are not guaranteed to be unique, the likelihood of a collision is extremely small. Unless you generate billions of entries per year, you could safely assume that a collision will never happen.
For most cases, relying on probability to trust that collisions won't happen is enough. If your data is too important to risk even that tiny amount, you could make the shortid basically 100% unique by just prepending a timestamp to it. As an additional benefit, the file names will be harder to guess too. (Note: I wrote "basically 100% unique" because you could still, in theory, have a collision if two items are generated in the same timestamp, i.e. the same second. However, I would never be concerned of this. To have a real 100% certainty your only option is to run a check against a database or the filesystem, but that requires more resources.)
The reason why shortid doesn't do that by itself is because for most applications the likelihood of a collision is too small to be a concern, and it's more important to have the shortest possible ids.

One option could be to generate unique identifiers (UUID) and rename the file(s) accordingly.
Have a look at the kelektiv/node-uuid npm module.
EXAMPLE:
$ npm install uuid
...then in your JavaScript file:
const uuidv4 = require('uuid/v4'); // I chose v4 ‒ you can select others
var filename = uuidv4(); // '110ec58a-a0f2-4ac4-8393-c866d813b8d1'
Any time you execute uuidv4() you'll get a very-fresh-new-one.
NOTICE: There are other choices/types of UUIDs. Read the module's documentation to familiarize with those.

Very simple code. produce a filename almost unique
or if that's not enough you check if the file exists
function getRandomFileName() {
var timestamp = new Date().toISOString().replace(/[-:.]/g,"");
var random = ("" + Math.random()).substring(2, 8);
var random_number = timestamp+random;
return random_number;
}

export default generateRandom = () => Math.random().toString(36).substring(2, 15) + Math.random().toString(23).substring(2, 5);
As simple as that!

function uniqueFileName( filePath, stub)
{
let id = 0;
let test = path.join(filePath, stub + id++);
while (fs.existsSync(test))
{
test = path.join(filePath, stub + id++);
}
return test;
}

I think you might be confused about true-random and pseudo-random.
Pseudo-random strings 'typically exhibit stastical randomness while being generated by an entirely deterministic casual process'. What this means is, if you are using these random values as entropy in a cryptographic application, you do not want to use a pseudo-random generator.
For your use, however, I believe it will be fine - just check for potential (highly unlikely) clashes.
All you are wanting to do is create a random string - not ensure it is 100% secure and completely random.

Try following snippet:-
function getRandomSalt() {
var milliseconds = new Date().getTime();
var timestamp = (milliseconds.toString()).substring(9, 13)
var random = ("" + Math.random()).substring(2, 8);
var random_number = timestamp+random; // string will be unique because timestamp never repeat itself
var random_string = base64_encode(random_number).substring(2, 8); // you can set size here of return string
var return_string = '';
var Exp = /((^[0-9]+[a-z]+)|(^[a-z]+[0-9]+))+[0-9a-z]+$/i;
if (random_string.match(Exp)) { //check here whether string is alphanumeric or not
return_string = random_string;
} else {
return getRandomSalt(); // call recursivley again
}
return return_string;
}
File name might have an alphanumeric name with uniqueness according to your requirement. Unique name based on the concept of timestamp of current time because current time never repeat itself in future and to make it strong i have applied a base64encode which will be convert it into alphanumeric.
var file = req.files.profile_image;
var tmp_path = file.path;
var fileName = file.name;
var file_ext = fileName.substr((Math.max(0, fileName.lastIndexOf(".")) || Infinity) + 1);
var newFileName = getRandomSalt() + '.' + file_ext;
Thanks

How to maintain counters with LinqToObjects?

I have the following c# code:
private XElement BuildXmlBlob(string id, Part part, out int counter)
{
// return some unique xml particular to the parameters passed
// remember to increment the counter also before returning.
}
Which is called by:
var counter = 0;
result.AddRange(from rec in listOfRecordings
from par in rec.Parts
let id = GetId("mods", rec.CKey + par.UniqueId)
select BuildXmlBlob(id, par, counter));
Above code samples are symbolic of what I am trying to achieve.
According to the Eric Lippert, the out keyword and linq does not mix. OK fair enough but can someone help me refactor the above so it does work? A colleague at work mentioned accumulator and aggregate functions but I am novice to Linq and my google searches were bearing any real fruit so I thought I would ask here :).
To Clarify:
I am counting the number of parts I might have which could be any number of them each time the code is called. So every time the BuildXmlBlob() method is called, the resulting xml produced will have a unique element in there denoting the 'partNumber'.
So if the counter is currently on 7, that means we are processing 7th part so far!! That means XML returned from BuildXmlBlob() will have the counter value embedded in there somewhere. That's why I need it somehow to be passed and incremented every time the BuildXmlBlob() is called per run through.

If you want to keep this purely in LINQ and you need to maintain a running count for use within your queries, the cleanest way to do so would be to make use of the Select() overloads that includes the index in the query to get the current index.
In this case, it would be cleaner to do a query which collects the inputs first, then use the overload to do the projection.
var inputs =
from recording in listOfRecordings
from part in recording.Parts
select new
{
Id = GetId("mods", recording.CKey + part.UniqueId),
Part = part,
};
result.AddRange(inputs.Select((x, i) => BuildXmlBlob(x.Id, x.Part, i)));
Then you wouldn't need to use the out/ref parameter.
XElement BuildXmlBlob(string id, Part part, int counter)
{
// implementation
}

Below is what I managed to figure out on my own:.
result.AddRange(listOfRecordings.SelectMany(rec => rec.Parts, (rec, par) => new {rec, par})
.Select(#t => new
{
#t,
Id = GetStructMapItemId("mods", #t.rec.CKey + #t.par.UniqueId)
})
.Select((#t, i) => BuildPartsDmdSec(#t.Id, #t.#t.par, i)));
I used resharper to convert it into a method chain which constructed the basics for what I needed and then i simply tacked on the select statement right at the end.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string