OK, I have tried searching the web for different modules to use, but it didn't make me any wiser. There were so many different alternatives, and I couldn't find any good discussion on which was better.
I need a hashmap with a 5-digit decimal number as key, and with arrays as values.
I also need to iterate through its keys efficiently, a couple of times per second.
Anyone have a good recommendation of what I should use?
Thank you!
//Svennisen
I realised I do not need a module at all.
In JavaScript, a string index automatically gives you map-like key/value behaviour, because string keys are stored as ordinary object properties (this happens even when you use them on an array).
So I just convert my number to a string and use a plain object as the map.
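A minimal sketch of that approach; the keys and values are placeholders:

// Plain object used as a map: 5-digit number (as a string) -> array.
const map = {};

function put(num, values) {
  map[String(num)] = values;
}

put(12345, ['a', 'b']);
put(90210, ['c']);

// Iterating the keys a couple of times per second is cheap:
for (const key of Object.keys(map)) {
  console.log(key, map[key]);
}

A built-in Map would work just as well and keeps keys in insertion order; the plain object simply avoids any dependency.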
I have a requirement where I want to compare 2 excel/ppt/csv files which may have exactly the same content but may have been created at different points in time.
I want to compare only the file contents in whatever manner possible using any nodejs package.
But I couldn't figure out how to do it easily; neither stream comparison nor buffer comparison helped.
I've done more research without much success, and I'm wondering how I could ignore things such as timestamps and other metadata during the comparison and only match up the contents.
I've tried stream-compare, stream-equal, file-compare, buff1.equals(buff2) and a few others, but none of them worked for my requirement.
But I didn't find any node package on the web which does what I am looking for.
Any insights or any suggestions as how it can be achieved?
Thanks in advance; any help would be appreciated.
Search for a package that computes a hash of the document, for example crypto: calculate hashes (sha256) for the 2 docs and compare them. If the hashes match, the document content is the same (there is still a chance of a hash collision, but it depends on the hash algorithm you are using; sha256 will give you decent confidence that the documents are identical). Check this thread for more details: Obtaining the hash of a file using the stream capabilities of crypto module (ie: without hash.update and hash.digest)
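A minimal sketch of that approach, using only Node's built-in crypto and fs modules (the file names are placeholders):

const crypto = require('crypto');
const fs = require('fs');

// Resolves with the sha256 hex digest of a file, streamed so large
// files are never held in memory at once.
function hashFile(path) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    hash.setEncoding('hex');
    const stream = fs.createReadStream(path);
    stream.on('error', reject);
    stream.on('end', () => {
      hash.end();
      resolve(hash.read());
    });
    stream.pipe(hash);
  });
}

async function sameContent(a, b) {
  const [ha, hb] = await Promise.all([hashFile(a), hashFile(b)]);
  return ha === hb;
}

sameContent('report-v1.xlsx', 'report-v2.xlsx').then(console.log);

Note that this compares raw bytes, so metadata stored inside the file itself (as Office formats do) will still change the hash; only byte-identical files compare equal.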
I have a list of small strings, and I would like to quickly compress them. What is a good approach to this? The strings don't have any special properties, other than there being ~13 million of them, 5 - 30 characters in length.
Update: from the comments: these are sent over a network, they are used for a join (so I don't know their specific properties), order doesn't matter, and I am sending them in bulk.
From your comments, you don't need to be able to decompress an individual small string.
Sorting the strings before feeding them to whatever standard compression/decompression method is easiest for you to use should go a long way, since sorting puts strings with shared prefixes next to each other.
Measure the difference it makes; a report back is welcome!
"As compressed as possible" is dangerous, as any "optimisation" is.
Fix a goal upfront, and a way to tell "not there yet" from "good enough", and move on once it is achieved.
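A minimal sketch of the sort-then-compress idea using Node's built-in zlib; the strings array stands in for the real 13-million-entry list:

const zlib = require('zlib');

const strings = ['banana', 'bandana', 'apple', 'applet']; // placeholder data

// Sorting groups shared prefixes together, which deflate exploits.
const payload = strings.slice().sort().join('\n');

const compressed = zlib.gzipSync(payload);
console.log(`raw: ${Buffer.byteLength(payload)} bytes, gzip: ${compressed.length} bytes`);

// The receiver inflates and splits; order was declared irrelevant.
const restored = zlib.gunzipSync(compressed).toString().split('\n');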
I oftentimes use filepaths to provide a unique id for some software system. Is there any way to take a filepath and turn it into a unique integer in a relatively quick (computationally) way?
I am ok with larger integers. This would have to be a pretty nifty algorithm as far as I can tell, but would be very useful in some cases.
Anybody know if such a thing exists?
You could try the inode number:
fs.statSync(filename).ino
@djones's suggestion of the inode number is good if the program only runs on one machine and you don't care about a new file duplicating the id of an old, deleted one: inode numbers are re-used.
Another simple approach is hashing the path to a big integer space. E.g. using a 128 bit murmurhash (in Java I'd use the Guava Hashing class; there are several js ports), the chance of a collision among a billion paths is still 1/2^96. If you're really paranoid, keep a set of the hash values you've already used and rehash on collision.
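A minimal sketch of that idea; it uses Node's built-in crypto in place of a murmur port, truncating the digest to 128 bits, which serves the same purpose:

const crypto = require('crypto');

// Hash a path into a 128-bit BigInt id.
function pathToId(filepath) {
  const digest = crypto.createHash('sha256').update(filepath).digest();
  return BigInt('0x' + digest.subarray(0, 16).toString('hex'));
}

console.log(pathToId('/var/data/report.csv')); // one big integer per path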
This is just my comment turned into an answer.
If it runs in memory, you can use one of the standard hashmaps in your language, and not just for file names but for any similar situation. Normally, hashmaps in different programming languages resolve collisions with buckets, so the hash number plus the corresponding bucket slot gives you a unique id.
Btw, it is not hard to write your own hashmap, so that you have control over the underlying structure (e.g. to retrieve the number etc).
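A minimal sketch of that (hash, bucket slot) idea; the djb2-style hash is a placeholder for whatever hash function the map actually uses:

// djb2-style string hash, kept in unsigned 32-bit range.
function hash(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) h = ((h * 33) + str.charCodeAt(i)) >>> 0;
  return h;
}

// Buckets keyed by hash; (hash, slot) stays unique while the map lives.
const buckets = new Map();

function idFor(path) {
  const h = hash(path);
  if (!buckets.has(h)) buckets.set(h, []);
  const bucket = buckets.get(h);
  let slot = bucket.indexOf(path);
  if (slot === -1) slot = bucket.push(path) - 1;
  // Pack hash and slot into a single integer id.
  return (BigInt(h) << 32n) | BigInt(slot);
}

console.log(idFor('/tmp/a.txt')); // same path always yields the same id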
Currently in my Node.js app I use the node-uuid module to give unique IDs to my database objects.
Using the uuid.v1() function from that module I get something like
81a0b3d0-e4d0-11e3-ac56-73f5c88681be
Now, my requests are quite long, sometimes hundreds of nodes and edges in one query. So you can imagine they become huge, because every node and edge has to have a unique ID.
Do you know if I could use a shorter ID system in order to not run into problems as the number of my items grows? I mean, I know I could get away with just the first 8 symbols (as there are 36^8 > 2 trillion combinations), but how well will that perform when they are randomly generated? As the number of my nodes increases, what is the chance that a randomly generated ID will collide with an existing one?
Thank you!
If you're really concerned about uuid collisions, you can always simply do a lookup to make sure you don't already have a row with that uuid. The point is that there is always a very low but non-zero chance of a collision with current uuid generators, especially with shorter strings.
Here's a post that discusses it in more detail
https://softwareengineering.stackexchange.com/questions/130261/uuid-collisions
One alternative would be to use a sequential ID system (autoincrement) instead of uuids.
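If you do go with shorter random ids, here is a minimal sketch of the generate-and-check approach; the in-memory Set stands in for a real database lookup or unique index:

const crypto = require('crypto');

const used = new Set(); // in practice: a DB lookup or unique constraint

function shortId(len = 8) {
  let id;
  do {
    id = crypto.randomBytes(Math.ceil(len / 2)).toString('hex').slice(0, len);
  } while (used.has(id)); // regenerate on the rare collision
  used.add(id);
  return id;
}

console.log(shortId()); // e.g. '9f2c41ab'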
I looked at TinyURL, TinyPic, Imgur and YouTube! I thought they would use a text-safe representation of an index and use it as the primary ID in their DB. However, putting the keys into Convert.FromBase64String("key") yields no results and throws an exception. So these sites don't use a base64 array. What are they using? What might I want to use if I were to build a YouTube-like site or a TinyURL?
I'm guessing they have developed their own encoding which is simply an alphanumeric equivalent of the IDs in their database. I'm sure they don't generate random strings, simply because that would cause catastrophic collisions at a certain point.
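A minimal sketch of that kind of encoding: base62 over an auto-increment integer id (the helper names are illustrative):

const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Integer id -> short alphanumeric key.
function encode(id) {
  let out = '';
  do {
    out = ALPHABET[id % 62] + out;
    id = Math.floor(id / 62);
  } while (id > 0);
  return out;
}

// Short key -> integer id.
function decode(key) {
  return [...key].reduce((n, c) => n * 62 + ALPHABET.indexOf(c), 0);
}

console.log(encode(1000000)); // '4c92'
console.log(decode('4c92')); // 1000000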
I don't know about TinyURL, Tinypic, etc. but shorl.com uses something called koremutake.
If I were to develop such a system, I guess some sort of short hash or plain random strings could be possible choices.
My guess is that they simply generate a random string and use that as the primary key. I don't really see a reason to do anything else.