Concurrently check if a string is in a slice?

Generally, to check whether a string is in a slice, I write a function with a for loop and an if statement. But this is really inefficient for large slices of strings or structs. Is it possible to make this check concurrent?

Concurrent search over sequential data is usually not a great idea, simply because binary search already scales really well, even to billions of records. All you have to do to use it is build an index on top of the slice you are searching. The most trivial index is another slice that stores each key along with the position of the data it points to; once you have that slice, sort it by key and the index is done.
You then perform a binary search on the index you just built, which gives you O(log N) complexity.
Another, much simpler option is to create a map[string]int and insert all keys along with their positions, then look the key up in the map, which is O(1) on average.
The important thing to note is that if you only need to perform a single search on a given slice, none of this is worth it: building the index is more expensive than one linear search.
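Both suggestions can be sketched in Go; the sample data and helper names below are illustrative, not from the question:

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a key with its position in the original slice.
type entry struct {
	key string
	pos int
}

// buildIndex builds a sorted index over data; building costs O(N log N).
func buildIndex(data []string) []entry {
	idx := make([]entry, len(data))
	for i, s := range data {
		idx[i] = entry{s, i}
	}
	sort.Slice(idx, func(a, b int) bool { return idx[a].key < idx[b].key })
	return idx
}

// lookup binary-searches the index, O(log N) per query.
func lookup(idx []entry, key string) (int, bool) {
	i := sort.Search(len(idx), func(j int) bool { return idx[j].key >= key })
	if i < len(idx) && idx[i].key == key {
		return idx[i].pos, true
	}
	return -1, false
}

// buildMap is the simpler alternative: O(1) expected lookups via a map.
func buildMap(data []string) map[string]int {
	m := make(map[string]int, len(data))
	for i, s := range data {
		m[s] = i
	}
	return m
}

func main() {
	data := []string{"pear", "apple", "banana", "cherry"}
	idx := buildIndex(data)
	fmt.Println(lookup(idx, "banana")) // 2 true
	fmt.Println(buildMap(data)["cherry"]) // 3
}
```

Either way, the index is built once and reused across many searches, which is exactly the situation where it pays off.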

Related

What's the complexity of inserting to a vector in Rust?

How does insert work in a Rust Vec? How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?
The documentation lists the complexities for all the standard collections and operations:
Throughout the documentation, we will follow a few conventions. For
all operations, the collection's size is denoted by n. If another
collection is involved in the operation, it contains m elements.
Operations which have an amortized cost are suffixed with a *.
Operations with an expected cost are suffixed with a ~.
         get(i)   insert(i)   remove(i)   append   split_off(i)
Vec      O(1)     O(n-i)*     O(n-i)      O(m)*    O(n-i)
The documentation for Vec::insert explains details, emphasis mine:
Inserts an element at position index within the vector, shifting all elements after it to the right.
How efficient is it to insert an element at the start of a VERY large vector which has billions of elements?
A VERY bad idea, as everything needs to be moved. Perhaps a VecDeque would be better (or finding a different algorithm).
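A sketch of that difference (the values are illustrative): Vec::insert(0, _) must shift every existing element right, while VecDeque is a ring buffer and can push at the front in amortized O(1):

```rust
use std::collections::VecDeque;

fn main() {
    // Vec: inserting at the front shifts all existing elements right, O(n).
    let mut v: Vec<i32> = vec![2, 3, 4];
    v.insert(0, 1); // every element moves
    assert_eq!(v, vec![1, 2, 3, 4]);

    // VecDeque: a ring buffer, so pushing at either end is amortized O(1).
    let mut d: VecDeque<i32> = VecDeque::from(vec![2, 3, 4]);
    d.push_front(1);
    assert_eq!(d, VecDeque::from(vec![1, 2, 3, 4]));

    println!("{:?} {:?}", v, d);
}
```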
I found this question and want to add something.
It all depends on your usage. If you insert only once, it may be worth accepting that O(n) cost if you then do millions of O(1) get requests.
Other data types may have better insertion times, but O(log n) or even O(n) lookups.
The next thing is iteration, where cache friendliness comes into play with such large arrays; there, Vec is perfect.
My advice: if you insert once and then do lots of lookups, stay with Vec.
If inserting and removing is your main task, as in a queue, go for something else.
I often found myself needing sorted arrays and reaching for something like BTreeMap or BTreeSet. I have since removed them completely and use a Vec instead: after adding all values, I sort and dedup.
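That last pattern, a Vec plus a single sort and dedup instead of a BTreeSet, might look like this (the values are illustrative):

```rust
use std::collections::BTreeSet;

fn main() {
    let input = [3, 1, 2, 3, 1];

    // BTreeSet keeps elements sorted and unique on insert, O(log n) each.
    let set: BTreeSet<i32> = input.iter().copied().collect();

    // The Vec alternative: collect everything, then sort and dedup once.
    // Cache-friendly, and later lookups can use binary_search, O(log n).
    let mut v: Vec<i32> = input.to_vec();
    v.sort();
    v.dedup(); // removes consecutive duplicates, hence the sort first
    assert_eq!(v, vec![1, 2, 3]);
    assert!(v.binary_search(&2).is_ok());

    // Both approaches end up with the same sorted, unique contents.
    assert_eq!(v, set.into_iter().collect::<Vec<_>>());
    println!("{:?}", v);
}
```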

Fast substring in Scala

According to Time complexity of Java's substring(), Java's substring takes linear time.
Is there a faster way (maybe in some cases)?
I might suggest an iterator, but I suspect it also takes O(n).
val s1: String = s.iterator.drop(5).mkString
But several operations on an iterator would be faster than the same operations on a string, right?
If you need to edit a very long string, consider using a data structure called a rope.
The Scalaz library has a Cord class, which is an implementation of a modified version of a rope.
A Cord is a purely functional data structure for efficiently
storing and manipulating Strings that are potentially very long.
Very similar to Rope[Char], but with better constant factors and a
simpler interface since it's specialized for Strings.
As Strings are, according to the linked question, always backed by a unique character array, substring can't be faster than O(n): you need to copy the character data.
As for alternatives: there will at least be one operation which is O(n). In your example, that's mkString which collects the characters in the iterator and builds a string from them.
However, I wouldn't worry about that too much. The fact that you're using a high-level language means (or should mean) that developer time is more valuable than CPU time for your particular task. substring is also the canonical way to ... take a substring, so using it makes your program more readable.
EDIT: I also like this sentence (from this answer) a lot: O(n) is O(1) if n does not grow large. What I take away from this is: you shouldn't write inefficient code, but asymptotical efficiency is not the same as real-world efficiency.

Looking for an efficient array-like structure that supports "replace-one-member" and "append"

As an exercise I wrote an implementation of the longest increasing subsequence algorithm, initially in Python but I would like to translate this to Haskell. In a nutshell, the algorithm involves a fold over a list of integers, where the result of each iteration is an array of integers that is the result of either changing one element of or appending one element to the previous result.
Of course in Python you can just change one element of the array. In Haskell, you could rebuild the array while replacing one element at each iteration - but that seems wasteful (copying most of the array at each iteration).
In summary what I'm looking for is an efficient Haskell data structure that is an ordered collection of 'n' objects and supports the operations: lookup i, replace i foo, and append foo (where i is in [0..n-1]). Suggestions?
Perhaps the standard Seq type from Data.Sequence. It's not quite O(1), but it's pretty good:
index (your lookup) and adjust (your replace) are O(log(min(index, length - index)))
(><) (your append) is O(log(min(length1, length2)))
It's based on a tree structure (specifically, a 2-3 finger tree), so it should have good sharing properties (meaning that it won't copy the entire sequence for incremental modifications, and will perform them faster too). Note that Seqs are strict, unlike lists.
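A minimal sketch of those Seq operations, with illustrative values:

```haskell
import Data.Sequence (Seq, fromList, index, update, (|>))

main :: IO ()
main = do
  let s = fromList [10, 20, 30] :: Seq Int
  print (index s 1)       -- lookup i: O(log(min(i, n - i)))
  let s' = update 1 25 s  -- replace i foo: same bound
  let s'' = s' |> 40      -- append foo at the right end
  print s''               -- fromList [10,25,30,40]
```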
I would try to just use mutable arrays in this case, preferably in the ST monad.
The main advantages would be making the translation more straightforward and keeping things simple and efficient.
The disadvantage, of course, is losing purity and composability. However, I think this should not be a big deal, since there are probably not many cases where you would want to keep intermediate algorithm states around.
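A sketch of that approach with an unboxed STUArray; the helper name replaceAt is mine, not from the question:

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getElems, newListArray, writeArray)

-- Mutable update inside ST: O(1) per write, but pure from the outside.
replaceAt :: Int -> Int -> [Int] -> [Int]
replaceAt i x xs = runST $ do
  arr <- mk xs
  writeArray arr i x
  getElems arr
  where
    mk :: [Int] -> ST s (STUArray s Int Int)
    mk ys = newListArray (0, length ys - 1) ys

main :: IO ()
main = print (replaceAt 1 25 [10, 20, 30]) -- [10,25,30]
```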

Haskell arrays vs lists

I'm playing with Haskell and Project Euler's 23rd problem. After solving it with lists I went here where I saw some array work. This solution was much faster than mine.
So here's the question. When should I use arrays in Haskell? Is their performance better than lists' and in which cases?
The most obvious difference is the same as in other languages: arrays have O(1) lookup and lists have O(n). Attaching something to the head of a list (:) takes O(1); appending (++) takes O(n).
Arrays have some other potential advantages as well. You can have unboxed arrays, which means the entries are just stored contiguously in memory without having a pointer to each one (if I understand the concept correctly). This has several benefits--it takes less memory and could improve cache performance.
Modifying immutable arrays is difficult--you'd have to copy the entire array which takes O(n). If you're using mutable arrays, you can modify them in O(1) time, but you'd have to give up the advantages of having a purely functional solution.
Finally, lists are just much easier to work with if performance doesn't matter. For small amounts of data, I would always use a list.
And if you're doing a lot of indexing as well as a lot of updating, you can
use Maps (or IntMaps): O(log size) indexing and updates, good enough for most uses, and the code is easy to get right;
or, if Maps are too slow, use mutable (unboxed) arrays (STUArray from Data.Array.ST or STVector from the vector package): O(1) indexing and updates, but the code is easier to get wrong and generally not as nice.
For specific uses, functions like accumArray give very good performance too (uses mutable arrays under the hood).
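As a small illustration of accumArray (the histogram example is mine, not from the answer):

```haskell
import Data.Array (accumArray, elems)

-- Count occurrences of each value in the range 0..4. accumArray uses a
-- mutable array under the hood but exposes a pure interface.
counts :: [Int] -> [Int]
counts xs = elems (accumArray (+) 0 (0, 4) [(x, 1) | x <- xs])

main :: IO ()
main = print (counts [1, 3, 3, 0, 4]) -- [1,1,0,2,1]
```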
Arrays have O(1) indexing (this used to be part of the Haskell definition), whereas lists have O(n) indexing. On the other hand, any kind of modification of arrays is O(n) since it has to copy the array.
So if you're going to do a lot of indexing but little updating, use arrays.

Best way to sort a long list of strings

I would like to know the best way to sort a long list of strings with respect to time and space efficiency; I prefer time efficiency over space efficiency.
The strings can be numeric, alphabetic, alphanumeric, etc. I am not interested in sort behavior such as alphanumeric vs. alphabetic sort, just the sort itself.
Some ways below that I can think of.
Using code, e.g. the .NET framework's Array.Sort() method. I think the way this works is that hash codes for the strings are calculated and each string is inserted at the proper position using a binary search.
Using a database (e.g. MS SQL). I have not done this, so I do not know how efficient it would be.
Using a prefix-tree data structure like a trie. Sorting requires traversing all the trie's nodes using DFS (depth-first search), which takes O(|V| + |E|) time. (Searching takes O(l) time, where l is the length of the string to compare.)
Any other ways or data structures?
You say that you have a database, and presumably the strings are stored in the database. Then you should get the database to do the work for you. It may be able to take advantage of an index and therefore not need to actually sort the list, but just read it from the index in sorted order.
If there is no index, the database might still be able to help you if you only fetch the first k rows for some small constant k, for example 100. When you use ORDER BY with a TOP clause (LIMIT in other databases), SQL Server can use a special optimization called TOP N SORT, which runs in linear time instead of O(n log n) time.
If your strings are not in the database already then you should use the features provided by .NET instead. I think it is unlikely you will be able to write custom code that will be much faster than the default sort.
I found this paper, which uses a trie data structure to efficiently sort large sets of strings. I have not looked into it in detail, though.
Radix sort could also be a good option if the strings are not very long, e.g. a list of names.
Let us suppose you have a large list of strings and that the length of the list is n.
Using a comparison-based sorting algorithm like merge sort, heapsort or quicksort will give you a running time of O(d * n * log n),
where n is the size of the list and d is the maximum length of the strings in the list (each comparison costs up to O(d)).
We can try radix sort in this case. Let b be the base (the alphabet size) and let d be the length of the maximum string; then we can show that the running time using radix sort is O(d * (n + b)).
Furthermore, if the strings are, say, lowercase English letters, then b = 26 and the running time is O(d * (n + 26)) = O(d * n).
Source: MIT OpenCourseWare algorithms lecture by Prof. Erik Demaine.
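To make the radix-sort bound concrete, here is a sketch of LSD radix sort for n equal-length strings of length d over an alphabet of size b = 256, running in O(d * (n + b)) time; the function name and sample data are illustrative:

```go
package main

import "fmt"

// lsdRadixSort sorts equal-length ASCII strings in O(d * (n + b)) time,
// where d is the string length and b = 256 is the alphabet size.
func lsdRadixSort(a []string, d int) {
	const b = 256
	n := len(a)
	aux := make([]string, n)
	// Counting-sort by each character position, rightmost (least
	// significant) first; stability makes the passes compose.
	for pos := d - 1; pos >= 0; pos-- {
		count := make([]int, b+1)
		for i := 0; i < n; i++ {
			count[int(a[i][pos])+1]++
		}
		for r := 0; r < b; r++ {
			count[r+1] += count[r]
		}
		for i := 0; i < n; i++ { // stable distribution pass
			aux[count[int(a[i][pos])]] = a[i]
			count[int(a[i][pos])]++
		}
		copy(a, aux)
	}
}

func main() {
	words := []string{"dab", "cab", "fad", "bad", "dad", "ebb"}
	lsdRadixSort(words, 3)
	fmt.Println(words) // [bad cab dab dad ebb fad]
}
```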
