Compressing a string [closed] - string

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
There is a frequently asked question in interviews about compressing a string.
I'm not looking for code; I only need an efficient algorithm that solves the problem.
Given a string (e.g. aaabbccaaadd), compress it (3a2b2c3a2d).
My solution:
Travel on the string. Every time I see the same letter I count it.
I will output the letter and the counter when I see a different letter coming (and start over again).
Is there a more efficient way to do this?
Thanks

That's called run-length encoding, and the algorithm you describe is basically the best you'll get. It takes O(1) auxiliary storage (the last symbol seen, or equivalently a peek at the upcoming element, plus a counter of how many identical symbols you've seen) and runs in O(n) time. Since you need to inspect each symbol at least once to know the result, you can't do better than O(n) anyway. What's more, it can also process streams one symbol at a time, and output one symbol at a time, so you actually only need O(1) RAM.
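A minimal sketch of that single pass in Go (the function name is mine, not from the question):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// runLengthEncode scans the input once, counting runs of identical
// bytes and emitting "<count><symbol>" for each run: O(n) time, O(1)
// extra state (the current run's start plus an index).
func runLengthEncode(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		j := i
		for j < len(s) && s[j] == s[i] {
			j++
		}
		b.WriteString(strconv.Itoa(j - i))
		b.WriteByte(s[i])
		i = j
	}
	return b.String()
}

func main() {
	fmt.Println(runLengthEncode("aaabbccaaadd")) // 3a2b2c3a2d
}
```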
You can pull a number of tricks to get the constant factors better, but the algorithm remains basically the same. Such tricks include:
If you stream to a slow destination (like disk or network), buffer. Extensively.
If you expect long runs of identical symbols, you may be able to vectorize the loop counting them, or at least make that loop tighter by moving out the other cases.
If applicable, tell your compiler not to worry about aliasing between input and output pointers.
Such micro-optimizations may be moot if your data source is slow. For the level of optimization some of my points above address, even RAM can count as slow.

Use Lempel-Ziv compression if your string will be sufficiently long. The advantage is that it will not only shorten distinct repetitions but will also compress 'groups' of repetitions efficiently. See Wikipedia: Lempel-Ziv-Welch.
A vague example - so that you get the idea:
aaabqxyzaaatuoiaaabhaaabi will be compressed as:
AbqxyzAtuoiBhBi
where [A = aaa] & [B = Ab = aaab]
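A compact sketch of the LZW compression step in Go (identifiers are mine; this is the textbook dictionary-building scheme, not tuned code):

```go
package main

import "fmt"

// lzwCompress builds a dictionary of previously seen substrings on the
// fly and emits one code per longest known prefix, so repeated groups
// such as "aaab" collapse into a single output code.
func lzwCompress(s string) []int {
	dict := map[string]int{}
	for i := 0; i < 256; i++ {
		dict[string(rune(i))] = i
	}
	next := 256
	var out []int
	w := ""
	for _, c := range s {
		wc := w + string(c)
		if _, ok := dict[wc]; ok {
			w = wc // keep extending the current match
		} else {
			out = append(out, dict[w]) // emit longest known prefix
			dict[wc] = next            // learn the new substring
			next++
			w = string(c)
		}
	}
	if w != "" {
		out = append(out, dict[w])
	}
	return out
}

func main() {
	fmt.Println(lzwCompress("aaabqxyzaaatuoiaaabhaaabi"))
}
```

On the example above, the 25-character input compresses to 18 codes; later occurrences of "aaa" and "aaab" each cost a single code.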

Many compression algorithms are based on Huffman coding. That's the answer I'd give in an interview.
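For completeness, a small sketch of Huffman code construction in Go (re-sorting instead of a proper heap, for brevity; all names are mine):

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	freq        int
	sym         byte
	left, right *node
}

// buildCodes repeatedly merges the two least-frequent nodes until one
// tree remains, then walks it: frequent symbols get short bit strings.
func buildCodes(s string) map[byte]string {
	freq := map[byte]int{}
	for i := 0; i < len(s); i++ {
		freq[s[i]]++
	}
	var nodes []*node
	for sym, f := range freq {
		nodes = append(nodes, &node{freq: f, sym: sym})
	}
	for len(nodes) > 1 {
		sort.Slice(nodes, func(i, j int) bool { return nodes[i].freq < nodes[j].freq })
		merged := &node{freq: nodes[0].freq + nodes[1].freq, left: nodes[0], right: nodes[1]}
		nodes = append([]*node{merged}, nodes[2:]...)
	}
	codes := map[byte]string{}
	var walk func(n *node, prefix string)
	walk = func(n *node, prefix string) {
		if n.left == nil && n.right == nil {
			codes[n.sym] = prefix
			return
		}
		walk(n.left, prefix+"0")
		walk(n.right, prefix+"1")
	}
	walk(nodes[0], "")
	return codes
}

func main() {
	fmt.Println(buildCodes("aaabbccaaadd"))
}
```

On "aaabbccaaadd", the most frequent symbol 'a' (6 occurrences) ends up with a one-bit code while 'b', 'c', and 'd' get longer ones.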

Related

Programming Wavelets for Audio Identification [closed]

Closed 9 years ago.
How exactly is a wavelet used digitally?
Wikipedia states
"a wavelet could be created to have a frequency of Middle C and a
short duration of roughly a 32nd note"
Would this be a data structure holding e.g. {sampleNumber, frequency} pairs?
If a wavelet is an array of these pairs, how is it applied to the audio data?
How does this wavelet apply to the analysis when using an FFT?
What is actually being compared to identify the signal?
I feel like you've conflated a few different concepts here. The first confusing part is this:
Would this be a data structure holding e.g. {sampleNumber, frequency} pairs?
It's a continuous function, so pick your favourite way of representing continuous functions in a discrete computer memory, and that might be a sensible way to represent it.
The wavelet is applied to the audio signal by convolution (this is actually the next paragraph in the Wikipedia article you referenced...), as is relatively standard in most DSP applications (particularly audio-based applications). Wavelets are really just a particular kind of filter in the broader signal-processing sense, in that they have particular properties that are desirable in some applications, but they are still fundamentally just filters!
As for the comparison being performed - it's the presence or absence of a particular frequency in the input signal corresponding to the frequency (or frequencies) that the wavelet is designed to identify.
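A bare-bones illustration of "apply by convolution" in Go (the kernel below is a trivial 2-tap filter standing in for a sampled wavelet, purely to show the mechanics):

```go
package main

import "fmt"

// convolve computes the full discrete convolution of signal and kernel.
// Applying a (sampled) wavelet to audio is this same operation: each
// output sample is a weighted sum of nearby input samples.
func convolve(signal, kernel []float64) []float64 {
	out := make([]float64, len(signal)+len(kernel)-1)
	for i := range signal {
		for j := range kernel {
			out[i+j] += signal[i] * kernel[j]
		}
	}
	return out
}

func main() {
	signal := []float64{1, 2, 3}
	kernel := []float64{1, 1} // stand-in for a sampled wavelet
	fmt.Println(convolve(signal, kernel)) // [1 3 5 3]
}
```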

Best program to analyze data from cycle tests [closed]

Closed 10 years ago.
I've performed some cyclic tests of steel joints. The test conditions included applying 3 cycles per amplitude value, with three different amplitudes used.
Now I have a huge text file with rotation and moment values but I need to determine the stiffness of each branch of the diagram with a regression analysis method. Therefore I need to separate each cycle.
Do you recommend
Mathematica,
Matlab,
Excel,
or other program best suited to make this task easier?
Many thanks as always for your advice.
It's not entirely clear what you're looking for in the question. I also don't know much about Mathematica or Excel, but I'll say as much as I can about how Matlab might be used to address this problem.
When you say 'separate each cycle', I assume you mean that your text file contains data about all 3 cycles and you want to partition it into 3 separate datasets regarding each individual cycle. I would guess that Matlab will import your data file (the file->import data menu is quite flexible, and I've used it successfully with e.g. 30MB files, but if your files are hundreds of MB that might be a problem).
Assuming there is some structure to the data file, I would expect that you can slice it to achieve your desired partition, e.g.
cycle1 = data(1:3:end, :); %If data from cycles are stored in alternate rows
cycle1 = data(1:end/3, :); %If data from cycles are stored in blocks of rows
cycle1 = data(:, 1); %If data from cycles are stored in separate columns
etc. If you comment with a description of structure of the file I may be able to help further.
Regarding regression analysis, Matlab has several tools; polyfit is quite flexible and might satisfy your requirements. I don't know anything about materials, but I may be able to give better suggestions if you explain the relationship between stiffness and the measured variables.
Mathematica is great, but for the widest range of tools I'd opt for R and perhaps its glm function. There are many other suitable packages; even a neural network or random forest for regression might make an interesting alternative, and all are freely available in R.

What if the gc were optional in go? [closed]

Closed 11 years ago.
Would such a language be feasible or are there specific features in go that absolutely require some form of a gc?
note: I am not anti-gc, but coming from a C/C++ background and working on a real-time server application, I prefer to maintain some level of control over how and when memory is reclaimed (I can't have a 10s garbage collection happening in the middle of a live run).
Are my concerns realistic, given my requirements? Or is the go gc so good that my concerns are unfounded?
Go's gc is my only reservation about attempting a port of my C++ real-time server to go.
Go with optional GC would require language changes. Here's a perfectly valid Go function that will make a C programmer's skin crawl:
func foo() *int {
    a := 1
    return &a
}
This is fine because the Go compiler will figure out that the variable a needs to be allocated on the heap. It will be garbage collected later and you don't have to care. (Well, ok, in some situations you might. But most of the time you don't.)
You can concoct all kinds of scenarios where the compiler will do things like this. It just wouldn't be the same without a garbage collector.
There are things you can do to help GC times, but to a certain extent you'll be nullifying the advantages of the language. I hesitate to recommend these practices, but they are options:
Free lists
With the unsafe package you can even write your own allocator and manually free memory, but you'd need a function for every type you want to allocate. Or use reflection to pass in the type you want to allocate, return an empty interface, and use type assertions to get concrete values out.
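One way to sketch the free-list idea with the standard library's sync.Pool (note this is a hint to the runtime, not manual freeing; the pool may still be drained by the GC):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool recycles 4 KiB buffers so hot paths allocate less, giving
// the garbage collector less work between collections.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 4096) },
}

func process() int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer instead of dropping it
	// ... fill and use buf here ...
	return len(buf)
}

func main() {
	fmt.Println(process()) // 4096
}
```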
The bottom line is, Go probably isn't a good choice for applications with hard real-time requirements. That said, I also don't think you'll see anything approaching a 10 second garbage collection. Consider what your requirements really are and if you have any doubts, do some measurements.
Try the latest Go code if you can. There are some garbage collector improvements and some compiler optimizations that cause fewer allocations. If your release time frame is shorter, though, you may be stuck with the current stable release for several months.

the meaning of lightweighted object [closed]

Closed 10 years ago.
Please, gurus, give me a detailed explanation: in the object-oriented programming world, what is a lightweight object? And in other computer science fields, what does lightweight mean? Is lightweight a design pattern? Is lightweight good or bad?
There are many meanings for lightweight, but normally it means an object that holds, or processes, a small amount of data. A thread is sometimes called a lightweight process because it does less than a full process and is faster to switch. A lightweight object is one with few members, where those members are of basic types (int, float). A lightweight function is one that does very little compared to others; normally these are inline functions (in a C context).
There is no such pattern as a lightweight pattern. But normally a system should consist of lightweight objects so that maintaining those objects is easy.
The advantages are simpler debugging, easier maintenance, and easier understanding of the code. The disadvantage could be a large number of objects.
There is no lightweight pattern as such but the term is fairly used in the industry.
Lightweight X tends to be used where we have a somewhat well-known structure X. Lightweight X is then a version of X that uses fewer resources in some way or another, or is subtly different from X in some way.
The term, as is the case for most computer science, is not well-defined and is loosely used.

Common programming mistakes for Haskell developers to avoid? [closed]

Closed 11 years ago.
In the spirit of the other common mistakes in questions, what are the most common mistakes that Haskell programmers make? I've been teaching myself Haskell for a little while and I am starting to feel comfortable enough with the language to start applying it in the real world.
The most common mistake I know of is introducing a space leak through lazy evaluation. There are lots of ways to achieve this mistake, but one that especially nails programmers with other functional-programming experience is to put a result in an accumulating parameter, thinking that the accumulating parameter will take constant space. In many cases the accumulating parameter takes linear space because parameters are not evaluated.
Another common mistake is to forget that let is always recursive. An unintentional
let x = ... x ...
can lead to baffling outcomes.
Most other common bad experiences manifest not as mistakes but as trouble getting programs past the type checker, or difficulty understanding the monadic I/O model. Difficulties with list comprehensions and with do notation occur occasionally.
In general the difficulties faced by beginning Haskell programmers include
Large language with many dark corners, especially in the type system
Trouble getting programs to compile, especially when they do I/O
Doing everything in the IO monad
Great difficulty predicting the time and space behavior of lazy functional programs
A common mistake for beginning Haskell programmers is to forget the difference between constructor and type namespaces. That was such a beginner's mistake that I'm about embarrassed to have my name attached to it, but I'm pretty confident that others will stumble upon that answer when they have a similar problem, so may as well keep it out there.
The difference between [] and [[]]: the empty list and the list with 1 element, namely the empty list. This one especially pops up in base cases of recursive functions.
Using non-tail-recursive functions or non-strict folds can lead to stack overflows.
