What do 3G & 4G of Big Data mean, and what is the difference? - apache-spark

I've read a page about the comparison between Apache Spark and Apache Flink.
I don't know what the 3G & 4G of Big Data mean.
Please explain to me!

It means 3rd Generation and 4th Generation. There are many publications and websites that use these 3G or 4G labels to highlight or denigrate some technology by assigning it a certain "generation". Each tool has pros and cons depending on the problem you are facing. From Hadoop to Flink (and there are many more: Samza, Spark, Storm, ...), each has brought something new to the world of Big Data:
Calculation on huge volumes of data
Easy to use
Support for efficient iterative calculation
Unification of batch and streaming APIs (see the sketch after this list)
Support for CEP
Full streaming processing
Complete compatibility with the Hadoop ecosystem
Exactly-once processing guarantees
...
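To make the "unification of batch and streaming APIs" point concrete, here is a minimal PySpark (Structured Streaming) sketch; the input paths and the word-count logic are placeholders for illustration, not anyone's production pipeline:

    # The same DataFrame transformation is reused for a batch source and a
    # streaming source (Structured Streaming). Paths and columns are illustrative.
    from pyspark.sql import SparkSession, DataFrame
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("unified-batch-streaming").getOrCreate()

    def count_by_word(df: DataFrame) -> DataFrame:
        # One transformation, usable on both batch and streaming DataFrames.
        words = df.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
        return words.groupBy("word").count()

    # Batch: read a static directory of text files and compute counts once.
    count_by_word(spark.read.text("/data/logs/")).show()

    # Streaming: the same logic applied to files arriving in a directory.
    query = (count_by_word(spark.readStream.text("/data/incoming/"))
             .writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()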
What others have recommended is true: you should not be guided by these 3G or 4G criteria to select a technology. You must study your problem fully and know the technologies and tools available, or at least have them classified according to their philosophy and use case. Something old but illustrative is this book.
You will form an idea and classify each one according to your own criteria :)
One thing is true: each tool arrives sooner or later, and each stands out because it takes a different or more appropriate approach to certain problems.

Related

Dactyl-ManuForm vs Kinesis Advantage 360

I am in the market for a split ortholinear ergonomic keyboard. I have an issue with my shoulder that gets worse when I type.
I bought an Ergodox EZ, and while I thought it was a definite improvement, I ended up returning it. After doing more research I learned that I should have gotten the molded keycaps, but I got the backlit Ergodox EZ, which doesn't support them; and in any case the molded keycaps on the Ergodox were a worse version of what the Dactyl-ManuForm and Kinesis Advantage 360 deliver with their truly concave form factors.
I am trying to determine which is the "best" ergonomic keyboard for me. I am leaning towards the Dactyl-ManuForm, but I don't have the time, means, or interest to build my own. I am looking at this option: https://taikohub.com/products/dactyl-manuform-keyboard-v2, but I am curious whether the 5-key or the 6-key thumb cluster is the better option. Also, I am currently in Los Angeles; does anyone know of another option (around here or otherwise) for configuring and buying a prebuilt Dactyl-ManuForm?
It looks like the Kinesis Advantage 360 is another good option that delivers a similar form factor. Has anyone tried both and can they recommend one over the other?
Lastly, I really liked that the Ergodox EZ had the ORYX software, which not only let you configure the keyboard but also offered a series of training exercises. I am not a great touch typist, but it is something I will need to get skilled at in order to use these keyboards properly (especially if I get a Dactyl without markings on the keycaps, like the prebuilt one I linked). Is there an analogous software training tool for the Dactyl-ManuForm and/or Advantage 360?
Any insights or recommendations would be greatly appreciated.

WMS layers rendering slow in OpenLayers 3?

I have GeoServer 2.11.2, PostgreSQL 9.5, OpenLayers 3, and Tomcat 8, all installed on an Ubuntu 16.04 Azure cloud virtual machine. I have also enabled GeoWebCache, but WMS layer rendering is still slow (15 to 16 seconds). Please see this. Is there any way to improve on the current speed of the web tool? Thanks.
Broadly, it sounds like something is misconfigured. There are some excellent resources in the GeoServer docs (http://docs.geoserver.org/stable/en/user/production/) about running in production. From GeoSolutions, there are some training materials (http://geoserver.geo-solutions.it/edu/en/enterprise/index.html) and talks (https://www.slideshare.net/geosolutions/geoserver-in-production-we-do-it-here-is-how-foss4g-2016) which address common techniques for data prep, JVM options, and other considerations which may help some.
As a particular call-out, I'd strongly suggest Marlin (https://github.com/bourgesl/marlin-renderer/wiki/How-to-use). Its use in GeoServer can help immensely with concurrent rendering (http://www.geo-solutions.it/blog/developerss-corner-achieving-extreme-geoserver-scalability-with-the-new-marlin-vector-rasterizer/).
It may be worth making sure that PostGIS is installed and that your data has a spatial index. Tuning PostGIS is a separate topic.
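For example, here is a small sketch (assuming psycopg2 and a hypothetical "roads" table with a "geom" column) of adding a GiST spatial index and refreshing the planner statistics so GeoServer's bounding-box queries can use it:

    # Connection details, table and column names are placeholders.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="gis", user="gis", password="secret")
    conn.autocommit = True  # run CREATE INDEX / ANALYZE outside an explicit transaction
    with conn.cursor() as cur:
        cur.execute("CREATE INDEX IF NOT EXISTS roads_geom_idx ON roads USING GIST (geom);")
        cur.execute("ANALYZE roads;")
    conn.close()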
Once the data is prepped and indexed and Marlin is up and running, it may be worth seeding the GWC cache. With that, your application would just be serving pre-baked tiles for coarse zoom levels and that should be snappier.
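A rough sketch of kicking off a seed job through the GWC REST API that ships with GeoServer; the layer name, zoom range, URL, and credentials are placeholders:

    import requests

    GEOSERVER = "http://localhost:8080/geoserver"
    LAYER = "myworkspace:mylayer"

    # Seed the coarse zoom levels (0-10) in the Google Mercator gridset.
    seed_request = f"""
    <seedRequest>
      <name>{LAYER}</name>
      <srs><number>900913</number></srs>
      <zoomStart>0</zoomStart>
      <zoomStop>10</zoomStop>
      <format>image/png</format>
      <type>seed</type>
      <threadCount>2</threadCount>
    </seedRequest>
    """

    resp = requests.post(
        f"{GEOSERVER}/gwc/rest/seed/{LAYER}.xml",
        data=seed_request,
        headers={"Content-Type": "text/xml"},
        auth=("admin", "geoserver"),  # default credentials; change in production
    )
    resp.raise_for_status()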
It looks like you have a lot of layers turned on in your map. Just zooming in once triggered a total of 700 individual tile requests, most of them to your GeoServer. I don't think your main problem is your GeoServer (although tweaking it using the other answers' suggestions is always a good idea); I think your main problem is simply throughput.
Most browsers have a limit (when using HTTP/1.1) on how many simultaneous requests can be sent to the same domain; once you hit that limit, all other requests are queued until the previous ones are done. I think that's your problem: your server is dealing with the requests as quickly as it can, but there are so many that it simply cannot serve them at the speed you are expecting.
I would strongly recommend you look at reducing the number of layers you have loaded by default, or implement some kind of zoom restriction so that certain layers turn off at different zoom levels. You could even think about combining a number of the layers into one and perhaps using GeoServer's CQL filtering to change what is displayed.
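As a rough illustration, here is a single GetMap request that combines two layers in one image and uses GeoServer's CQL_FILTER vendor parameter to limit what each layer draws; the layer names, filters, and bounding box are made up:

    import requests

    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": "ws:roads,ws:buildings",        # two layers combined in one request
        "cql_filter": "type='highway';height>10", # one filter per layer, ';'-separated
        "styles": "",
        "bbox": "-180,-90,180,90",
        "srs": "EPSG:4326",
        "width": 768,
        "height": 384,
        "format": "image/png",
    }
    resp = requests.get("http://localhost:8080/geoserver/wms", params=params)
    resp.raise_for_status()
    with open("map.png", "wb") as f:
        f.write(resp.content)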

Tools for parsing natural language questions in realtime

photos in washington VS show me photos in washington VS I wanna see all my photos in washington taken day before yesterday
what:photos
entities: washington (don't want to be too assuming)
when: 2013-03-14
I want to parse preset queries into conditions (like above). I want these qualities:
I can extract relevant terms even in the presence of fluff ("I wanna see") and lowercase nouns
The warm program can accept requests over HTTP, or at least allow me to add some network communication
The warm program responds in 50 ms and needs at most 500 MB of memory for reasonable sentences
I am more experienced in Python, less so in Java
Parser data structure is easy to handle
I use NLTK, but it's slow. I see StanfordNLP and OpenNLP as viable alternatives, but I find the program-start latency to be too high. I don't mind integrating them over servlets if I am left with no alternative.
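To illustrate the "warm program over HTTP" idea, here is a minimal sketch that loads NLTK once at startup and then serves tag/entity extraction over HTTP, so each request only pays the per-sentence cost; the port and the simple extraction logic are placeholders, not a full semantic parser:

    # Assumes the NLTK models (punkt, averaged_perceptron_tagger,
    # maxent_ne_chunker, words) have already been downloaded.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    import nltk

    def analyze(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        tree = nltk.ne_chunk(tagged)
        entities = [" ".join(w for w, _ in st.leaves())
                    for st in tree.subtrees() if st.label() != "S"]
        nouns = [w for w, t in tagged if t.startswith("NN")]
        return {"nouns": nouns, "entities": entities}

    class ParseHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            text = self.rfile.read(length).decode("utf-8")
            body = json.dumps(analyze(text)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        analyze("warm up the lazily loaded models")  # pay the load cost once, at startup
        # e.g. curl -X POST --data "show me photos in washington" localhost:8000
        HTTPServer(("localhost", 8000), ParseHandler).serve_forever()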
The Stanford Parser is a solid choice, and pretty well-supported (as research code goes). But it sounds like low latency is an important requirement for you, so I'd also suggest you look at the BUBS Parser (full disclosure - I'm one of the primary researchers working on BUBS).
I haven't compared directly to NLTK, but I think you may find that the Stanford Parser doesn't meet your performance needs. This paper found a total throughput of ~60 words/second (~2-3 sentences/second). Those timings are pretty old, so newer hardware will certainly improve that, but probably still won't come close to 50 ms latency.
As you note, startup time will be an issue with any parser - a high-accuracy model is necessarily quite large. And 500 MB is probably pretty tight too (I usually run BUBS with 1-1.2 GB). But once loaded, BUBS latency is generally in the neighborhood of 10 ms per sentence (for ~20-25-word sentences), and we can push the total throughput up around 2500 words/second before accuracy starts to drop off. I think those numbers might meet your performance needs, and I don't know of any other high-accuracy (F1 >= 88-89) parser that comes close in speed.
Note: the fastest results are with recent pruning models that aren't yet posted to the website, but I can get you a model if you need. Hope that helps, and if you have more questions, feel free to ask.

Software to visually represent telecommunications networks

I am working on some software tools used for the design and optimization of telecoms networks: routing, capacity allocation and topology.
To represent the network nodes and interconnecting links I currently use standard MFC calls to draw things like lines and ellipses using mouse clicks, menu commands and so on.
For the time being this has been an adequate means of graphically representing what the network looks like, as I have been more concerned with getting the underlying algorithms right, improving efficiency, and so on.
At some stage I will want to improve the look and feel of the software. Is anyone aware of any GUI software, open source or otherwise, that is particularly suited to the network-building stage? The intention is to use something slicker than what I am currently doing, especially when it comes to (say) dragging nodes into the drawing area and setting their properties. Also of interest would be graphics to display bar charts representing link utilization levels.
Thanks in anticipation.
For looks, I think you should go for WPF, but the migration from MFC won't be that easy if you have a huge code base. GDI is said to be good performance-wise, though.
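Not MFC or WPF, but purely to illustrate the kind of view described above (nodes, links, and a link-utilization bar chart), here is a small sketch using Python's networkx and matplotlib; the topology and utilization figures are invented:

    import matplotlib.pyplot as plt
    import networkx as nx

    # Invented topology and per-link utilization figures.
    utilization = {("A", "B"): 0.35, ("B", "C"): 0.80, ("C", "D"): 0.55,
                   ("A", "D"): 0.20, ("B", "D"): 0.90}
    G = nx.Graph()
    G.add_edges_from(utilization)

    fig, (ax_net, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

    # Network view: automatic node layout, link width scaled by utilization.
    pos = nx.spring_layout(G, seed=42)
    widths = [1 + 4 * utilization.get(e, utilization.get((e[1], e[0]), 0.0))
              for e in G.edges()]
    nx.draw(G, pos, ax=ax_net, with_labels=True, node_color="lightblue", width=widths)
    ax_net.set_title("Topology")

    # Bar chart of link utilization.
    ax_bar.bar([f"{a}-{b}" for a, b in utilization], list(utilization.values()))
    ax_bar.set_ylim(0, 1)
    ax_bar.set_ylabel("Utilization")
    ax_bar.set_title("Link utilization")

    plt.tight_layout()
    plt.show()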

DAM Solutions for full-length movies

Are there any DAM (Digital Asset Management) solutions on the market that can handle storage and classification of full-length movies (10 GB and greater)?
I have heard of solutions from EMC Documentum, Artesia, Oracle UCM, and the like, but I am not sure whether they handle file sizes this large. Any open-source systems?
I am going to go out on a limb and say 'No'. There may be some custom ones around, but I have not seen anything that could handle videos of that size.
Personally, I have implemented the image portion of Oracle's DAM (still owned by Stellent at the time). I remember the tech being optimized for short streaming videos.
In Australia, the ABC recently launched a streaming service:
http://www.abc.net.au/iview/
This, like other similar services I have seen, seems to be limited to episodes or shows of half-hour or one-hour blocks.
Actually, 10 GB seems like a crazy size to be entered into a CM system, as CM implies the files will be shared/available remotely.
Are the videos uncompressed?
Do you want to stream/provide them across the network?
I would be interested to know some more details on the system you are after.
