This site is intended to give an overview of software tools for data analytics and machine learning.
With the new hype about big data, it has become difficult to decide between all the tools and methods that are available as open source or commercial software packages. In the following sections, I will present the most important companies, databases, big data storage engines, and analytics pipelines and compare their features and limitations. This may help you select the appropriate tool for a particular use case from among the different alternatives.
Here is a list of companies practicing data analytics that you should know about:
Amazon uses big data across the whole company. One of their most innovative products is the smart speaker Echo, which is used to interact with their personal assistant Alexa.
Cloudera offers CDH (Cloudera Distribution Including Apache Hadoop), a packaged distribution of Apache Hadoop and related projects.
Datameer focuses on big data analytics and visualization on top of Hadoop.
DeepMind uses neural networks to build intelligent software and solve complex problems. Their software AlphaGo was the first computer program to beat a professional human player at the game of Go.
Here is a list of data analytics tools:
A library for data mining which runs on top of Hadoop.
Relational Database
Microsoft
Wide column Database
Relational Database
Microsoft
Relational Database
Wide column Database
Oracle
Key-value Database
Document Database
Document Database
Relational Database
Relational Database
Relational Database
Document Database
Amazon
Key-value Database
Search Engine
Relational Database
Relational Database
Relational Database
SAP
Hazelcast provides implementations of Java collections whose entries are distributed across the members of a cluster.
Key-value Database
Clustered
Replication
No master (peer to peer)
Wide column Database
Relational Database
Relational Database
Relational Database
Relational Database
Relational Database
Key-value Database
Document Database
Relational Database
Graph Database
Relational Database
Relational Database
Key-value Database
Relational Database
Key-value Database
Amazon
Search Engine
Relational Database
Search Engine
Relational Database
Relational Database
Microsoft
Relational Database
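The Hazelcast entry in the list above describes collections whose entries are distributed within a cluster, with replication and no master. The following is a minimal sketch of that idea, not Hazelcast's actual API or partitioning scheme: the class `PeerToPeerMapSketch`, the function `partition_of`, the member names, and the partition count of 8 are all hypothetical and chosen only for illustration.

```python
import hashlib


def partition_of(key: str, partition_count: int) -> int:
    """Hash a key to a stable partition id (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


class PeerToPeerMapSketch:
    """Toy key-value map spread over equal peers, with one backup copy per entry."""

    def __init__(self, members, partition_count=8):
        self.members = list(members)
        self.partition_count = partition_count
        # Each member keeps its own local dictionary of entries.
        self.stores = {member: {} for member in self.members}

    def _owners(self, key):
        # Primary owner plus the next member in the ring, which holds the replica.
        index = partition_of(key, self.partition_count) % len(self.members)
        backup = (index + 1) % len(self.members)
        return self.members[index], self.members[backup]

    def put(self, key, value):
        for member in self._owners(key):
            self.stores[member][key] = value

    def get(self, key, failed=()):
        # Any surviving owner can answer, so no single master is required.
        for member in self._owners(key):
            if member not in failed and key in self.stores[member]:
                return self.stores[member][key]
        return None


cluster = PeerToPeerMapSketch(["node-a", "node-b", "node-c"])
cluster.put("user:42", "Ada")
print(cluster.get("user:42"))
```

Because every entry lives on two members, a read still succeeds when the primary owner is passed in `failed`. A real cluster would also rebalance partitions when members join or leave, which this sketch leaves out.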
HDFS: Hadoop Distributed File System
MapReduce: A data processing model for handling large amounts of data in parallel.
OLAP: Online Analytical Processing (Data Warehouse)
OLTP: Online Transactional Processing (Operational System)
RDBMS: Relational Database Management System
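The parallel data processing concept in the glossary above can be illustrated with a small word count. This is a minimal sketch assuming a map-then-reduce pattern: the input is split into chunks, each chunk is processed independently, and the partial results are merged. The function names, the chunk size, and the use of threads on a single machine are assumptions made for illustration; real systems distribute the chunks across many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words of one chunk independently of all others."""
    return Counter(word.lower() for line in lines for word in line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    # Split the input so that each chunk can be handled by a separate worker.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the partial counts; an empty input yields an empty Counter.
    return reduce(merge_counts, partials, Counter())


counts = word_count(["big data", "Big tools", "data data"])
print(counts.most_common())
```

The map step never looks at another chunk, which is what allows the work to run in parallel; only the merge step needs to see more than one partial result.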