Latest KDnuggets poll identifies the list of top algorithms actually used by Data Scientists, finds surprises including the most academic and most industry-oriented algorithms.
Here are the results of the latest KDnuggets Poll, based on 844 voters.
The top 10 algorithms and their share of voters are:
Fig. 1: Top 10 algorithms used by Data Scientists.
See full table of all algorithms at the end of the post.
The average respondent used 8.1 algorithms, a big increase vs a similar poll in 2011.
Comparing with the 2011 poll, Algorithms for data analysis / data mining, we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 – 1), are for
- Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011
- Text Mining, up 30% to 35.9% from 27.7%
- Visualization, up 27% to 48.7% from 38.3%
- Time series/Sequence analysis, up 25% to 37.0% from 29.6%
- Anomaly/Deviation detection, up 19% to 19.5% from 16.4%
- Ensemble methods, up 19% to 33.6% from 28.3%
- SVM, up 18% to 33.6% from 28.6%
- Regression, up 16% to 67.1% from 57.9%
Most popular among new options added in 2016 are
- K-nearest neighbors, 46% share
- PCA, 43%
- Random Forests, 38%
- Optimization, 24%
- Neural networks – Deep Learning, 19%
- Singular Value Decomposition, 16%
The biggest declines are for
- Association rules, down 47% to 15.3% from 28.6%
- Uplift modeling, down 36% to 3.1% from 4.8% (that is a surprise, given strong results published)
- Factor Analysis, down 24% to 14.2% from 18.6%
- Survival Analysis, down 15% to 7.9% from 9.3%
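The relative changes listed above follow directly from the formula (pct2016 / pct2011 – 1). As a minimal sketch, the snippet below recomputes a few of them from the shares quoted in the lists; the names `shares` and `relative_change` are illustrative and not part of the poll. Because the published shares are rounded to one decimal, a recomputed value can differ from the listed percentage by about a point.

```python
# Shares in percent, copied from the lists above: (share_2011, share_2016).
shares = {
    "Boosting": (23.5, 32.8),
    "Text Mining": (27.7, 35.9),
    "Visualization": (38.3, 48.7),
    "Time series/Sequence analysis": (29.6, 37.0),
    "Regression": (57.9, 67.1),
    "Association rules": (28.6, 15.3),
    "Factor Analysis": (18.6, 14.2),
    "Survival Analysis": (9.3, 7.9),
}


def relative_change(pct_2011, pct_2016):
    """Relative change in share: pct2016 / pct2011 - 1."""
    return pct_2016 / pct_2011 - 1


changes = {name: relative_change(old, new) for name, (old, new) in shares.items()}

# Print from the biggest increase to the biggest decline.
for name, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {change:+.0%} ({shares[name][0]}% -> {shares[name][1]}%)")
```

Sorting by the relative change rather than by the absolute difference in share is what puts Boosting ahead of Regression here, even though Regression gained nearly as many percentage points.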
The following table shows usage of different algorithm types (Supervised, Unsupervised, Meta, and Other) by employment type.
We excluded NA (4.5%) and Other (3%) employment types.
Table 1: Algorithm usage by Employment Type
Employment Type | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
---|---|---|---|---|---|---|
Industry | 59% | 8.4 | 94% | 81% | 55% | 83% |
Government/Non-profit | 4.1% | 9.5 | 91% | 89% | 49% | 89% |
Student | 16% | 8.1 | 94% | 76% | 47% | 77% |
Academia | 12% | 7.2 | 95% | 81% | 44% | 77% |
All | | 8.3 | 94% | 82% | 48% | 81% |
We note that almost everyone uses supervised learning algorithms. Government and Industry Data Scientists used more different types of algorithms than students or academic researchers, and Industry Data Scientists were more likely to use Meta-algorithms.
Next, we analyzed the usage of top 10 algorithms + Deep Learning by employment type.
Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type
Algorithm | Industry | Government/Non-profit | Academia | Student | All |
---|---|---|---|---|---|
Regression | 71% | 63% | 51% | 64% | 67% |
Clustering | 58% | 63% | 51% | 58% | 57% |
Decision Trees/Rules | 59% | 63% | 38% | 57% | 55% |
Visualization | 55% | 71% | 28% | 47% | 49% |
K-NN | 46% | 54% | 48% | 47% | 46% |
PCA | 43% | 57% | 48% | 40% | 43% |
Statistics | 47% | 49% | 37% | 36% | 43% |
Random Forests | 40% | 40% | 29% | 36% | 38% |
Time series | 42% | 54% | 26% | 24% | 37% |
Text Mining | 36% | 40% | 33% | 38% | 36% |
Deep Learning | 18% | 9% | 24% | 19% | 19% |
To make the differences easier to see, we compute the algorithm bias for a particular employment type relative to average algorithm usage as Bias(Alg,Type)=Usage(Alg,Type)/Usage(Alg,All) – 1.
Fig. 2: Algorithm usage bias by Employment.
We note that Industry and Government/Non-profit respondents are more likely to use Visualization and Time Series.
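The bias defined above, Bias(Alg,Type) = Usage(Alg,Type)/Usage(Alg,All) – 1, can be recomputed from Table 2. The following is a minimal sketch for three of the rows; the `usage` dictionary and the `bias` function are illustrative names, and the values are the rounded percentages from the table, so the results are approximate.

```python
# Usage in percent, copied from Table 2 (three rows only, for brevity).
usage = {
    "Visualization": {
        "Industry": 55, "Government/Non-profit": 71,
        "Academia": 28, "Student": 47, "All": 49,
    },
    "Time series": {
        "Industry": 42, "Government/Non-profit": 54,
        "Academia": 26, "Student": 24, "All": 37,
    },
    "Deep Learning": {
        "Industry": 18, "Government/Non-profit": 9,
        "Academia": 24, "Student": 19, "All": 19,
    },
}


def bias(algorithm, employment_type):
    """Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1."""
    row = usage[algorithm]
    return row[employment_type] / row["All"] - 1


for algorithm, row in usage.items():
    for employment_type in row:
        if employment_type == "All":
            continue  # the "All" column is the baseline, not a group
        print(f"{algorithm:<14} {employment_type:<22} {bias(algorithm, employment_type):+.0%}")
```

A positive bias means the group uses the algorithm more than the average respondent, a negative bias means less, and zero means the group matches the overall share.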
Next, we look at regional participation, which was representative of overall KDnuggets visitors.
from KDnuggets http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html