Top Algorithms Used by Data Scientists

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds some surprises, including which algorithms are the most academic and which are the most industry-oriented.

The latest KDnuggets Poll asked: Which methods/algorithms did you use in the past 12 months for an actual Data Science-related application?

Here are the results, based on 844 voters.

The top 10 algorithms and their share of voters are:

Fig. 1: Top 10 algorithms used by Data Scientists.

See full table of all algorithms at the end of the post.

The average respondent used 8.1 algorithms, a big increase vs a similar poll in 2011.

Comparing with the 2011 poll, "Algorithms for data analysis / data mining", we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 - 1), are for

  • Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011
  • Text Mining, up 30% to 35.9% from 27.7%
  • Visualization, up 27% to 48.7% from 38.3%
  • Time series/Sequence analysis, up 25% to 37.0% from 29.6%
  • Anomaly/Deviation detection, up 19% to 19.5% from 16.4%
  • Ensemble methods, up 19% to 33.6% from 28.3%
  • SVM, up 18% to 33.6% from 28.6%
  • Regression, up 16% to 67.1% from 57.9%
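The relative-increase measure used in the list above, (pct2016 / pct2011 - 1), can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `relative_change` and the dictionary layout are illustrative choices, and the shares are copied from a subset of the bullets above.

```python
# Shares (percent of voters) quoted in the list above, as (2016, 2011) pairs.
shares = {
    "Boosting": (32.8, 23.5),
    "Text Mining": (35.9, 27.7),
    "Visualization": (48.7, 38.3),
    "Time series/Sequence analysis": (37.0, 29.6),
    "Regression": (67.1, 57.9),
}


def relative_change(pct_2016, pct_2011):
    """Relative change in share between the two polls: pct2016 / pct2011 - 1."""
    return pct_2016 / pct_2011 - 1


for name, (p16, p11) in shares.items():
    # Format as a signed whole-number percentage.
    print(f"{name}: {relative_change(p16, p11):+.0%}")
```

A value of 0.40 means the share grew by 40% relative to its 2011 level, which is different from a 40 percentage-point increase.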

Most popular among new options added in 2016 are

  • K-nearest neighbors, 46% share
  • PCA, 43%
  • Random Forests, 38%
  • Optimization, 24%
  • Neural networks – Deep Learning, 19%
  • Singular Value Decomposition, 16%

The biggest declines are for

  • Association rules, down 47% to 15.3% from 28.6%
  • Uplift modeling, down 36% to 3.1% from 4.8% (that is a surprise, given strong results published)
  • Factor Analysis, down 24% to 14.2% from 18.6%
  • Survival Analysis, down 15% to 7.9% from 9.3%

The following table shows usage of different algorithm types (Supervised, Unsupervised, Meta, and Other) by employment type. We excluded the NA (4.5%) and Other (3%) employment types.

Table 1: Algorithm usage by Employment Type

| Employment Type | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|---|---|---|---|---|---|---|
| Industry | 59% | 8.4 | 94% | 81% | 55% | 83% |
| Government/Non-profit | 4.1% | 9.5 | 91% | 89% | 49% | 89% |
| Student | 16% | 8.1 | 94% | 76% | 47% | 77% |
| Academia | 12% | 7.2 | 95% | 81% | 44% | 77% |
| All | | 8.3 | 94% | 82% | 48% | 81% |

We note that almost everyone uses supervised learning algorithms. Government and Industry Data Scientists used more different types of algorithms than students or academic researchers, and Industry Data Scientists were more likely to use Meta-algorithms.

Next, we analyzed the usage of top 10 algorithms + Deep Learning by employment type.

Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type

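| Algorithm | Industry | Government/Non-profit | Academia | Student | All |
|---|---|---|---|---|---|
| Regression | 71% | 63% | 51% | 64% | 67% |
| Clustering | 58% | 63% | 51% | 58% | 57% |
| Decision Trees/Rules | 59% | 63% | 38% | 57% | 55% |
| Visualization | 55% | 71% | 28% | 47% | 49% |
| K-NN | 46% | 54% | 48% | 47% | 46% |
| PCA | 43% | 57% | 48% | 40% | 43% |
| Statistics | 47% | 49% | 37% | 36% | 43% |
| Random Forests | 40% | 40% | 29% | 36% | 38% |
| Time series | 42% | 54% | 26% | 24% | 37% |
| Text Mining | 36% | 40% | 33% | 38% | 36% |
| Deep Learning | 18% | 9% | 24% | 19% | 19% |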

To make the differences easier to see, we compute the algorithm bias for a particular employment type relative to average algorithm usage as Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1.
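As a rough illustration of the bias formula, the sketch below applies it to three rows copied from Table 2. The `usage` dictionary and the `bias` function are hypothetical names introduced here, not part of the original analysis.

```python
# Usage percentages for three rows of Table 2 (percent of voters in each group).
usage = {
    "Visualization": {"Industry": 55, "Government/Non-profit": 71,
                      "Academia": 28, "Student": 47, "All": 49},
    "Time series": {"Industry": 42, "Government/Non-profit": 54,
                    "Academia": 26, "Student": 24, "All": 37},
    "Deep Learning": {"Industry": 18, "Government/Non-profit": 9,
                      "Academia": 24, "Student": 19, "All": 19},
}


def bias(alg, emp_type):
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    row = usage[alg]
    return row[emp_type] / row["All"] - 1


for alg, row in usage.items():
    for emp_type in row:
        if emp_type != "All":
            # Positive bias: the group uses the algorithm more than average.
            print(f"{alg:15s} {emp_type:22s} {bias(alg, emp_type):+.0%}")
```

A positive value means the group uses the algorithm more than the average respondent, a negative value means less, and zero means usage matches the overall share.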

Fig. 2: Algorithm usage bias by Employment.

We note that Industry and Government/Non-profit Data Scientists are more likely to use Visualization and Time Series.

Next, we look at regional participation, which was representative of overall KDnuggets visitors.

from KDnuggets http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html