Dictionary of Machine Learning Terms

We thought we would start a list of machine learning terms. It's as much a benefit for the team at Black Belt Digital as it is for our readers. Here we go!

A/B testing

A statistical way of comparing two (or more) techniques, typically an incumbent against a new rival.
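
As a rough illustration of the "statistical" part, here is a minimal sketch of a two-proportion z-test, one common way to compare an incumbent (A) against a rival (B). The function name and the conversion figures are made up for the example, and a real test would also need a sensible sample size and significance level chosen in advance.

```python
import math

def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Return (z, two_sided_p_value) for the difference between two rates."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical figures: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two techniques is unlikely to be down to chance alone.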

Apache Spark

An open source engine for distributed computing, used for large-scale data manipulation and machine learning.

Backpropagation

An algorithm for training neural networks in which errors are propagated backwards through the network to work out how much each weight should be adjusted.
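
To make the idea concrete, here is a minimal sketch on a deliberately tiny network: one input, one tanh hidden unit and one linear output. The network shape, starting weights and learning rate are all assumptions chosen for illustration; real networks have many units per layer and use a library to do this work.

```python
import math

def forward(x, w1, w2):
    """Tiny network: input -> tanh hidden unit -> linear output."""
    hidden = math.tanh(w1 * x)
    output = w2 * hidden
    return hidden, output

def loss(x, target, w1, w2):
    """Squared error between the network output and the target."""
    _, output = forward(x, w1, w2)
    return (output - target) ** 2

def backward(x, target, w1, w2):
    """Propagate the error backwards through the network with the chain rule."""
    hidden, output = forward(x, w1, w2)
    d_output = 2 * (output - target)             # dLoss/dOutput
    grad_w2 = d_output * hidden                  # dLoss/dw2
    d_hidden = d_output * w2                     # error passed back to the hidden unit
    grad_w1 = d_hidden * (1 - hidden ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    return grad_w1, grad_w2

def train(x, target, w1=0.5, w2=0.3, learning_rate=0.1, steps=200):
    """Repeatedly nudge both weights against their gradients."""
    for _ in range(steps):
        grad_w1, grad_w2 = backward(x, target, w1, w2)
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2

w1, w2 = train(x=1.0, target=0.8)
print("loss after training:", loss(1.0, 0.8, w1, w2))
```

The error at the output is computed first and then passed back one layer at a time, which is where the name comes from.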

Bag of words

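A representation of the words in a phrase or passage that records which words appear (and usually how often), irrespective of order. Sentences that use the same words in a different order end up with the same representation.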
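
A minimal sketch of the idea, using word counts from the Python standard library. The helper name `bag_of_words` and the simple tokenising rule are assumptions for the example; real pipelines usually handle punctuation, stop words and vocabulary more carefully.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each word appears, ignoring case and word order."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

first = bag_of_words("The dog chased the cat")
second = bag_of_words("The cat chased the dog")

print(first)
# The two sentences mean different things but share one bag of words.
print(first == second)
```

This shows both the appeal and the limitation: the representation is simple, but the information carried by word order is lost.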

Classification

A machine learning problem in which a model predicts which of two or more classes an observation belongs to. Recognising that a problem is a classification problem helps in choosing the right kind of mathematical model to analyse the data.

Clustering

Grouping data observations that are similar according to a given criterion.
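
One well-known clustering method is k-means. Below is a minimal one-dimensional sketch, where "similar" simply means "close in value". The function name, the sample values and the starting centroids are assumptions for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group, and repeat."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(clusters)
print(centroids)
```

The low values and the high values end up in separate groups without anyone labelling them in advance.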

Confusion matrix

A table that summarizes how successful a classification model’s predictions were by counting, for each actual class, how often each class was predicted.
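
A minimal sketch of building such a table by hand. The labels and predictions are invented for the example; rows are the actual classes and columns are the predicted classes.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Made-up results from a hypothetical spam filter.
actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]
labels = ["spam", "ham"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Correct predictions sit on the diagonal of the table.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
print("accuracy:", accuracy)
```

The off-diagonal cells show which classes the model confuses with each other, which a single accuracy figure hides.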

Data Science

The field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.

Deep learning

A subset of machine learning that structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own.

Graphics Processing Units (GPUs)

Graphics cards used for high performance computing tasks as opposed to graphical tasks.
Because a GPU has a very large number of individual cores, it can process far more pictures and graphical data per second than a conventional CPU.
This provides significant performance gains for tasks such as machine learning (e.g. processing a large bank of images).

JupyterHub

With JupyterHub you can create a multi-user Hub which spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
Project Jupyter created JupyterHub to support many users. The Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high performance computing group.

Kaggle

A data science competition platform, and a great way to test potential data scientists.

Keras

Keras enables user-friendly, easy prototyping: it takes an object-oriented approach and lets you build neural networks one layer at a time. In just a few lines of code you can create a sequential neural network with the standard bells and whistles like dropout.

Kubeflow

Kubeflow helps you build composable, portable, and scalable machine learning stacks on Kubernetes. With Kubeflow, businesses can speed up the installation of AI tools and frameworks, particularly when leveraging GPGPUs from Nvidia.
Kubeflow simplifies the process of building production-ready machine learning stacks and lowers the barriers to machine learning by being easy to deploy and reusable.

Natural Language Processing (NLP)

Programs that process and analyse large amounts of natural language. NLP enables computers to work with, and to some degree understand, text.

NumPy

The fundamental package for scientific computing with Python, built around fast N-dimensional arrays.

Optical Character Recognition (OCR)

Conversion of images of typed, handwritten or printed text into machine-encoded text. Widely used to read scanned documents and convert them to text.

Pandas

An open source Python library for data analysis and manipulation.

Python

A general-purpose programming language that is widely used in data science.

R

A programming language primarily used for statistical analysis.

Regression

A machine learning problem involving the prediction of a real-valued scalar or vector.
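
The simplest case is fitting a straight line to predict one number from another. Here is a minimal sketch using ordinary least squares; the function name and the data points are assumptions for the example, and the data is chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict a real value for an unseen x
print(slope, intercept, prediction)
```

Unlike classification, the output here is a number on a continuous scale rather than one of a fixed set of classes.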

Scikit-learn

An open source toolkit for Python used for data mining and data analysis.

TensorFlow

An open source machine learning framework, developed by Google, built on a software library for high performance numerical computation.
Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
It comes with strong support for machine learning and deep learning, and the flexible numerical computation core is used across many other scientific domains.

Theano

A tensor manipulation library for Python which can run code on the GPU.

Training Set

A set of examples (observations) used for training a machine learning algorithm. Starting with a smaller sample lets you try out a model quickly before moving to the complete set of data.
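
In practice a training set is often carved out of a larger data set, with the remainder held back to check the model afterwards. A minimal sketch of such a split follows; the function name, the 20% test fraction and the fixed seed are assumptions for the example, not a rule.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (training, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(10))  # stand-in for ten real observations
training_set, test_set = train_test_split(examples)
print(len(training_set), len(test_set))
```

Shuffling before splitting helps avoid a training set that only contains, say, the oldest records.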

Source(s): Wikipedia

Google Developers, Machine Learning Glossary