We thought we would start a glossary of machine learning terms. It's as much of a benefit for the team at Black Belt Digital as it is for our readers. Here we go!
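A/B testing
- Comparing two versions (A and B) of something, such as a web page or a model, by randomly splitting users or traffic between them and measuring which performs better on a chosen metric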
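When the outcome of an A/B test is yes/no (such as a conversion), one common way to judge whether the difference between the variants is more than chance is a two-proportion z-test. The sketch below uses only Python's standard library. The function name and the conversion counts are hypothetical, and a real analysis would also fix the sample size and significance level before the test starts.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) comparing the conversion rates of variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B perform the same
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: 200 of 5000 users converted on A, 250 of 5000 on B
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; it says nothing about whether the difference is large enough to matter.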
Apache Spark
- An open-source engine for distributed computing, used for large-scale data processing and machine learning
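Backpropagation
- An algorithm for training neural networks in which the error at the output is propagated backwards through the network, layer by layer, to work out how much each weight contributed to it and should be adjusted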
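To make the idea concrete, here is a minimal sketch of backpropagation on a hypothetical network with one input, one tanh hidden unit and one output, trained with squared error. The chain rule carries the error from the output back to each weight, and the result is checked against finite-difference estimates. Real frameworks automate this step for networks with millions of weights.

```python
import math

def forward(w1, w2, x):
    """Forward pass of a tiny 1-1-1 network: hidden = tanh(w1 * x), output = w2 * hidden."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, t):
    """Squared error between the network output and the target t."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    """Return (dL/dw1, dL/dw2) by propagating the error backwards with the chain rule."""
    h, y = forward(w1, w2, x)
    dy = y - t                   # dL/dy for L = 0.5 * (y - t)^2
    dw2 = dy * h                 # because y = w2 * h
    dh = dy * w2                 # error signal passed back to the hidden unit
    dw1 = dh * (1 - h ** 2) * x  # tanh'(a) = 1 - tanh(a)^2, with a = w1 * x
    return dw1, dw2

# Hypothetical weights, input and target
w1, w2, x, t = 0.5, -0.3, 1.2, 0.8
dw1, dw2 = backprop(w1, w2, x, t)

# Sanity check against central finite differences
eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
num_dw2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
print(dw1, num_dw1)
print(dw2, num_dw2)

# One gradient-descent step in the direction opposite the gradient
lr = 0.1
before = loss(w1, w2, x, t)
after = loss(w1 - lr * dw1, w2 - lr * dw2, x, t)
print(before, after)
```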
Bag of words
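- A representation of a phrase or passage as the collection of its words (usually with a count for each), irrespective of order. Different ways of writing the same sentence with the same words therefore map to the same representation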
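A minimal sketch in Python, assuming a simple lowercase tokeniser that splits on anything other than letters and apostrophes (real NLP pipelines usually tokenise more carefully):

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map a passage to word counts, ignoring case, punctuation and word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

a = bag_of_words("The dog chased the cat.")
b = bag_of_words("The cat, the dog chased!")
print(a)       # each word with its count; "the" appears twice
print(a == b)  # True: same words, different order
```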
Classification
- A machine learning problem involving predicting which of two or more classes an observation belongs to (for example, whether an email is spam or not spam)
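As a small illustration, the sketch below uses a nearest-centroid classifier, one of the simplest classification methods, on hypothetical 2-D observations from three classes. It shows the shape of the problem (learn from labelled examples, then predict a class for a new observation) rather than a recommended model.

```python
import math

def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each class in the training data."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is nearest (Euclidean distance) to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical training data: three classes of 2-D observations
points = [(1, 1), (1, 2), (2, 1), (4, 5), (5, 4), (5, 5), (8, 8), (9, 8), (8, 9)]
labels = ["small"] * 3 + ["medium"] * 3 + ["large"] * 3

centroids = fit_centroids(points, labels)
print(predict(centroids, (2, 2)))  # small
print(predict(centroids, (7, 9)))  # large
```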
Clustering
- Grouping data observations so that those in the same group are similar according to a given criterion, without using predefined labels
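A minimal k-means sketch, assuming Euclidean distance as the similarity criterion and a naive initialisation from the first k points. Practical implementations use better initialisation (such as k-means++) and stop once the assignments no longer change; the data points here are made up.

```python
import math

def k_means(points, k, iterations=10):
    """Alternately assign points to the nearest centre and move each centre to its cluster's mean."""
    centres = list(points[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, clusters

# Hypothetical, unlabelled observations forming two loose groups
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1), (8, 9)]
centres, clusters = k_means(points, k=2)
print(centres)
print(clusters)
```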
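Confusion matrix
- A table summarising a classifier's performance by counting, for each actual class, how many observations were predicted as each class. Correct predictions lie on the diagonal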
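A minimal sketch of building a confusion matrix from lists of actual and predicted labels, assuming the common convention of rows for actual classes and columns for predicted classes (the labels and predictions are hypothetical):

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes; each cell counts observations."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

actual = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels = ["cat", "dog", "bird"]

m = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, m):
    print(label, row)
```

Here the diagonal sums to 4, so 4 of the 6 predictions are correct; the off-diagonal cells show which classes get mixed up, for example one dog predicted as a cat.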
Data Science
- The field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.
Deep learning
- A subset of machine learning that structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own
Graphics Processing Units (GPUs)
- The use of graphics cards for high-performance computing tasks as opposed to graphical tasks
- With many more individual cores than a conventional CPU, a GPU can perform large numbers of calculations in parallel, so it can process far more images and graphical data per second
- This provides significant performance gains for highly parallel tasks such as machine learning (e.g. processing a large bank of images)
JupyterHub
- With JupyterHub you can create a multi-user Hub which spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server
- Project Jupyter created JupyterHub to support many users. The Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high performance computing group
Kaggle
- An online platform that hosts data science competitions; a great way to test potential data scientists
Keras
- Keras is a high-level neural network API that enables user-friendly, easy prototyping. Its object-oriented design lets you build a neural network one layer at a time, and in just a few lines of code you can create a sequential neural network with the standard bells and whistles, such as dropout.
Kubeflow
- Kubeflow helps you build composable, portable and scalable machine learning stacks on Kubernetes. With Kubeflow, businesses can speed up the installation of AI tools and frameworks, particularly when making use of Nvidia GPGPUs
- Kubeflow simplifies the process of building production-ready machine learning stacks and lowers the barriers to machine learning by being easy to deploy and reuse
Natural Language Processing (NLP)
- The field concerned with programs that process and analyse large amounts of natural language, enabling computers to understand text.
NumPy
- The fundamental package for scientific computing with Python, providing fast N-dimensional arrays and mathematical operations on them.
Optical Character Recognition (OCR)
- Conversion of images of typed, handwritten or printed text into machine-encoded text. Widely used to read documents and convert them to text
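pandas
- Python Data Analysis Library: an open-source library providing data structures (notably the DataFrame) and tools for data manipulation and analysis.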
Python
- General-purpose programming language that is widely used in data science, thanks to libraries such as NumPy, pandas and scikit-learn
R
- Programming language primarily used for statistical analysis and graphics
Regression
- A machine learning problem involving the prediction of a real-valued scalar or vector.
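For the simplest case, a single input feature and a straight-line model, ordinary least squares has a closed-form solution. A minimal sketch with made-up data points (real projects would typically use a library such as scikit-learn):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that roughly follow y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(slope * 6 + intercept)  # predicted real-valued output for x = 6
```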
Scikit-learn
- Open-source machine learning toolkit for Python, used for data mining and data analysis
TensorFlow
- Open-source machine learning framework developed by Google, providing a software library for numerical computation using data flow graphs
- TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers to mobile and edge devices
- It comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains
Theano
- A tensor manipulation library for Python which can run code on the GPU.
Training Set
- A set of examples/observations used to train a machine learning algorithm. It is usually a subset of the available data, with the remainder held back (for example as a test set) to check how well the trained model performs on data it has not seen
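A minimal sketch of separating a training set from held-back data, assuming a simple random split with a fixed seed so the result is reproducible. The helper name is hypothetical; scikit-learn provides its own `train_test_split`, which this does not reproduce.

```python
import random

def split_train_test(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction of it as a test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # a seeded generator makes the split repeatable
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

data = list(range(100))  # stand-in for 100 observations
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

The model is fitted only on `train`; `test` is used afterwards to estimate how it will behave on new data.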
Sources: Wikipedia; Google Developers, Machine Learning Glossary