Flink Study Notes, Part 15

2017/12/09 Flink

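FlinkML is the machine learning library that ships with Flink, intended for solving real-world problems. It was designed for distributed computation from the start. Its goals are to let ML developers write as little code as possible and to keep the algorithms simple to use. Flink natively supports both data streaming and iterative computation. FlinkML's design is inspired by scikit-learn and Spark's MLlib: it lets users define data-processing pipelines and use them to solve machine learning problems. The following items are on the list of things FlinkML plans to support: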

  • Pipelines of transformers and learners
  • Data pre-processing
    • Feature scaling
    • Polynomial feature base mapper
    • Feature hashing
    • Feature extraction for text
    • Dimensionality reduction
  • Model selection and performance evaluation:
    • Model evaluation using a variety of scoring functions
    • Cross-validation for model selection and evaluation
    • Hyper-parameter optimization
  • Supervised learning
    • Optimization framework
    • Stochastic Gradient Descent
    • L-BFGS
    • Generalized Linear Models
    • Multiple Linear Models
    • Multiple linear regression
    • LASSO, Ridge regression
    • Random forests
    • Support Vector Machines
    • Decision trees
  • Unsupervised learning
    • Clustering
    • K-means clustering
    • Principal Components Analysis
  • Recommendation
    • ALS
  • Text analytics
    • LDA
  • Statistical estimation tools
  • Distributed linear algebra
  • Streaming ML
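
To make the first roadmap item, pipelines of transformers and learners, more concrete, here is a minimal, dependency-free sketch of the idea in Python. It is not FlinkML's API: the class names `StandardScaler`, `LinearRegression`, and `Pipeline` are hypothetical and only mimic the scikit-learn style of chaining, and the data is a made-up single feature. A transformer learns some statistics and rewrites the features; a learner fits a model on the transformed features; the pipeline runs them in order for both training and prediction.

```python
from statistics import mean, pstdev


class StandardScaler:
    """Transformer (hypothetical): rescale one feature to zero mean, unit variance."""

    def fit(self, xs):
        self.mean_ = mean(xs)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        self.std_ = pstdev(xs) or 1.0
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


class LinearRegression:
    """Learner (hypothetical): ordinary least squares on a single feature."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, xs):
        return [self.intercept_ + self.slope_ * x for x in xs]


class Pipeline:
    """Chain any number of transformers, followed by one learner."""

    def __init__(self, transformers, learner):
        self.transformers = transformers
        self.learner = learner

    def fit(self, xs, ys):
        # Each transformer is fitted on the output of the previous one.
        for t in self.transformers:
            xs = t.fit(xs).transform(xs)
        self.learner.fit(xs, ys)
        return self

    def predict(self, xs):
        # Reuse the statistics learned during fit; do not refit here.
        for t in self.transformers:
            xs = t.transform(xs)
        return self.learner.predict(xs)


# Toy data following y = 2x + 1.
pipeline = Pipeline([StandardScaler()], LinearRegression())
pipeline.fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
prediction = pipeline.predict([5.0])[0]  # close to 11.0 for this toy data
```

The point of the design is that the caller only sees `fit` and `predict` on the pipeline, so pre-processing steps cannot be accidentally skipped or refitted at prediction time.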