导图社区 flink

flink

Flink核心计算框架是Flink Runtime执行引擎,是一个分布式系统,能够接受数据流程序并在一台或多台机器上以容错方式执行。 Flink Runtime执行引擎可以作为Yarn的应用程序在集群上运

编辑于2022-09-26 11:49:13 浙江省

flink

EDXljCoL

他的近期作品查看更多>>

flink
Flink核心计算框架是Flink Runtime执行引擎,是一个分布式系统,能够接受数据流程序并在一台或多台机器上以容错方式执行。 Flink Runtime执行引擎可以作为Yarn的应用程序在集群上运

flink

社区模板帮助中心，点此进入>>

EDXljCoL

他的近期作品查看更多>>

flink
Flink核心计算框架是Flink Runtime执行引擎,是一个分布式系统,能够接受数据流程序并在一台或多台机器上以容错方式执行。 Flink Runtime执行引擎可以作为Yarn的应用程序在集群上运

相似推荐
大纲

互联网9大思维
- 34.5k
- 939
- 2.4k
- 395
MindMaster
组织架构-单商户商城webAPP 思维导图。
- 14.8k
- 3
- 185
- 9
Kacyun
域控上线
- 1.6k
- 164
- 11
- 4
jackrao
python思维导图
- 5.4k
- 535
- 242
- 7
(*^▽^*)
css
- 1.2k
- 1
- 43
- 3
A张舫
CSS
- 3.3k
- 263
- 188
- 33
journey
计算机操作系统思维导图
- 4.3k
- 342
- 204
- 18
journey
计算机组成原理
- 1.5k
- 98
- 70
- 8
journey
IMX6UL(A7)
- 525
- 41
- 5
- 0
Handler XU
考试学情分析系统
- 698
- 51
- 10
- 1
蒋龙

flink

what?

架构设计

综述

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.

流模型

无边界流

有边界流(批)

资源管理器部署

Hadoop Yarn

Kubernetes

standalone

可以任意扩展数据，处理节点和数据处理任务规模

计算状态保留在内存中，定期异步缓存checkpoint，保证消息只被处理一次

子主题

应用

流式应用

流

按有无边界分

无边界流

有边界流

按实时性分

实时流

回放流

状态

按应用类型划分

无状态应用

有状态应用

支持特性

多种状态类型

可插拔的状态存储后端

提供精确的消息一次处理语义一致性

支持单个极大状态和分布式的极多状态数量

时间

事件时间模式

追求数据处理的精确性，能够容忍一定的延迟

Watermark（水位）支持

迟到数据处理

处理时间模式

追求数据处理的实时性，可以容忍一定的误差

分层设计的API

综述

子主题

处理函数层

使用场景：事件驱动应用程序

示例

/** * Matches keyed START and END events and computes the difference between * both elements' timestamps. The first String field is the key attribute, * the second String attribute marks START and END events. */ public static class StartEndDuration extends KeyedProcessFunction<String, Tuple2<String, String>, Tuple2<String, Long>> { private ValueState<Long> startTime; @Override public void open(Configuration conf) { // obtain state handle startTime = getRuntimeContext() .getState(new ValueStateDescriptor<Long>("startTime", Long.class)); } /** Called for each processed event. */ @Override public void processElement( Tuple2<String, String> in, Context ctx, Collector<Tuple2<String, Long>> out) throws Exception { switch (in.f1) { case "START": // set the start time if we receive a start event. startTime.update(ctx.timestamp()); // register a timer in four hours from the start event. ctx.timerService() .registerEventTimeTimer(ctx.timestamp() + 4 * 60 * 60 * 1000); break; case "END": // emit the duration between start and end event Long sTime = startTime.value(); if (sTime != null) { out.collect(Tuple2.of(in.f0, ctx.timestamp() - sTime)); // clear the state startTime.clear(); } default: // do nothing } } /** Called when a timer fires. */ @Override public void onTimer( long timestamp, OnTimerContext ctx, Collector<Tuple2<String, Long>> out) { // Timeout interval exceeded. Cleaning up the state. startTime.clear(); } }

数据流层

使用场景：数据流加工程序

示例

// a stream of website clicks DataStream<Click> clicks = ... DataStream<Tuple2<String, Long>> result = clicks // project clicks to userId and add a 1 for counting .map( // define function by implementing the MapFunction interface. new MapFunction<Click, Tuple2<String, Long>>() { @Override public Tuple2<String, Long> map(Click click) { return Tuple2.of(click.userId, 1L); } }) // key by userId (field 0) .keyBy(0) // define session window with 30 minute gap .window(EventTimeSessionWindows.withGap(Time.minutes(30L))) // count clicks per session. Define function as lambda function. .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));

SQL/表层

使用场景：数据分析和ETL

示例

SELECT userId, COUNT(*) FROM clicks GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId

库

Complex Event Processing (CEP)

Dataset API

Gelly(图算法库)

操作

保证应用24小时不间断运行

一致性检查点checkpoint

异步增量式检查点机制保证高效低迟延

端到端的消息一次处理语义保证（数据一致性）

基于zookeeper提供HA解决方案（无单点故障问题）

有状态的流式应用程序的更新，迁移，暂停和恢复

通过savepoint机制来保证数据状态一致性

应用演化升级

flink集群迁移

flink版本升级

应用程序扩容

A/B测试不同应用程序版本质量和效果

简单的暂停和恢复执行

归档（不同时间节点的版本，以备恢复）

监控和管理

Web UI

日志

统计信息

REST API

How?

Why?