|`DataGen`|`records.per.sec`|`10.0`| Records per second generated. |
|`Iceberg`|`bucket.prefix`| (mandatory) | S3 bucket and path URL prefix, starting with `s3://`. For example `s3://mybucket/iceberg`. |
|`Iceberg`|`catalog.db`|`default`| Name of the Glue Data Catalog database. |
|`Iceberg`|`catalog.table`|`prices_iceberg`| Name of the Glue Data Catalog table. |
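
For illustration, here is a minimal sketch of reading these grouped properties with the Managed Service for Apache Flink runtime API. The defaults mirror the table above, but whether the application reads them exactly this way is an assumption:

```java
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

import java.io.IOException;
import java.util.Map;
import java.util.Properties;

public class PropertiesSketch {
    public static void main(String[] args) throws IOException {
        // Runtime properties are grouped by group ID ("DataGen", "Iceberg")
        Map<String, Properties> groups = KinesisAnalyticsRuntime.getApplicationProperties();

        Properties iceberg = groups.getOrDefault("Iceberg", new Properties());
        String bucketPrefix = iceberg.getProperty("bucket.prefix"); // mandatory, no default
        if (bucketPrefix == null || !bucketPrefix.startsWith("s3://")) {
            throw new IllegalArgumentException("bucket.prefix must be set to an s3:// URL");
        }
        // Optional properties fall back to the defaults shown in the table
        String catalogDb = iceberg.getProperty("catalog.db", "default");
        String catalogTable = iceberg.getProperty("catalog.table", "prices_iceberg");

        Properties dataGen = groups.getOrDefault("DataGen", new Properties());
        double recordsPerSec = Double.parseDouble(dataGen.getProperty("records.per.sec", "10.0"));
    }
}
```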
### Running locally, in IntelliJ
You can run this example directly in IntelliJ, without any local Flink cluster or local Flink installation.
See [Running examples locally](https://github.com/nicusX/amazon-managed-service-for-apache-flink-examples/blob/main/java/running-examples-locally.md) for details.
### Checkpoints
Checkpointing must be enabled: the Iceberg sink commits writes when a checkpoint completes.
When running locally, the application programmatically enables checkpoints every 30 seconds.
When deployed to Managed Service for Apache Flink, checkpointing is controlled by the application configuration.
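
As a minimal sketch, this behavior could look like the following. Detecting the local environment with an `instanceof LocalStreamEnvironment` check is an assumption, not necessarily what this application does:

```java
import org.apache.flink.streaming.api.environment.LocalStreamEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // When running in the IDE, the environment is a LocalStreamEnvironment:
        // enable checkpoints programmatically, every 30 seconds (30,000 ms)
        if (env instanceof LocalStreamEnvironment) {
            env.enableCheckpointing(30_000);
        }
        // When deployed, do not call enableCheckpointing(): Managed Service for
        // Apache Flink controls checkpointing via the application configuration
    }
}
```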
### Sample Data Schema
The application uses a predefined schema for the stock price data, with the following fields (a Table API sketch of the same schema follows the list):
* `timestamp`: STRING - ISO timestamp of the record
* `symbol`: STRING - Stock symbol (e.g., AAPL, AMZN)
* `price`: FLOAT - Stock price (0-10 range)
* `volumes`: INT - Trade volumes (0-1,000,000 range)
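
As an illustration only, the same four fields could be expressed with the Flink Table API. This is a sketch, not necessarily how the application defines the schema internally:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;

public class PriceSchemaSketch {
    // The four fields above as a Flink Table API schema
    static final Schema PRICE_SCHEMA = Schema.newBuilder()
            .column("timestamp", DataTypes.STRING()) // ISO timestamp of the record
            .column("symbol", DataTypes.STRING())    // stock symbol, e.g. AAPL
            .column("price", DataTypes.FLOAT())      // 0-10 range
            .column("volumes", DataTypes.INT())      // 0-1,000,000 range
            .build();
}
```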
### Known limitations of the Flink Iceberg sink
The Flink Iceberg integration currently has the following limitations:
* Does not support Iceberg tables with hidden partitioning
* Does not support schema evolution: adding, removing, renaming, or changing columns
---
### Known Flink issue: Hadoop library clash
When integrating Flink with Iceberg, there is a common issue that affects most Flink setups.
When using Flink SQL's `CREATE CATALOG` statements, Hadoop libraries must be available on the system classpath.
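
For illustration, such a catalog could be declared from the Table API as follows. The catalog name and property values are assumptions based on the Iceberg AWS documentation, not necessarily this application's exact configuration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CreateCatalogSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical Glue-backed Iceberg catalog; property names follow the Iceberg AWS docs
        tableEnv.executeSql(
                "CREATE CATALOG glue_catalog WITH ("
                        + " 'type' = 'iceberg',"
                        + " 'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',"
                        + " 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',"
                        + " 'warehouse' = 's3://mybucket/iceberg'"
                        + ")");
    }
}
```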
However, standard Flink distributions use shaded dependencies that can create class-loading conflicts with what Hadoop expects.
Flink's default classloading, when running in Application mode, prevents the application from using some Hadoop classes, even if they are included in the application uber-jar.
#### Solution
This example shows a simple workaround that prevents the Hadoop class clash:
1. Include a modified version of the Flink class `org.apache.flink.runtime.util.HadoopUtils`
2. Use the Maven Shade Plugin to prevent class conflicts
The modified [`org.apache.flink.runtime.util.HadoopUtils`](src/main/java/org/apache/flink/runtime/util/HadoopUtils.java)
class is included in the source code of this project. You can include it as-is in your project, using the same package name.
The shading is configured in the [`pom.xml`](pom.xml). In your project, you can copy the `<relocations>...</relocations>` configuration.