From the course: High-Performance PySpark: Advanced Strategies for Optimal Data Processing
Avro schema evolution: Managing changes in data structures
- [Instructor] Another standout feature of Avro is its support for schema evolution. In distributed systems, where data models evolve over time, this is crucial. Schema evolution allows applications to handle data serialized with different versions of the schema without breaking compatibility. Let's dive in to see how this works and why it's so powerful. Schema evolution is the ability to modify your data model, the schema, over time while ensuring that new applications can still read old data, which is backward compatibility, and that old applications can still read new data, which is forward compatibility. This is especially critical in distributed systems like Kafka, where producers and consumers may be using different versions of the schema, but there are a few rules for…
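To make the idea concrete, here is a minimal sketch in plain Python of how a reader can reconcile a record written with one schema version against its own version. It is a toy model of Avro-style schema resolution, not the Avro library itself: the `User` schemas, the `email` field, and the `resolve` helper are all hypothetical names chosen for illustration, and only field addition and removal are handled (no type promotion, aliases, or nested records).

```python
# Two versions of a hypothetical "User" record schema, written in the same
# JSON-like shape that Avro schemas use. Version 2 adds an "email" field
# and gives it a default value.
USER_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

USER_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, writer_schema, reader_schema):
    """Toy schema resolution (illustrative helper, not an Avro API).

    - A field present in both schemas is copied from the record.
    - A field only the reader knows about takes the reader's default;
      if there is no default, the schemas are incompatible.
    - A field only the writer knows about is silently dropped.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# A newer reader (v2) reading data written with the older schema (v1):
old_record = {"id": 1, "name": "Ada"}
upgraded = resolve(old_record, USER_V1, USER_V2)

# An older reader (v1) reading data written with the newer schema (v2):
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}
downgraded = resolve(new_record, USER_V2, USER_V1)
```

In this sketch, the default on the added field is what lets both directions work: the newer reader fills in `email` when it is absent, and the older reader simply ignores it. Adding a field without a default would make `resolve` raise when a newer reader meets older data.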
Contents
- Introduction to data formats: Understanding JSON and CSV (2m 30s)
- Exploring JSON (2m 43s)
- Exploring Avro (2m 33s)
- How Avro handles serialization and deserialization (1m 52s)
- Avro schema evolution: Managing changes in data structures (2m 41s)
- Avro pros and cons (1m 17s)
- Understanding ORC: Optimized row columnar storage (2m 6s)
- ORC pros and cons (2m 17s)
- Parquet: The go-to columnar format for high-performance analytics (5m 57s)
- Compression algorithms in Spark: Comparing Zstd, Snappy, and LZ4 (5m 55s)