Don’t Jump in the Data Lake

32. 47. 19. 7. 85.

Congratulations! I just gave you five very important, valuable numbers. Or did I?

If they were tomorrow’s winning Powerball numbers, then certainly. But maybe they’re monthly income numbers. Or sports scores. Or temperatures. Who knows?

Such is the problem of context. Without the appropriate context, data are inherently worthless. Separate data from their metadata, and you’ve just killed the Golden Data Goose.
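To make the point concrete, here is a minimal sketch in Python. The `Labeled` class, its field names, and the two interpretations are hypothetical, invented purely for illustration; they do not come from any data lake product. The same five numbers are wrapped in two different sets of metadata, and only the metadata tells the two readings apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """Raw values bundled with the context needed to interpret them (hypothetical)."""
    values: tuple
    meaning: str
    unit: str


# The five numbers from the top of the post, with no context attached.
raw = (32, 47, 19, 7, 85)

# Two equally plausible readings; the metadata here is assumed, not known.
as_temperatures = Labeled(raw, "daily high temperature", "degrees Fahrenheit")
as_scores = Labeled(raw, "points scored per game", "points")


def describe(d: Labeled) -> str:
    """Render the values together with their meaning and unit."""
    return f"{d.meaning}: " + ", ".join(f"{v} {d.unit}" for v in d.values)


print(describe(as_temperatures))
print(describe(as_scores))
```

Strip away `meaning` and `unit`, and the two objects collapse back into the same uninterpretable tuple.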

If we scale up this example, we shine a light on the core challenge of data lakes. There are a few common definitions of a data lake. Perhaps the most straightforward is a large object-based storage repository that holds data in its native format until it is needed. Another is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data.

True, there may be metadata in a data lake, thrown in along with the data they describe – but there is no commonality among such metadata, and furthermore, the context of the information in the lake is likely to be lost, just as a bucket of water poured into a real lake loses its identity.

Read the entire article at http://www.nuodb.com/blog/dont-jump-data-lake.
