Warehouses provide consolidation, validation, standardization, integration, distribution and presentation of data. That's a vast amount of functionality, but because they're a centralized resource, they can also become a bottleneck for the enterprise if managed poorly. The two primary challenges are:
- Data structures may be difficult to change - changes to significant business rules (sales regions, for example) may require conversion of vast amounts of historical data, so a change of this type can get deferred for years.
- ETL interfaces may be slow to add - assuming that each ETL interface takes 1-2 months to build, a six- to twelve-month backlog can easily develop from a loss of staff, a sudden surge in requested interfaces, etc.
Mashups, which draw directly on operational systems, face a different set of challenges:
- Very little data actually has common keys: most systems of record are developed independently and without much thought about strategic data sharing, and some are picked up through acquisitions. So accounts, customers, sites, products, services, users, departments, assets, tickets, etc. are likely to have completely different ids in different systems.
- Business rules are inconsistent across systems: definitions, formats, and rules may differ across systems for either legitimate or incompetent reasons. A legitimate example is the difference in the definition of a customer: sales may define a customer as a prospect or anyone who has placed an order, finance may define a customer as someone from whom payment has been received, and delivery/operations/support may define a customer as someone for whom a service is currently being provided. Attempting to consolidate all orgs under a single definition generally fails. Incompetent reasons are often simply a matter of poor communication between departments, and are the rule rather than the exception in large organizations.
- Operational systems are not designed for analytical queries - most operational systems are designed to handle a high volume of very small transactions very quickly. They aren't designed to handle large queries, and especially not large numbers of them. Performance is usually poor at best, and a heavy analytical workload can degrade the system enough to put the availability of the operational process at risk.
- Operational systems seldom keep historical data - few operational systems keep full history because of its impact on performance, capacity, and development time. Yet one of the most common forms of analysis is time-series analysis, which depends on exactly that history.
- Operational systems seldom include the reference data needed for powerful analysis - extended attributes that augment system data are often essential, but this reference data also needs to be acquired and staged somewhere.
Used together, mashups and warehouses complement each other:
- The fast development capabilities of mashups can compensate for the slow development speed of warehouses.
- Mashups can combine real-time data from operational systems with high-latency data from the warehouse. Typical data warehouse reporting tools (OLAP, ROLAP, etc.) aren't very effective at this without database federation, which can be problematic with complex queries or large data volumes.
- The rich data functionality of warehouses can compensate for the weak back-end functionality of mashups and their operational sources.
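The points about mismatched keys and about combining real-time operational data with warehouse data can be sketched together. This is a minimal Python illustration under stated assumptions, not a prescribed design: `KEY_XREF`, `CUSTOMER_DIM`, `Ticket`, `fetch_open_tickets`, and `mashup_open_tickets` are all hypothetical names, and in-memory dicts stand in for a warehouse key cross-reference table and customer dimension. The mashup pulls current tickets from an operational support system and enriches them with warehouse attributes by translating the support system's own customer ids through the cross-reference.

```python
from dataclasses import dataclass

# Hypothetical warehouse cross-reference: maps each source system's local
# customer id to a single enterprise-wide customer key.
KEY_XREF: dict[tuple[str, str], int] = {
    ("crm", "C-1001"): 1,
    ("support", "88731"): 1,
    ("crm", "C-1002"): 2,
    ("support", "90210"): 2,
}

# Hypothetical warehouse customer dimension: high-latency, history-rich attributes.
CUSTOMER_DIM: dict[int, dict] = {
    1: {"name": "Acme Corp", "region": "West", "lifetime_revenue": 1_250_000},
    2: {"name": "Globex", "region": "East", "lifetime_revenue": 310_000},
}


@dataclass
class Ticket:
    ticket_id: str
    customer_id: str  # the support system's own id, not the warehouse key
    status: str


def fetch_open_tickets() -> list[Ticket]:
    """Stand-in for a real-time call to the operational support system."""
    return [
        Ticket("T-1", "88731", "open"),
        Ticket("T-2", "90210", "open"),
        Ticket("T-3", "55555", "open"),  # no cross-reference entry yet
    ]


def mashup_open_tickets(source: str = "support") -> list[dict]:
    """Join live tickets to warehouse attributes via the key cross-reference."""
    rows = []
    for t in fetch_open_tickets():
        key = KEY_XREF.get((source, t.customer_id))  # None if unmapped
        dim = CUSTOMER_DIM.get(key, {})
        rows.append({
            "ticket_id": t.ticket_id,
            "customer_key": key,
            "customer": dim.get("name", "UNMATCHED"),
            "region": dim.get("region"),
            "lifetime_revenue": dim.get("lifetime_revenue"),
        })
    return rows
```

Unmatched ids are kept and flagged as `UNMATCHED` rather than dropped, since a silently shrinking result is harder to diagnose than a visible gap in the cross-reference.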
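Because operational systems usually overwrite attributes in place, warehouses often preserve that history with a Type 2 slowly changing dimension. This is a common dimensional-modeling technique, not necessarily the one the author uses. The sketch below assumes a hypothetical nightly load that passes each customer's current region to `apply_snapshot`; `CustomerVersion`, `apply_snapshot`, and `region_as_of` are illustrative names. When a tracked attribute changes, the current row is closed and a new one is appended, so time-series queries can ask what the value was on any given date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerVersion:
    customer_id: str
    region: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_snapshot(history: list[CustomerVersion], customer_id: str,
                   region: str, as_of: date) -> None:
    """Type 2 SCD: close the current row and append a new one on change."""
    current = next((v for v in history
                    if v.customer_id == customer_id and v.valid_to is None), None)
    if current is not None and current.region == region:
        return  # attribute unchanged; nothing to record
    if current is not None:
        current.valid_to = as_of  # end the old version where the new one starts
    history.append(CustomerVersion(customer_id, region, as_of))


def region_as_of(history: list[CustomerVersion], customer_id: str,
                 when: date) -> Optional[str]:
    """Return the region that was in effect on the given date, if any."""
    for v in history:
        if (v.customer_id == customer_id and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.region
    return None
```

For example, after loading "West" on 2020-01-01, "West" again on 2020-06-01, and "Central" on 2021-01-01, the history holds two rows, and a query for 2020-07-01 still answers "West" even though the operational system now only knows "Central".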