Building a data warehouse for big data
An enterprise data warehouse is the most appropriate environment for getting to grips with the challenge of big data, provided it is underpinned by a sound business strategy and the right architecture.
This is the view of Sean Paine, COO of information solutions specialist EnterpriseWorx. “Big data has burst onto the radar because of the exploding volume of data that needs to be captured, managed and processed,” he says.
According to IDC, 1.2 zettabytes (a zettabyte is 1 billion terabytes) of digital information was created worldwide in 2010. This is expected to grow to 35 zettabytes by 2020, which implies that the amount of data worldwide roughly doubles every two years.
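The doubling time implied by IDC's figures can be checked with a quick back-of-the-envelope calculation (the zettabyte figures are IDC's; the compound-growth derivation below is ours):

```python
import math

# IDC figures: 1.2 ZB created in 2010, projected 35 ZB by 2020
start_zb, end_zb = 1.2, 35.0
years = 10

# Annual growth factor under constant compound growth
annual_factor = (end_zb / start_zb) ** (1 / years)

# Doubling time in months: solve annual_factor ** (t / 12) == 2
doubling_months = 12 * math.log(2) / math.log(annual_factor)
print(round(doubling_months))  # roughly 25 months, i.e. about two years
```

A 30-month doubling time would only yield about a sixteen-fold increase over the decade, well short of the projected figure.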
“However, it is not only the sheer volume of data that presents a challenge,” says Paine. “It is also the complexity of the data mix – structured, semi-structured and unstructured text, as well as images, voice and video. These exist in a variety of formats, both inside and beyond the organisation, and may be customer-, machine- or sensor-generated.”
“Organisations that create a unified data architecture across the entire business are best placed to control, exploit and gain value from large data sets. Often much of the data is conflicting or spurious and data exploration tools are needed to find the ‘needle of truth’ that delivers real business insights.”
According to Paine, over the past decade, there’s been a shift in data warehousing from historical transaction data to operational data. Next came analysis of customer behaviour data from sources such as web-page tracking and, more recently, the inclusion of unstructured data captured from social media.
“Data warehousing has geared up to cope with large volumes of data,” he says. “Analytical in-memory database software with massively parallel processing technology, such as Kognitio WX2, enables complex analytics on terabytes of data almost instantaneously. The horizontal scaling of data warehouses and storage tiering can also help organisations manage their data more effectively and at lower cost.
“Data warehouses are starting to address the complex relationships that exist in big data sets, for example the relationship of individuals in a social network and how they interact with each other and with different brands.”
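The relationship data described here is naturally modelled as a graph of people and brands. A tiny sketch of one such query (the names and structure are hypothetical, not any vendor's data model):

```python
# Adjacency sets: who follows whom in a small social network.
# "acme_brand" stands in for a brand account.
follows = {
    "alice": {"bob", "acme_brand"},
    "bob":   {"alice", "carol"},
    "carol": {"acme_brand"},
}

def shared_interests(network, a, b):
    """Accounts that both a and b follow -- one simple relationship query."""
    return network.get(a, set()) & network.get(b, set())

print(shared_interests(follows, "alice", "carol"))  # {'acme_brand'}
```

Queries like this, run across millions of users rather than three, are what warehouse vendors mean by analysing how individuals interact with each other and with brands.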
“But it’s not just about big data. In the first instance, the enterprise data warehouse needs to be designed to support business processes. Then, with the correct technology, it will be able to cope with large, complex data volumes, and big data will flow correctly into the warehouse. To achieve this, it’s important to use newer, scalable database platforms such as Kognitio WX2 or Microsoft SQL MPP.”
Paine believes that the next few years will see the development of new analytical skills to enhance understanding of relationships between data. “We’re getting better at bringing data together, but we don’t fully understand relationships that are not apparent on the surface of a large pool of data from multiple sources. With the commoditisation of artificial intelligence and of algorithms that look for patterns in the data, relational mapping will become a standard capability in the data warehouse and business intelligence space.
“Organisations will start deriving value from non-text data – graphical, images, voice. An example is the software used in call centres to analyse voice tone and monitor the stress levels of the customer. If that can be picked up in real time, it is possible to address irate calls and resolve issues.”
Paine argues that the enterprise data warehouse must expand to encompass big data analytics as part of overall information management, aligned with business objectives. “There’s no point in bringing data into the warehouse unless it can be exposed at the front end in a meaningful way through business intelligence and analytics,” he says.
“It’s vital that information technology experts empower business managers to carry out their own analysis so they can better understand their processes and their competitors. Involvement with the business community is crucial if value is to be obtained from big data.”