Big data differs from past data warehousing efforts because it performs analytics on almost any type of data file or format, including images, videos, and data gathered from social media. Another characteristic of big data is that it does not have a one-to-one relationship between server and data storage; instead, it relies on a virtualization architecture that can draw on large content stores and archives as a single global resource.

Among corporate executives and line-of-business managers, the compelling motivation for using big data is to produce more accurate and detailed forecasts or predictions that are highly likely to give an organization an advantage. The potential business benefits span a wide spectrum, from new product development and enhancement to optimum pricing, screening job applications and designing effective marketing campaigns. Political campaigns have already adopted big data analytics: the 2012 Obama campaign used it to identify likely voters, not only to influence them but also to target them for fundraising and get-out-the-vote efforts, ultimately a key strategy in its victory.

Big Data Privacy Concerns

The FTC’s recent action is specific to data brokers: companies that collect and analyze specific consumer behavioral data and then sell the results to other companies looking to improve their consumer marketing and sales efforts. However, growing privacy concerns about the use of big data are not limited to these conventional data brokers. The Economist Intelligence Unit, an independent business within the Economist Group, has published a study of leaders in the use of big data that spanned 19 industry sectors, including manufacturing, IT and technology, financial services, professional services, healthcare, pharmaceuticals and biotechnology, and consumer goods. With adoption this broad, there can be no doubt that the big data revolution has begun.

In light of the characteristics of big data and the business motivations for pursuing its use, one of the most critical privacy aspects is simply the quality or accuracy of the data, because decisions an enterprise makes on inaccurate data can negatively affect individuals. For example, how accurate is personal information obtained from social media? Should information from social media or other web-enabled sources be used to screen or rank job applications, or to increase the price of medical insurance? Basic profile data, such as age, marital status, education or employment, is typically unverified. A similar lack of verification is common in free email services, where the account holder, by accepting the terms and conditions, has agreed to relinquish some degree of privacy for data aggregation purposes.

Another quality issue is the way Internet search terms or phrases can be misinterpreted when this type of data is collected. Examples of poor enterprise use of big data would include using Internet search terms to evaluate product pricing or to target potential customers. Several people may share a household computer, and there are many reasons why someone might research a subject on the Internet that is not directly relevant to them. This type of data collection, analysis and usage can produce flawed analytic results that lead to bad decisions, a lose-lose scenario for both individuals and the organizations acting on that data. This lack of big data quality control points to another well-established privacy principle: collect only personal data that is consistent with, and appropriate for, the intended purpose.
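The purpose-limitation principle can also be enforced mechanically at the point of collection. The following is a minimal Python sketch, assuming a hypothetical `ALLOWED_FIELDS` table that maps each declared purpose to the fields judged relevant to it; the purposes, field names and the `minimize` helper are illustrative only, not part of any standard or product.

```python
from typing import Any

# Hypothetical mapping from each declared purpose to the personal-data fields
# judged relevant and appropriate for it (illustrative values only).
ALLOWED_FIELDS: dict[str, set[str]] = {
    "order_fulfilment": {"name", "email", "shipping_address"},
    "product_pricing": {"purchase_history"},
}


def minimize(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Keep only the fields approved for the declared purpose.

    An undeclared purpose raises KeyError, so data cannot be kept
    "just in case" without an explicit, reviewed purpose.
    """
    allowed = ALLOWED_FIELDS[purpose]
    return {field: value for field, value in record.items() if field in allowed}


raw = {
    "name": "A. Customer",
    "email": "a@example.com",
    "shipping_address": "1 Main St",
    # Ambiguous signal: may come from another member of the household.
    "search_terms": ["diabetes symptoms"],
    "purchase_history": ["sku-1"],
}
print(minimize(raw, "product_pricing"))
# {'purchase_history': ['sku-1']}
```

Failing loudly on an unknown purpose, rather than passing the record through unchanged, makes over-collection an explicit decision instead of a default.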

Best Practices for Big Data Privacy

Enterprise best practices for working with big data are still emerging, but there are already lessons that can help move this promise of innovation forward without sacrificing the privacy of personal data.

The first step in making effective use of big data is to become highly competent in procuring and managing cloud services, which are considered a prerequisite for big data to be cost-effective. Most organizations can’t or won’t make the IT infrastructure investments needed to support a big data initiative and instead rely on cloud-based applications, infrastructure and processing power; even those willing to make that commitment will find it difficult to proceed without the added flexibility the cloud provides. Yet this is a weak spot for many organizations, because they generally lack the competency required to ensure the security and privacy of data in the cloud. It is not enough to include standard, general-purpose security clauses in contracts. There must be well-defined responsibilities for both the cloud services provider and the cloud services user regarding the specific data privacy controls that are required. There must also be ongoing monitoring and auditing of cloud services, along with relevant metrics that indicate levels of data integrity, confidentiality and availability. An excellent data protection resource for using cloud computing services is the Cloud Security Alliance, which publishes guidance documents on its website.

Experience to date indicates that, when deploying cloud services, it is best to prototype big data projects on a public cloud and then move to a private cloud. Why? A public cloud deployment, by definition, is operated by a third party and may be accessed by “untrusted” parties. Private cloud deployments are directly controlled and managed by an organization or enterprise, even if the computing facilities are located off-premises, and can be accessed only by trusted parties.

The next tactic for enabling better use of big data is to implement converged storage. Converged storage is more efficient and reduces the likelihood of errors that affect data quality or accuracy. A critical characteristic of converged storage that relates to data quality and accuracy is data de-duplication, which brings cost efficiencies as well.
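To make the de-duplication idea concrete, here is a minimal sketch of content-addressed de-duplication in Python. It is an illustration under simplifying assumptions, not how any particular converged storage product works: real systems typically de-duplicate fixed- or variable-size blocks inside the storage layer, whereas this sketch hashes whole records with SHA-256 (from the standard `hashlib` module), stores one copy of each, and keeps an ordered list of references to rebuild the original sequence. The `dedupe` and `rebuild` helpers are hypothetical names.

```python
import hashlib


def dedupe(chunks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each distinct chunk once, keyed by its SHA-256 digest.

    Returns (store, refs): `store` holds one copy per unique chunk, and
    `refs` is the ordered list of digests needed to rebuild the input.
    """
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        refs.append(digest)
    return store, refs


def rebuild(store: dict[str, bytes], refs: list[str]) -> list[bytes]:
    """Reassemble the original chunk sequence from the de-duplicated store."""
    return [store[digest] for digest in refs]


records = [b"alice,42,Seattle", b"bob,35,Toronto", b"alice,42,Seattle"]
store, refs = dedupe(records)
print(len(records), "chunks ->", len(store), "stored")  # 3 chunks -> 2 stored
assert rebuild(store, refs) == records
```

Keeping a single authoritative copy means an update cannot leave two diverging versions of the same record. Note that byte-level de-duplication only catches exact copies; near-duplicates such as "Alice" versus "alice" still need record matching during data cleansing.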

Another best practice is to sanitize data properly, which helps avoid a number of the privacy issues described above. “Apply filtering, cleansing, pruning, conforming, matching, joining, and diagnosing at the earliest touch points possible,” said Amy Dean, a data warehouse specialist with Emory University in Atlanta. Dean recommends weighting or scoring varied and disparate data sources by data quality so that quality is factored into the analytics. [7] Dean also suggests that data sources be linked or kept available for reference so that any data element in question can be traced back to its source.
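Dean's three recommendations (cleanse early, score sources by quality, keep lineage) can be combined in one small pipeline. The Python below is a minimal sketch under stated assumptions: the source names, the `SOURCE_QUALITY` scores and the sum-of-scores rule for choosing among conflicting values are hypothetical choices made for illustration, not Dean's or Kimball's method.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical per-source quality scores between 0.0 and 1.0; in practice an
# organization would derive these from its own data-quality assessments.
SOURCE_QUALITY: dict[str, float] = {"crm": 0.9, "web_form": 0.6, "social_media": 0.3}


@dataclass(frozen=True)
class Observation:
    customer_id: str
    attribute: str
    value: str
    source: str      # lineage: which system supplied the value
    source_ref: str  # lineage: record ID within that system


def cleanse(observations: list[Observation]) -> list[Observation]:
    """Normalize values at the first touch point; drop empty values and unscored sources."""
    cleaned = []
    for obs in observations:
        value = obs.value.strip().lower()
        if value and obs.source in SOURCE_QUALITY:
            cleaned.append(replace(obs, value=value))
    return cleaned


def best_value(
    observations: list[Observation], customer_id: str, attribute: str
) -> Optional[tuple[str, float, list[tuple[str, str]]]]:
    """Choose the value with the highest summed source-quality score.

    Returns (value, score, lineage) so the chosen value can be traced back
    to the source records that support it, or None if nothing was observed.
    """
    scores: dict[str, float] = {}
    lineage: dict[str, list[tuple[str, str]]] = {}
    for obs in observations:
        if obs.customer_id == customer_id and obs.attribute == attribute:
            scores[obs.value] = scores.get(obs.value, 0.0) + SOURCE_QUALITY[obs.source]
            lineage.setdefault(obs.value, []).append((obs.source, obs.source_ref))
    if not scores:
        return None
    value = max(scores, key=scores.__getitem__)
    return value, scores[value], lineage[value]


raw = [
    Observation("c1", "marital_status", " Married ", "crm", "crm-17"),
    Observation("c1", "marital_status", "single", "social_media", "post-993"),
    Observation("c1", "marital_status", "", "web_form", "wf-4"),
]
print(best_value(cleanse(raw), "c1", "marital_status"))
# ('married', 0.9, [('crm', 'crm-17')])
```

Summing scores lets several moderately reliable sources outweigh a single weak one, and returning the lineage alongside the chosen value means a disputed data element can be traced to the exact record that supplied it.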


Summary

Ultimately, the best safety net for the accuracy of personal data (and, in turn, for better data privacy practices) is to give consumers a process to access, review and correct the information that has been collected about them, and to actively encourage and invite them to use it rather than merely provisioning it. Further, the review process needs to be easy to use and free of charge to the consumer. This is daunting to many early adopters of big data because they often collect large volumes of data they never even use, and there may be a fear of letting consumers see just how much detailed personal data has been collected about them. But this level of transparency is the best way to earn consumer trust and confidence in decisions made using big data. Credit reporting entities have long provided procedures for consumers to access, review and correct their data, and doing so is a U.S. regulatory requirement for that industry. Similarly, privacy notices, statements or disclosures on websites that include contact details for questions or concerns are another best practice for enabling transparency and providing a way to address incorrect data.

VizTeams has over 300 experts with a history of successfully delivering over 500 projects. VizTeams serves clients across North America, specifically the USA and Canada, and serves clients in person in Seattle, Toronto, Buffalo, Ottawa, Montreal, London, Kitchener, Windsor and Detroit. Feel free to contact us or drop us a note for any help or assistance.

References

[1] “Big Data: Lessons from the Leaders”, Economist Intelligence Unit, August 2012
[2] OECD Privacy Principles: http://oecdprivacy.org
[3] Cloud Security Alliance, Security Guide and Cloud Controls Matrix documents: http://www.cloudsecurityalliance.org
[4] “Newly Emerging Best Practices for Big Data”, Kimball Group white paper by Ralph Kimball, Nov. 12, 2012
[5] NIST Special Publication 800-145: “The NIST Definition of Cloud Computing”
[6] “(IDG) Converged Storage: A Next Gen Storage Strategy for Big Data”, Hewlett-Packard Co., Nov. 12, 2012
[7] “Newly Emerging Best Practices for Big Data”, Kimball Group white paper by Ralph Kimball, Nov. 12, 2012
