5.7 Data Quality

Kaushik, “Data Quality Sucks, Let’s Just Get Over It” (2006):

- Assume those managing the data have a level of comfort with the data.
- Start making decisions that you are comfortable with.
- Over time, drill deeper into specific micro areas and learn more.
- Get more comfortable with the data and its limitations over time.
- Consistency in calculations = good.

8.5 Data Visualization

Three rules to visualize insights with impact:

- Highlight your message and eliminate distractions.
- Use visual cues to help lead your audience through your insight.
- Use contrast (size, color) to capture the reader’s attention.

8.6 Big Data (The Impact of Big Data)

8.6.1 What is big data

Trend: Analyzing non-structured data

Generalized definition:

- Structured data and non-structured data (text, voice, video, GPS, sensor)
- Data storage/processing/analysis techniques: Hadoop, NoSQL, machine learning/statistics
- Data scientists and data-oriented organizations
- Decide.com/FlightCaster

Bottleneck of machine learning: storing and processing big data (I doubt it)

From point information to thread information:

- Point information: purchased a product, received a service...
- We need thread information to answer WHY!
- Thread information: customer interaction data (non-structured or historical behavior)

8.6.2 Walk Between Privacy and Innovation

Collaborative Filtering (Amazon): a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on an issue, A is more likely to share B’s opinion on a different issue X than the opinion on X of a person chosen at random.
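
The assumption above can be sketched in a few lines of Python. This is a minimal user-based illustration, not Amazon's actual method: the ratings, the user and item names, and the choice of cosine similarity as the measure of "same opinion" are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating on a 1-5 scale}); illustrative only.
ratings = {
    "A": {"book": 5, "film": 4, "album": 1},
    "B": {"book": 5, "film": 5, "album": 1, "X": 5},
    "C": {"book": 1, "film": 2, "album": 5, "X": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None

# A has not rated X. B agrees with A on the other items, C does not,
# so B's opinion of X should count for more than C's.
prediction = predict("A", "X", ratings)
baseline = (ratings["B"]["X"] + ratings["C"]["X"]) / 2  # unweighted average
print(f"weighted prediction: {prediction:.2f}, unweighted baseline: {baseline:.2f}")
```

Because B's past ratings resemble A's more closely than C's do, the weighted prediction for A lands closer to B's rating of X than the plain average does.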

Rapleaf: uses an email address or name to retrieve a personal profile (linked to information on Facebook)

8.6.3 Data Aggregator

- Pay as You Drive: car insurance (provides a discount according to driving habits)
- For companies with internal technical skills: collection, cleaning, analysis, result delivery, implementation
- Data aggregators: provide hard-to-collect offline data (what are those?)
- Who will be the data aggregators: those who develop devices for data entry and collection
- More personal, more useful, harder to get
- Payment service companies (VISA, American Express, etc.): on the road to becoming data aggregators [VISA collaborates with GAP]
- Aggregate data: Internal Data + External Data -> Premium Data (multiplication effect)
  - Example: Catalina Marketing and Nielsen evaluate TV ads by studying the association between audience ratings and actual purchases
- What data to get?
  - Example: the iPhone app Nike+GPS collects running routes (anonymous) and uses the information to gain insight into store locations

8.6.4 Data Scientist

Sexy characteristics:

- Communication skills
- Entrepreneurship: explore how to make data useful, leading to new data-based services
- Curiosity: not limited to art, technology, medicine, or natural science, but keenly curious about different areas. By analyzing data from different fields, a curious data scientist will find a bonanza.

8.6.5 What machines cannot do

We have no chance of competing against machines on frequent, high-volume tasks. Machines cannot compete with us when it comes to tackling novel situations, and this puts a fundamental limit on the human tasks that machines will automate. They need to learn from large volumes of past data. Humans have the ability to connect seemingly disparate threads to solve problems we’ve never seen before.

So what does this mean for the future of work? The future state of any single job lies in the answer to a single question: To what extent is that job reducible to frequent, high-volume tasks, and to what extent does it involve tackling novel situations?

On frequent, high-volume tasks, machines are getting smarter and smarter. The copy behind a marketing campaign needs to grab consumers’ attention. It has to stand out from the crowd. Business strategy means finding gaps in the market, things that nobody else is doing. It will be humans that are creating the copy behind our marketing campaigns, and it will be humans that are developing our business strategy.

8.7 Data science position

Predictions for 2017:

1. Continuous learning will be front and center
   - Learning is a way of life.
   - Requires a time commitment from both the individual and the company.
   - Always absorbing what is going on out there.
   - They are naturally curious people.
2. Companies will “train up” existing talent
3. Predictions 1 & 2 lead to longer tenure
   - The current situation is the opposite: stagnation, lack of job satisfaction
4. HR analytics will be standard at larger firms
   - Add headhunter support for analytics hiring
5. Education options will thin out (a bit)
   - Previous: MOOC/Bootcamp
   - Separation between a money deal ($15,000 for 12 weeks) and high-quality education
6. Analytics ROI will be scrutinized
   - What is the associated payback?
   - How long does it take to see results?
   - How much has been invested?
   - It is important to tie analytics to real business problems
7. Analytics findings will be scrutinized
   - Confidence in the magic of analytics will dim
8. The internet of things draws more interest
9. Mission-driven careers are on the rise

Advice: Be open to change

8.8 Everybody Lies

“Makingsense” says that there are things data cannot tell us. “Everybody Lies” says there are things we can see only by using data, and the key is knowing where to find the right data.
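
The condom estimates recorded in this section make the point with simple arithmetic. The sketch below uses only the three totals from the notes; the variable names are illustrative, and reading the "Reality" figure as condoms actually sold per year is an assumption.

```python
# Totals as recorded in the notes (condoms per year).
reported_by_women = 1_100_000_000
reported_by_men = 1_600_000_000
actual = 600_000_000  # the "Reality" figure; assumed to mean condoms actually sold

def overstatement(reported, actual_total):
    """Ratio of a self-reported total to the actual total."""
    return reported / actual_total

women_ratio = overstatement(reported_by_women, actual)
men_ratio = overstatement(reported_by_men, actual)
gap = reported_by_men - reported_by_women

print(f"Women's reports exceed the actual figure by a factor of {women_ratio:.2f}")
print(f"Men's reports exceed the actual figure by a factor of {men_ratio:.2f}")
print(f"Men's and women's reports differ by {gap:,} condoms per year")
```

Both self-reported totals exceed the actual figure and also disagree with each other, which is why the notes stress looking for data that does not depend on what people say about themselves.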

- Condom estimates:
  - Female: 55/year, 16% using a condom, 1.1 billion condoms/year
  - Male: 1.6 billion condoms/year
  - Reality: 600 million condoms/year
- Sexless marriage > unhappy marriage > loveless marriage
- Racial discrimination: people on the East Coast (scheming)
- Significant effect (small data); small effect (big data)
- Pancreatic cancer symptoms from search history
- Weather vs. depression (Hawaii vs. Chicago, searches for “depression” 40%)
- Gender discrimination:
  - Boys: intelligent, happy
  - Girls: overweight, pretty

8.9 Database design
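
As a concrete illustration of several guidelines in this section (keys, integrity checks, indexes, separate tables per domain object, and keeping raw records so that rollups can be computed later), here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, message, direction) and the sample rows are hypothetical, chosen only to mirror the "incoming or outgoing" example; it shows one possible layout, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- One table per domain object instead of a single wide table.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- artificial (surrogate) key
    email       TEXT NOT NULL UNIQUE   -- natural candidate key
);
CREATE TABLE message (
    message_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    direction   TEXT NOT NULL
                CHECK (direction IN ('incoming', 'outgoing')),      -- integrity rule
    sent_at     TEXT NOT NULL
);
-- Index chosen for the expected lookup: one customer's messages by time.
CREATE INDEX idx_message_customer ON message(customer_id, sent_at);
""")

conn.execute("INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com')")
conn.execute(
    "INSERT INTO message (customer_id, direction, sent_at) "
    "VALUES (1, 'incoming', '2017-01-01T09:00:00')"
)

def try_insert(sql):
    """Return True if the row was accepted, False if an integrity rule rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Violates the CHECK constraint on direction.
bad_direction_ok = try_insert(
    "INSERT INTO message (customer_id, direction, sent_at) "
    "VALUES (1, 'sideways', '2017-01-01T10:00:00')"
)
# Violates the foreign key: there is no customer 99.
orphan_ok = try_insert(
    "INSERT INTO message (customer_id, direction, sent_at) "
    "VALUES (99, 'outgoing', '2017-01-01T11:00:00')"
)

# Raw rows are kept, so a rollup is just a query that can be redefined later.
rollup = conn.execute(
    "SELECT direction, COUNT(*) FROM message GROUP BY direction"
).fetchall()
print(bad_direction_ok, orphan_ok, rollup)
```

Putting the rules in the schema means that every application writing to the database gets the same checks, and deriving the rollup from raw rows leaves room to ask different questions of the same data later.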

- Plan ahead
  - Understand the data and what it’s supposed to do
  - Database modeling and software development are different
- Document your model
  - Use a naming convention
  - Document the design
  - A new administrator can understand it without having to come back to you for explanation
- Think carefully about keys: primary keys, foreign keys, artificial keys
- Use integrity checks carefully
  - Define integrity rules
  - Currency for different countries; define messages as incoming or outgoing
- Don’t forget indexes in your design
- Use separate tables for domain objects rather than cramming them into a single table
- Define an archiving strategy
- Test early, test often

With analytics data:

- Scale matters: have a good estimate of usage; one server is never enough
- Rollups make things cheaper but at a great expense later: tracking every action in a raw way allows for the development of insights that tell even more important stories
  - User log
- Reliability: set up alerting and monitoring systems and make it someone else’s problem through hosted offerings

8.10 Data Driven Marketer

Challenges:

- Cross-device
- Classic culture issues:
  - Overcome the fear of failure
  - Get comfortable with ambiguity
  - Break down silos
  - Trust data over whims, gut reactions, and even past experiences
- Data to include: television data, competitive data, social data
- Train to transform: give everyone the training they need
  - Driving marketing strategy, validating benchmarks, evaluating and providing actionable data insights
- Be clear on your data strategy (documentation)

Three pillars of your integrated strategy:

- The right data: manage the data pipeline, put data in context (how the data was collected), act fast and don’t wait until the data is perfect
- The right culture:
  - Formalize points of alignment and collaboration across different functions, such as brand marketing, digital media, and analytics departments
  - Marketing IT
  - Your company, not the agency, owns the data
  - Better training
- The right technology