6.5 Data science team

The source article is about data science in startups, but we can still draw some general lessons from it.

  • What can data science do?
    1. Improving the products:
      • Rely on a virtuous cycle where products collect usage data that becomes the fodder for algorithms which in turn offer users a better experience.
      • The first version of your product has to address what data science calls the “cold start” problem: it has to provide a “good enough” experience to initiate the virtuous cycle of data collection and data-driven improvement. Product managers and engineers are needed to implement that first solution. (At least some people have to start using the product before it can enter the “data collection, data-driven product improvement” loop.)
      • Data scientists collaborate with engineers: decide whether data scientists implement product enhancements themselves or partner with engineers who implement them. Either approach can work, but it’s important to formalize it and establish shared expectations across the organization. Otherwise, you’ll struggle to get improvements into production, and you’ll lose talented data scientists who feel unproductive and undervalued.
      • Skills: machine learning knowledge and production-level engineering skills
    2. Improving the decisions:
      • Decision science uses data analysis and visualization to inform business and product decisions.
      • The decision-maker may be anywhere in the organization, from a product manager determining how to set priorities on a road map to the executive team making bet-the-company strategic decisions.
      • Characteristics of decision science problems:
        • Subjective, requiring data scientists to deal with unknown variables and missing context
        • Complex, with many moving parts that lack clear causal relationships
        • Measurable and impactful: the result of making the decision is concrete and significant for the business
        [Hui: The article’s author also lists one more characteristic: that these are new problems the company did not previously need to solve. I disagree with this point. The author may say so because the article is aimed at startups. For large traditional enterprises, most of the decision problems they face are very old ones that were previously settled by guessing or, to put it more kindly, by the guesses of industry experts.]
      • This sounds like data analytics, but decision science should do more than produce reports and dashboards. Data scientists shouldn’t be doing work that can be delivered using off-the-shelf business intelligence tools.
      • Decision science is not always needed. In the following two cases, businesses need to rely on intuition and experimentation instead:
        • Some decisions are too small to justify the investment
        • The data needed to analyze the decision meaningfully is lacking
      • Good decision scientists know their own limitations
      • Skills: business and product sense, systematic thinking, and strong communication skills
  • Should you be investing in data science?
    • Invest: it’ll be critical to your success
    • Not invest: it’ll just be an expensive distraction
    • Are you committed to using data science to either inform strategic decisions or build data products?
      • If you’re not committed to using data science toward one of these goals, then don’t hire data scientists.
      • A culture of data-driven decision making is necessary. You may not need data scientists on day one, but it takes time for you to hire the right people, and time for them to get to know your data and your business. You’ll need all that to happen before they can apply data science to drive decision making.
      • Data products can create value and delight users through improved optimization, relevance, etc. If these are on your product roadmap, you should bring data scientists in early to make the design decisions that will set you up for long-term success. Data scientists can make key decisions about product design, data collection, and systems architecture that are critical foundations for building magical-seeming products.
    • Will you be able to collect the data you need and act on it?
      • Data science requires data: quantity and quality
      • Data science only matters if data drives action. Data should inform product changes and drive the organization’s key performance indicators (KPIs).
      • Instrumentation requires a commitment across the organization to identify what data each product needs to collect and establish the infrastructure and processes for collecting and maintaining that data.
      • To be successful, instrumentation requires collaboration among data scientists, engineers, and product managers, which in turn requires executive commitment.
      • Data-driven decision making requires a top-down commitment. From the CEO down, the organization has to commit to making decisions using data, rather than based on the highest paid person’s opinion.
    • Do you need data science to be a core competency, or can you outsource it?
      • If data science is solving problems that are critical to your success, then you can’t afford to outsource it.
      • Also, off-the-shelf solutions tend to be rigid. If your business is taking a unique approach to a problem (e.g. collecting new kinds of data or using the results in novel ways), it’s unlikely that an off-the-shelf solution will be flexible enough to adapt to it.
  • Where does data science belong in your organization?
    • A standalone team
      • Your data science team acts as an autonomous unit parallel to engineering. The head of data science is a key leader and typically reports to the head of product or engineering, or even directly to the CEO.
      • Pros:
        • Autonomy
        • Well positioned to tackle whatever problems it deems most valuable
        • Demonstrates that the company sees data as a first-class asset, which will help it attract world-class talent
        • Works particularly well for decision science teams. Even though decision scientists collaborate closely with product teams, their independence helps them to make hard calls, like telling PMs that their product’s metrics aren’t good enough to justify a launch. Decision scientists also benefit a lot from cross-pollination, both to understand how different product metrics depend on one another and to share more general learnings about experimentation and data analysis.
      • Cons:
        • Risk of marginalization. As companies grow and organize into product teams, they often prefer to be self-sufficient.
          Even when they could benefit from collaboration with data scientists, product teams simply don’t want to depend on resources they don’t control. Instead, they rely on themselves, even hiring their own data scientists under other names like “research engineers”, to get things done. If product teams refuse to work with the standalone data science team, then that team becomes marginalized and ineffective. Again, that’s when you start losing good talent.
    • An embedded model
      • The data science team brings in talented people and farms them out to the rest of the company. There’s still a head of data science, but he or she is mostly a hiring manager and coach.
      • The embedded model is the polar opposite of the standalone model: it gives up autonomy to ensure utility.
      • Pros: data scientists join the product teams that most need their services, and get to work on a wide variety of problems throughout the organization.
      • Cons:
        • Not all data scientists are happy giving up autonomy (in fact, many are not good at it at all). Data scientist job descriptions emphasize creativity and initiative, while embedded roles often require them to defer to the leadership of the teams in which they are embedded.
        • Data scientists can feel like second-class citizens as embedded team members: their product leads don’t feel responsible for their growth and happiness, while their managers don’t feel directly vested in their work.
        • We’ve seen some companies embed data science managers, but this approach only works once you have a fairly large data science team.
    • An integrated team
      • There is no separate data science team. Instead, product teams hire and manage their own data scientists.
      • Pros:
        • This optimizes for organizational alignment. By making data scientists first-class members of their product teams, it addresses the downsides of the standalone and embedded models.
          To the extent that data scientists, software engineers, designers, and product managers work on shared product goals, the integrated model instills collective team ownership of those goals. This is how you avoid the breakdowns that can occur when narrowly focused functional teams diverge in their goals and end up mired in dependencies that are too often ignored or delayed.
      • Cons:
        • It dilutes the identity of data science. Individual data scientists identify with their associated product teams, rather than a centralized data science team.
        • You also sacrifice the flexibility of the embedded model, since it’s harder to move people around based on their skills and interests.
        • The integrated model can create challenges for data scientists’ career growth, since the manager of an integrated team may not be in the best position to value or reward their accomplishments.

Over time, the impact that a data science team has will be far higher if you build a diverse team with very different backgrounds, skill sets, and world views. This will ensure they think as holistically as possible about their domain, and will encourage creativity and innovation over time.