With data science pervading all areas of society, the importance of technical data science work is increasing in organisations across sectors – and with it, the importance of technically sound data science reporting.

As data science continues to affect all parts of society, agreeing on reliable ways of reporting data science work will become crucial.

This article summarises some best practice advice for anyone who plans to author, or critically read, professional data science reports, such as real-world practitioners or participants of the Turing Data Study Groups. I would also like to encourage professionals to participate in the ongoing open discussion around, or the writing of, the Turing’s best practice guidelines.

What is a data scientific report?

Traditionally, professional reports occupy a grey area between academic scientific discourse and scientific journalism. The field of science studies (i.e., research about research) categorises them as technical reports, part of the so-called grey literature.

A stylised data scientific report is characterised by:

  1. The topic. A data scientific report usually addresses a domain question or domain challenge in an application domain specific to a dataset (e.g., financial, medical, social).
  2. The aim. The primary goal of the analyses is data-driven answers to some domain question.
  3. The audience. The readers are decision-makers or domain experts interested in hard evidence to inform decision-making.

There are natural differences from academic research, as well as natural interfaces with it. In a report, an appealing narrative or academic novelty is less important than a correct, data-driven answer. Such an answer may be achieved by rote methodology, or may even take the form of a solid negative result – difficult to publish in a stylised academic outlet, but possibly still of high value to decision-makers.
On the other hand, data science reports may highlight gaps in the state of the art, and act as a motivator for application-informed academic research.

Five principles for good data scientific reporting

Best reporting practice overlaps substantially with best research practice, both being a matter of intellectual honesty and due diligence. I would like to highlight some common pitfalls that are specific to the data scientific reporting setting, and best practice principles to avoid them. An author may treat these as a mental checklist, while a reader may find them useful for spotting tell-tale signs of common bad practice.

1. Correctness and veracity

While everyone will agree that fabrication of results is bad, it is rare. More common problems are wishful thinking or over-marketing.

How-not-to: the report is marketing material for a particular approach, algorithm, or method. It praises its advantages while avoiding talk of shortcomings.

How-to: before presenting a solution, the report describes what the data scientific problem is, and provides an unbiased overview of potential approaches.

To consider: Solid data scientific work is much more difficult than lying convincingly. Therefore, truthfulness is a competent data scientist’s unique selling point.

 

2. Clarity in writing and argumentation

Any recommendation is only as good as the argumentation that supports it. In data science, the argumentative chain is a sequence of empirical, logical, and statistical arguments. Key parts are of an empirical nature: e.g., how does the approach address the real-world problem? How are the real-world recommendations obtained from the analyses? Understanding these connections requires no deep methodological background knowledge, only careful bookkeeping. Moreover:

  • Every analysis is done for a reason that needs to be reported – even if that reason is simply “it is the most common approach for this type of problem”.
  • “The AI said so” is not an argument. Algorithms can be wrong, too.

To consider: The only way to check the correctness of a conclusion is to check the correctness of the argumentative chain. An argument that is not made is no better than an argument that is wrong.

 

3. Reproducibility and transparency

If it can’t be checked, it can’t be trusted. Basic scientific quality control requires that the work can be fully scrutinised – the burden of proof lies with the originator of a claim. Even if the work is not open to the public, a reviewer should be able to check data, code for methods and analyses, and the full argumentative chain building on top of it.

Specialised technology may facilitate reproducibility, e.g., version control repositories and data management architecture; see The Turing Way. Of course, technology does not replace solid quality control processes.
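As a minimal sketch of what "checkable" can mean in practice, the following Python snippet records a fingerprint of the input data and the random seed alongside the result, so that a reviewer can re-run the analysis and compare. The function names (fingerprint, run_analysis, make_manifest), the toy bootstrap analysis, the seed, and the inline data are all illustrative assumptions and not part of any Turing guideline.

```python
import hashlib
import json
import random


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw data bytes."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(values, seed):
    """Toy analysis (assumption): mean of one seeded bootstrap resample."""
    rng = random.Random(seed)  # local generator, so the result depends only on the seed
    resample = [rng.choice(values) for _ in values]
    return sum(resample) / len(resample)


def make_manifest(raw: bytes, seed: int) -> dict:
    """Bundle what a reviewer needs to re-run the analysis and compare results."""
    values = [float(x) for x in raw.decode().split(",")]
    return {
        "data_sha256": fingerprint(raw),
        "seed": seed,
        "result": run_analysis(values, seed),
    }


raw_data = b"1.0,2.0,3.0,4.0,5.0"  # hypothetical stand-in for the contents of a data file
manifest = make_manifest(raw_data, seed=42)
print(json.dumps(manifest, indent=2))
```

A reviewer who holds the same data and code can call make_manifest again and check that the hash and the result match. A mismatch in the hash points to changed data, while a mismatch in the result alone points to changed code or an uncontrolled source of randomness.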

To consider: It’s called quality control and not quality trust.

 

4. Scientific method and scientific process

A requirement of the scientific method is that all claims can be put to the test, and the scientific process requires doing so on a regular basis. Deviations from the method or the process are common signs of pseudo-data-science.

Of all the principles, this is perhaps the most subtle and difficult one: a report can be fully reproducible and its argumentation complete, but it may still be scientifically broken, e.g., at the level of success control. In general, ensuring quality here requires expertise in data science as well as in the application domain. However, the more common issues can be caught just by carefully working through the question “what would a negative finding look like?”. Examples:

  • Visualisation is not a scientific argument. Reading the entrails of a pretty picture is modern-day tasseography. Ask “what would it show if there were nothing to find?”.
  • Correlation isn’t causation. More difficult to spot in new terminology, e.g. “performant features are(n’t) root cause drivers”. As a rule of thumb (to which there are exceptions), only data from an intervention trial can establish causal links cleanly.
  • An observed control condition (e.g., control group) is required to establish association. If one wishes to argue that there is a link between A and B, one needs data that also contains examples of non-A and non-B (or less-A and less-B).
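The question “what would it show if there were nothing to find” can be made concrete with a permutation test. The sketch below, in plain Python, shuffles the group labels many times to see how large a difference between two groups arises by chance alone. The function names, the default number of permutations, and the example numbers are illustrative assumptions. A small p-value indicates association only and does not establish causation.

```python
import random


def mean_difference(a, b):
    """Difference between the means of two groups."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often random relabelling gives a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean_difference(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the null hypothesis, group labels are arbitrary
        if abs(mean_difference(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up numbers for illustration only.
with_a = [10, 11, 12, 13, 14, 15]     # outcomes where condition A is present
without_a = [1, 2, 3, 4, 5, 6]        # control: outcomes where A is absent
print(permutation_p_value(with_a, without_a))
```

Note that the test cannot be run at all without the control group: if the data contain only examples of A, there is nothing to shuffle against, which is the point of the third bullet above.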

Regarding the scientific process: related fields need to be explicitly identified and referenced, in the domain (e.g. cancer treatment), and in data science (e.g. survival modelling). Common approaches there should be tried first. It is best to avoid untested experimental methodology until it has been separately subjected to (academic or internal) research quality control.

To consider: Any sufficiently untestable claim is indistinguishable from wishful thinking. An exciting real world application does not justify lack of scientific due diligence.

 

5. Awareness of application context and consequences

Ethical and technical appraisal of results go hand in hand: it is often the same mechanism that may cause damage to society and to the end user – recklessness and carelessness are more often the problem than malice. As a matter of professional integrity and technical accuracy, the report needs to spell out concerns and limitations very explicitly, e.g. in the executive summary.

Common failure points involve:

  • Discrepancy between development and deployment environment. Stylised ML and AI algorithms notoriously fail when operating outside their training data range. This is problematic for control systems, such as those in vehicles and aeroplanes; on the financial markets; in AI policing; or in medical decision-making.
  • Discrepancy between mathematical assumptions and application case. A provable guarantee is worthless if the real world does not satisfy the proof’s premises.
  • Discrepancy between intended use and actual use. Social media are nice for exchanging ideas and staying in touch, but also useful to manipulate the masses.
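The first failure point above, operating outside the training data range, can be partly guarded against by recording the range of each input seen during development and flagging deployment inputs that fall outside it. The following Python sketch illustrates the idea; the function names and the example rows are hypothetical. Such a per-feature check is a necessary safeguard rather than a sufficient one, since it does not catch unusual combinations of values that are each in range.

```python
def fit_ranges(training_rows):
    """Record the minimum and maximum of every feature seen during development."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def out_of_range_features(row, ranges):
    """Return the indices of features whose value lies outside the recorded range."""
    return [
        i
        for i, (value, (low, high)) in enumerate(zip(row, ranges))
        if not low <= value <= high
    ]


# Hypothetical development data: two numeric features per row.
training_rows = [(20.0, 0.1), (35.0, 0.4), (50.0, 0.9)]
ranges = fit_ranges(training_rows)

print(out_of_range_features((42.0, 0.5), ranges))  # inside the training range
print(out_of_range_features((80.0, 0.5), ranges))  # first feature is out of range
```

A report could state what happens when the check fires, for example that the prediction is withheld and the case is referred to a human.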

In addition to appraisal by the author, most organisations will have a dedicated body for ethical checks – though currently only a few have one for technical validity checks.

To consider: Actual consequences need not be the same as intended consequences. Positive thinking does not replace careful ethical and technical appraisal.