The views expressed in articles published on FIRES do not necessarily reflect those of IES or represent endorsement by the IES.
By Douglas Steel, PhD
Founder and Chief Scientific Officer of NeuroSense
These are transformational times for the lighting industry. The cost of LED-based products has dropped dramatically. At the same time, increased sophistication and capabilities of tunable LED arrays, controls, and sensors now enable the commissioning of platforms that can precisely control light intensity, correlated color temperature, and relative spectral content. With this capability, we have entered a stage at which lighting can now be used not just for illumination, but to provide beneficial health effects. Supporting this is a new vocabulary of terms such as “human-centric lighting,” “bio-centric lighting,” “lighting for people” and others. However, few standards exist that provide guidance as to how lights should be controlled so as to confer benefits.1 
The academic research community is actively expanding our knowledge of the health effects of light, and there is a large and steadily growing body of published research. Reading, understanding, and integrating new results into our larger body of knowledge is a never-ending process. Taking this knowledge and applying it in practical applications is referred to as “translational science,”2  and in the interests of proper disclosure, this is the domain in which I practice. I will not be endorsing any specific products, technologies, or uses of light exposure.
Translational science requires reading a large number of research publications from many different journals and then citing relevant papers. It is essential that anyone who is making a scientific claim support that claim by citing previously published scientific data in the literature. One of the major challenges in the emergence of human-centric lighting as a practice is that significant claims are being made either without supporting evidence, or by citing evidence that has been misinterpreted, is flawed, or is not relevant to the claims being made.3  Persons trained in the sciences have (hopefully) received instruction in how to read and evaluate scientific papers, so within the field, scientists tend to “police” the shared understanding of a field of knowledge. In fact, within the scientific community there is a reputational “common knowledge” of who is and is not a credible source of new information. However, this understanding often doesn’t make it outside of the circle of scientists, with the result that the casual reader has no idea how to read and judge a scientific paper. It isn’t sufficient to simply read an abstract or the conclusions and take them as fact; in many cases a paper will be irrelevant to a claim being made, or be “fatally flawed” by a methodological error that negates all or part of the study.4 
My recommendation is that anyone reading a scientific paper have a “scientist buddy” with whom to share a paper and ask them for a “second opinion.” Far from appearing uncertain, this is a common practice that reinforces a commitment to best available knowledge.
I’d like to share some of the most common insights that scientists utilize when evaluating the quality and relevance of a scientific paper. I’d like to invite readers to submit comments about these, as well as their own suggestions for other criteria.
- Is the research hypothesis-driven or phenomenological?5  Most scientific experiments are structured so as to answer a specific question, called a hypothesis (“I have an idea that X causes Y”). The study is then conducted, the results are collected and analyzed, and the conclusions then state whether the results supported the hypothesis, refuted it, were inconclusive, or showed something altogether different. The phenomenological format of study is more like a survey: If we do X, what happens? This is then examined under many different conditions. A phenomenological study doesn’t prove anything; it is a set of observations without a preconceived idea of what the outcome will be. Such studies are often disparagingly referred to as “fishing expeditions.” When reading a paper, ask yourself “Did the investigator have a starting hypothesis, and did the experiment definitively prove or disprove the hypothesis?” Generally, phenomenological papers are not cited as evidence of a scientific claim because they didn’t test anything.
- Was the study done in a test tube or a living, intact organism?6  Every experiment has variability, and statistical tests tell us whether an effect was significant or not. Test tube studies tend to have much less variability in results than those done in an organism because there are fewer variables and “things going on.” However, it is hard to say with certainty that what happens in a test tube is an accurate predictor of what happens in an animal. So test tube studies can be strongly indicative, but may or may not be relevant. In the case of circadian-related studies done in animals, it is important to note whether the animals are nocturnal or diurnal. Because a nocturnal animal’s response to light can be the opposite of a human’s, results from such studies may not be applicable.
- A special comment about optogenetic studies.7  In recent years a new methodology has been developed in which modified genes are inserted into an organism. In optogenetic studies, these modifications make the gene sensitive to light, and exposure will selectively turn the gene off or on. These are brilliant experiments, but it is not valid to claim that the experiment proves that light has an effect on any organism that was not given the genetic modification. This is a special category of experiment in which light is being used as a research tool.
- What was the sample size in the experiment?8  A biological response must occur to a sufficient magnitude and in a sufficient number of test subjects in order to provide a statistically significant result. If the effect being studied is small, or if there is a lot of variability between subjects, then the sample size (total number of subjects) must be higher (e.g., large pharmaceutical trials in humans can involve thousands of patients). Despite this fact, some researchers persist in using a small number of subjects relative to the thing being studied. The outcome is statistically weak results, which leave open the question of whether the effect being studied is in itself weak, or whether the inclusion of more subjects would have made the results more statistically significant.
- Did the experiment include both positive and negative control groups?9  A control group is a set of subjects or conditions that tests whether an experimental effect is unique and specific to one particular manipulation, or not. This can be illustrated through an example. Let’s say I make a claim that red light is capable of growing hair. I test red light exposure and see hair growth. As a positive control I instead apply Rogaine®. If I see hair grow, then I know that in my experiment, hair growth is capable of being stimulated. I next test blue light exposure and I see no hair growth; this is a negative control, in that under the same conditions there is a circumstance in which hair does not grow. Therefore red light is confirmed as having a specific effect. In all good experiments, including those using light exposure, there should be both positive and negative control groups. Without them, there is a risk that an observed effect is due to a general light effect, and not to a particular wavelength specifically. Judging what constitutes a proper control, and whether it has been adequately tested, may take the skills of a trained scientist.
- Was there within-subject or cross-subject design?10  In a cross-subject experiment, if there are two distinct treatment groups (one control and one receiving the treatment), each group receives just the single treatment. This increases the risk that an effect is due to something else that is unique to one group or the other. It also increases the variability of the experiment. A preferred design is within-subject, in which each person or animal receives the treatment condition at one point in time and the control condition at another time. Then statistical analysis can use each subject as their own control (thus, within-subject).
- Was it a placebo-controlled and/or double-blind study?11  Although it is not always possible, in human studies in particular it is preferable to use a placebo (inactive) treatment as a control for an experimental treatment. In this way, we can be sure that a person’s knowledge that they are receiving a particular treatment isn’t changing their response. While this is easy to do with pills, it is very difficult to do in experiments with light since the subject can see what they are exposed to. I’d be interested to hear how this is addressed by those involved in clinical studies involving light exposure. In double-blind experiments, neither the subject nor the investigator knows whether a subject is receiving a control or experimental treatment. Again, this can be tough to do in lighting studies.
- Do the statistics support the conclusions?12  We’ve all heard the jokes about how numbers can be manipulated to tell a story, or distort one. In scientific papers, if a hypothesis has been tested, read the Results and Conclusions sections to see if the statistics support the prediction being tested. This will usually be denoted by a significance level (a “p-value”) and a corresponding asterisk. By common convention, a single asterisk denotes p < 0.05, which means that if there were truly no effect, a result at least this extreme would be expected by chance less than 5% of the time. Two asterisks usually denote p < 0.01, a chance expectation of less than 1%. Note that this is not the same as saying that a repeated experiment would give the same result 95% or 99% of the time. Both of these thresholds indicate a strong likelihood of an effect being believable, but due to the inherent variability of living systems, many important biological effects do not manage to achieve high statistical significance. Beyond the significance level, another important question is whether the researchers have applied the correct statistical test for the kind of data and experimental design they are using. It is beyond the scope of this article, but a scientist or statistician will be able to evaluate the method by which statistical tests have been applied to experimental results.
- A special case of statistical misuse is called P-hacking.13  Also known as “data fishing” or “data dredging,” it is the research practice of testing many different combinations of experimental conditions without having formulated a hypothesis beforehand. Under these circumstances, as the number of tested conditions increases, the odds that some comparison will show a chance correlation or reach statistical significance also increase. It is not uncommon for a pair of nonsensical or meaningless datasets to reach statistical significance in large experiments with many different groups. Compounding this, if this practice is conducted repeatedly across multiple published articles with different sets of data, then the authors can and will often construct a “story” or contrived explanation to account for a nonexistent relationship between, within, and across data. Such occurrences can have a profound effect on scientific progress in a field of study because they represent “false leads” in uncovering the actual mechanism of action in fields such as circadian research. Good scientists have the ability to identify such meaningless correlations as they occur in published papers.
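The points above about sample size and P-hacking can be made concrete with a small simulation. What follows is only a minimal sketch under simplifying assumptions that are mine and not drawn from any cited study: measurements are normally distributed with a known standard deviation of 1, the significance threshold is the conventional 5%, and the true effect size (0.5 standard deviations), group sizes, and number of comparisons are hypothetical values chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in group means.

    Assumes the standard deviation `sigma` is known (a simplification).
    """
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sample(rng, n, mean=0.0):
    """Draw n simulated measurements with the given mean and SD 1."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def power(rng, n, effect, trials=2000, alpha=0.05):
    """Fraction of simulated studies that detect a real effect."""
    hits = sum(
        z_test_p(sample(rng, n, effect), sample(rng, n)) < alpha
        for _ in range(trials)
    )
    return hits / trials


def any_false_positive_rate(rng, comparisons, n=10, studies=1000, alpha=0.05):
    """Fraction of studies with at least one 'significant' comparison
    when there is no real effect in any of them."""
    hits = 0
    for _ in range(studies):
        if any(
            z_test_p(sample(rng, n), sample(rng, n)) < alpha
            for _ in range(comparisons)
        ):
            hits += 1
    return hits / studies


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Same true effect, different numbers of subjects per group.
power_small = power(rng, n=10, effect=0.5)
power_large = power(rng, n=100, effect=0.5)

# Twenty comparisons with no true effect anywhere.
fwer_sim = any_false_positive_rate(rng, comparisons=20)
fwer_theory = 1 - (1 - 0.05) ** 20  # roughly 0.64

print(f"power with 10 per group:  {power_small:.2f}")
print(f"power with 100 per group: {power_large:.2f}")
print(f"chance of a spurious 'finding' in 20 tests: "
      f"{fwer_sim:.2f} (theory {fwer_theory:.2f})")
```

Under these assumptions, a study with 10 subjects per group detects the real effect only about one time in five, while 100 per group detects it more than 90% of the time. Separately, a study that runs 20 independent comparisons on pure noise has roughly a 64% chance of reporting at least one "significant" result, which is why an unplanned search across many conditions deserves skepticism.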
There are a number of other factors that informed readers use to assess the quality of a published research paper, and thereby infer the likelihood that the study results are believable. These include:
- The quality of the investigator, lab and academic institution that is publishing their results.14  In any given field of study, as with buying a new car, each researcher has a reputation based on their ability to conduct good experiments and publish thoughtful results and insightful conclusions that advance the field by addressing and answering the “big” questions. Such investigators publish papers regularly (but not too frequently) in high quality journals (see next bullet point), have the ability to place their results in a meaningful context relative to the rest of the field (rather than struggling to account for multiple findings that go against established knowledge), and support their conclusions with references that cite the prior research of a number of other research groups in the field. Over time and with increased experience at reading papers, one can develop a sense of who the quality researchers and groups are.
- The quality of the journal in which results are published.15  Scientific journals vary in their quality and standing in their field, and trained scientists have learned to know how journals rate relative to one another in terms of reputation and quality—and thus the believability of the results. There are many different scientific journals; some cater to a particular scientific topic, and some take a wider survey; in either case, journals tend to partition into tiers of quality and the respect they command among researchers. The best journals carry the most prestige but are more difficult to get published in, and require a longer time for review, edits, and general publisher processing. Sometimes it is preferable to publish in a second-tier journal in order to get one’s results published more quickly. Journals are ranked with a scoring system called the Impact Factor,16  which measures how many other scientists cite papers published in a particular journal. This suggests that cited papers are influential in the field and thus will be cited as important by more researchers. This can be used as a measure of how good a journal is, although it is not a perfect system. One should also consider the ranking of a journal within the field specified by the publisher. For example, the journal Lighting Research and Technology has an Impact Factor of 1.921 (ref: http://journals.sagepub.com/impact-factor/lrt ), but it is ranked by Sage Journals in the Construction and Building Technology area – not exactly relevant for biomedical research studies.
- The references at the end of a paper.17  Authors of published papers make assertions throughout their manuscript and construct a meaningful context for the experiments they are conducting. Each assertion is based on existing knowledge and should be supported by a citation of a previously published paper. These citations are numbered throughout a paper and can be found in a list at the end of the paper. It is important to read the citations to determine whether they are relevant and valid for the assertion they are being used to support. Unfortunately, in some cases authors cite irrelevant, invalid, or very weak existing papers that may not support the hypothesis or rationale for an experiment. As an example, many papers or discussions of the “hazard” posed by blue LED light exposure cite papers that aren’t directly relevant to what a normal person would experience on any given day.18  Such citations often refer to papers in which the subjects were albino rats exposed to 16 or more hours of continuous “forced” blue LED light with no shelter available. Or, the papers might rigorously document the effects of blue light exposure on rodent eyes kept in a petri dish for several hours. While such studies might have been conducted perfectly, with good research methodology and strong, statistically significant effects, such results would have little relevance if they were compared to light exposure of a person in normal daily life. The failure to cite relevant prior research studies is an insidious form of misdirection that occurs all too frequently.
- The peer review process.19  When a researcher sends a manuscript to a journal for consideration for publication, the editor strips the manuscript of identifiers and sends it out to two or three other scientists to review and ask questions, suggest changes, and ultimately recommend or reject a manuscript for publication. Such a process works well when a manuscript is relevant to the field covered by a journal, but is problematic if the journal is in an unrelated field. This is because the reviewers might have no training in the field covered in the manuscript, and thus they fail to detect and point out errors. The intention of the process is to raise the quality of papers, but in recent years the peer review process has come under fire, and reforms are likely coming (a good general discussion can be found here: https://www.npr.org/sections/health-shots/2018/02/24/586184355/scientists-aim-to-pull-peer-review-out-of-the-17th-century ).
At present, the prevailing attitude among scientists is generally to assume that peer review may have been insufficient to ensure that a published paper is reputable, and to use one’s own skills and insights along with communications with professional colleagues to determine whether a paper is likely to be valid.
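For readers unfamiliar with how the Impact Factor mentioned above is calculated, the following minimal sketch shows the commonly used two-year form: citations received in a given year to items the journal published in the previous two years, divided by the number of citable items it published in those two years. The function name and the example counts are hypothetical and chosen only for illustration; they are not data for any real journal.

```python
def two_year_impact_factor(citations_this_year: int,
                           citable_items_prior_two_years: int) -> float:
    """Two-year impact factor.

    citations_this_year: citations received this year to items the
        journal published in the previous two years.
    citable_items_prior_two_years: number of citable items the journal
        published in those two years.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("the journal must have published citable items")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical journal: 250 citable items over two years, cited 480 times.
example_if = two_year_impact_factor(480, 250)
print(f"{example_if:.2f}")
```

Because the figure is an average across all of a journal's papers, it says little about the quality of any individual paper, which is one reason it is an imperfect measure.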
We are currently in the midst of great uncertainty about whether any of the research results coming from studies of light effects upon health are ready to be translated into applied practices at healthcare facilities and in bedrooms. Lighting, controls, and IoT companies are eager to enter this potentially profitable field, while many scientists are urging restraint and caution.
We are also seeing the emergence of self-proclaimed unqualified “experts” who profess to understand the state of knowledge in the field. It is likely now apparent that part of this uncertainty arises from the complexity in reading and understanding the scientific literature domains that are relevant to the development of standards and guidelines for health-conferring light exposure protocols. Those of us involved in the process of technology translation hope to provide education and clarity, and to bring together a diverse group of lighting subject matter experts and practitioners to develop a pathway to implementation.
1Halper, M., (2017), European lighting regulations could help usher in human centric lighting. LEDs Magazine, March 2017, pp. 39-42.
2Cohrs, R.J. et al., (2014). Translational Medicine definition by the European Society for Translational Medicine. New Horizons in Translational Medicine. 2(3), pp.86–88. DOI:http://doi.org/10.1016/j.nhtm.2014.12.002 
3Willmorth, K., June (2017), The Pseudo Science Marketing of Human Centricity, commentary, Lumenique blog, https://lumeniquessl.com/2017/06/10/the-pseudo-science-marketing-of-human-centrcity/ 
4Pain, E., (2016), How to (seriously) read a scientific paper, Science 361(6397), doi:10.1126/science.caredit.a1600047
5Jalil, Mohammad Muaz, Practical Guidelines for Conducting Research – Summarising Good Research Practice in Line with the DCED Standard (February 2013). Available at SSRN: https://ssrn.com/abstract=2591803  or http://dx.doi.org/10.2139/ssrn.2591803 
6The Marshall Protocol Knowledge Base, (2012), Differences between in vitro, in vivo, and in silico studies, https://mpkb.org/home/patients/assessing_literature/in_vitro_studies 
7Deisseroth, K.; Feng, G.; Majewska, A. K.; Miesenbock, G.; Ting, A.; Schnitzer, M. J. (2006). “Next-Generation Optical Technologies for Illuminating Genetically Targeted Brain Circuits.” Journal of Neuroscience. 26 (41): 10380–6. doi:10.1523/JNEUROSCI.3863-06.2006. PMC 2820367. PMID 17035522.
8Biau, D., S. Kerneis, and R. Porcher, (2008), Statistics in Brief: The importance of sample size in the planning and interpretation of medical research, Clin Orthop Relat Res, 466(9): 2282-2288.
9Gross, A. and N. Mantel, (1967), The effective use of both positive and negative controls in screening experiments, Biometrics 23(2): 285-295.
10Charness, G., U. Gneezy, and M. Kuhn, (2012) Experimental methods: Between-subject and within-subject design, Journal of Economic Behavior and Organization, 81:1-8.
11Kaptchuk TJ. (2001) The double-blind, randomized, placebo-controlled trial. Gold standard or golden calf? J Clin Epidemiol. 54:541-549.
12Kinney, J., (2001) Statistics for Science and Engineering, London, Pearson, ISBN-10:0201437201
13Head, M., L. Holman, R. Lanfear, A. Kahn, and M. Jennions, (2015) The Extent and Consequences of p-Hacking in Science, PLoS Biology 13(3): e1002106. https://doi.org/10.1371/journal.pbio.1002106 
14Petersen, A., S. Fortunato, R. Pan, K. Kaski, O. Penner, A. Rungi, M. Riccoboni, H. Stanley, and F. Pammolli, (2014), Reputation and Impact in Scientific Careers, PNAS 111(43): 15316-21.
15Van Harten, J., 2014, Measuring Journal and Research Prestige, Elsevier, http://www.inaf.ulaval.ca/fileadmin/fichiers/fichiersINAF/FAST/Atelier_ecriture/Laval_–_Session_3_–_Bibliometrics_DEF.pdf 
18Ashdown, I., (2014) Blue Light Hazard…or Not? WordPress Blog https://www.researchgate.net/publication/273763540_Blue_Light_Hazard_or_Not