Final Analysis & Conclusions
Note: After we finished doing all of the individual correlations with crop yield and eliminating variables that had no appreciable influence on yield, we entered small groups of the surviving candidate predictors into a series of preliminary regression models. The variables that proved to be statistically significant in those regressions were tested in a final regression model. We do not present the results of all of those preliminary regression models here. What follows are the results of the final regression model showing the statistically significant influences on crop yield.
This analysis shows the relative and absolute influences of statistically significant predictors of crop yield, and will conclude our analysis of the YOR database. It is important to note that a regression model identifies those influences that, in combination, have an impact on crop yield. This means that although our previous analyses of individual correlations may have suggested many apparently important influences when each influence was measured in isolation from the others, this final regression analysis measures the unique influence of any given predictor when the influences of other predictors are controlled for.
Thus, a predictor that appeared to be important in an earlier analysis may no longer be deemed important based on the regression model. This is because its apparent influence is actually accounted for by another variable. How might this happen? Here's a real-life example, taken from the annals of applied statistics:
Many years ago, a large-scale analysis was undertaken to identify predictors of the incidence of chronic childhood illness in various parts of the United States. Many influences were measured, and one of these influences was the number of miles of paved roadway in the county where any given child lived. However, when other influences such as amount of government and private spending on healthcare were included in the model, the influence of paved road mileage disappeared. It turns out, of course, that counties in which more money was spent on healthcare also tended to have larger budgets for road improvement: wealthier counties had healthier children. No surprise, but it points out the importance of including as many variables as possible in the analysis simultaneously, instead of just looking at single influences one at a time.
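To make the paved-roads effect concrete, here is a minimal sketch using made-up data (not the original study's data, and not the YOR database). It assumes a hypothetical "health score" that is driven only by healthcare spending, while road mileage merely tends to rise along with spending. The helper `ols` and all of the numbers are our own illustrative assumptions.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Build the augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic counties (invented for illustration only).
random.seed(0)
n = 2000
spending = [random.gauss(0, 1) for _ in range(n)]
roads = [0.8 * s + random.gauss(0, 1) for s in spending]   # roads track spending
health = [1.5 * s + random.gauss(0, 1) for s in spending]  # only spending matters

alone = ols([roads], health)               # roads measured in isolation
together = ols([roads, spending], health)  # roads with spending controlled for

print("roads alone:         ", round(alone[1], 2))
print("roads, with spending:", round(together[1], 2))
print("spending:            ", round(together[2], 2))
```

Measured alone, roads appears to have a sizeable influence. Once spending enters the model, the roads coefficient falls to roughly zero, which is the same pattern described in the example above.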
Key Influences On Crop Yield
These are listed in order of strength, with the associated regression coefficients shown in parentheses. The regression coefficient indicates how much change in crop yield will result from a one-unit change in the value of the predictor (measured in units appropriate to that particular predictor):
 Number of lumens (coefficient = 1.386 grams more per square foot per 1,000 lumens). The benefit appears to level off at about 14,000 lumens per square foot: below that point more light is better, but beyond it additional light appears to be of little use. Number of lumens uniquely explains about 31% of the variation in crop yield in the database.
 Percent of lumens from HPS lighting (coefficient = .110 grams more per square foot per one-percent change in proportion of lumens coming from HPS lighting). Percent of lumens from HPS lighting uniquely explains about 14% of crop yield variation.
 YOR report sequence number, which is a surrogate for "experience" (coefficient = .07026 grams more per square foot with each increase of one unit in sequence number). This field uniquely explains about 5% of variation in crop yield.
 Hydroponic medium (coefficient = 6.243 grams more per square foot when using a hydroponic medium than when not using one). Hydroponic medium uniquely explains about 3% of crop yield variation.
So, in total, our regression model is explaining about 53% of the variation in crop yield that is seen in the YOR database. Because of the limited sample size (153 reports), the lack of scientific controls on growers' observations and reports, and the YOR's inability to capture all significant influences on crop yield, being able to explain 53% of crop yield variation is actually very good.
So we can see that "Lumens" is more than twice as influential as the next-best predictor (% HPS lumens). Because the "Lumens" predictor already accounts for the amount of light being generated, the "% HPS lumens" variable probably represents the superior spectral properties of HPS lighting as well as its superior ability to penetrate farther down into the grow space (high intensity). "Grower Experience" (denoted by the YOR report sequence number) also has some effect; and "Hydroponic Medium" rounds out the list. (The benefits of a hydroponic medium are probably due to its ability to aerate the roots better than soil can, while also supplying the necessary moisture and nutrients.)
The graph below shows a scatterplot of predicted weight (on the X-axis) by actual weight (on the Y-axis):
Here we can see that there is a fairly good correspondence between predicted and actual weight, indicating that the model is a good one. As the footnote in the graph indicates, the overall correlation between predicted and actual weight is .737 (compared with a maximum theoretical coefficient of 1.00). If we square this number and then adjust for our sample size and number of predictors, we arrive at a value of about .53. This means that we are explaining about 53% of the database's variation in crop yield using just these four predictors. And while some people might think this isn't a very good model, it actually is. In fact, it is rarely possible to explain anywhere near all of the variation in data such as those we are working with here. There are undoubtedly many subtle influences not captured in the YOR database. In addition, there is probably some error in the reporting of the data by growers.
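For readers who want to retrace the squaring-and-adjusting step, here is a small sketch. It assumes the adjustment is the usual adjusted R Square formula, that all 153 reports entered the final model, and that there are four predictors; the function name is our own.

```python
def adjusted_r_squared(r, n, k):
    """Adjusted R Square from the multiple correlation r,
    with n observations and k predictors (standard formula)."""
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Assumed inputs: R = .737, n = 153 reports, k = 4 predictors.
r_squared = 0.737 ** 2
adj = adjusted_r_squared(0.737, n=153, k=4)

print(round(r_squared, 3), round(adj, 3))
```

Squaring the rounded R of .737 gives about .543 rather than the .544 shown in the appendix, because R itself has been rounded to three decimals. The adjusted value comes out at about .531 either way.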
Note: as a follow-up to this analysis, pH has created several calculators under the Cannabis Production area of this site. A grower will be able to select his lamp and the three other key influences he'll be using with his own garden, and then have his likely crop yield calculated for him.
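As a rough illustration of how such a calculator could work (this is our own sketch, not the code behind the site's calculators), the final regression model can be applied directly as a linear equation. The function name and the example inputs are hypothetical, and we assume lumens are entered in the same units as the lumens field in the YOR database.

```python
# Coefficients taken from the final regression model in this analysis.
INTERCEPT = 2.488          # constant
B_LUMENS_PER_1000 = 1.386  # grams per sq ft per 1,000 lumens
B_HPS_PERCENT = 0.110      # grams per sq ft per percent of lumens from HPS
B_HYDRO = 6.243            # grams per sq ft when a hydroponic medium is used
B_EXPERIENCE = 0.07026     # grams per sq ft per unit of report sequence number

def predict_yield(lumens, hps_percent, uses_hydro, sequence_number):
    """Return predicted crop yield in grams per square foot.

    Purely linear: it does not model the apparent leveling-off
    at about 14,000 lumens per square foot.
    """
    return (INTERCEPT
            + B_LUMENS_PER_1000 * lumens / 1000.0
            + B_HPS_PERCENT * hps_percent
            + B_HYDRO * (1 if uses_hydro else 0)
            + B_EXPERIENCE * sequence_number)

# Hypothetical grower: 10,000 lumens, all HPS, hydro medium, sequence number 50.
example = predict_yield(lumens=10000, hps_percent=100,
                        uses_hydro=True, sequence_number=50)
print(round(example, 1))  # 37.1
```

Keep in mind that the model explains only about 53% of yield variation, so any single prediction of this kind should be read as a rough estimate.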
Conclusions
The regression model tells us quite simply that if a grower wants to maximize crop yield, then he should use plenty of light (preferably HPS lighting, because of its superior spectral properties and its canopy-penetrating power). He should also use a hydroponic medium that provides not only good moisture retention but also good aeration, which is a balance that soil growing simply cannot deliver as well as hydroponics/aeroponics can. And, finally, experience is also an important factor in obtaining higher crop yields. But hopefully, in the future, even those with little or no experience can grow bigger crops, now that we have clarified the key influences on crop yield. Our main goal in the analysis of the YOR data has been to provide a shortcut to success, so that new growers will not have to go down a long and painful learning curve in order to achieve good results. We hope it helps.
Statistical Appendix
For those of you who have enough of a statistical background to be familiar with linear regression, here are the technical output tables describing various statistical parameters of the final regression model.
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .737(a) | .544 | .531 | 10.0423
(a) Predictors: (Constant), seq, lumens1, medhydr, hpslupct
Coefficients(a)
Model 1 | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig.
(Constant) | 2.488 | 2.331 | | 1.067 | .288
Lumens | .001 | .000 | .414 | 7.009 | .000
HPS Lumens Percent | .110 | .025 | .273 | 4.337 | .000
Hydroponic Medium | 6.243 | 1.892 | .195 | 3.300 | .001
Grower Experience | .070 | .019 | .222 | 3.756 | .000
(a) Dependent Variable: weight
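As a quick sanity check on the coefficients table, each t value should equal B divided by its standard error. The sketch below redoes that division from the rounded figures shown above. The Lumens row is left out because its B and standard error are rounded to .001 and .000, which leaves too few digits to divide.

```python
# (B, Std. Error, reported t) as printed in the coefficients table.
rows = {
    "(Constant)": (2.488, 2.331, 1.067),
    "HPS Lumens Percent": (0.110, 0.025, 4.337),
    "Hydroponic Medium": (6.243, 1.892, 3.300),
    "Grower Experience": (0.070, 0.019, 3.756),
}

# t statistic = coefficient / standard error.
t_values = {name: b / se for name, (b, se, _) in rows.items()}

for name, (b, se, reported) in rows.items():
    print(f"{name}: B/SE = {t_values[name]:.3f}, reported t = {reported}")
```

The constant and the hydroponic-medium rows reproduce the reported t values closely. The other two rows differ slightly (about 4.40 versus 4.337, and about 3.68 versus 3.756), most likely because B and the standard error are printed with only two or three significant digits.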
Moon Doggie's Analyses 









