I am not a big fan of using arguments such as “food questionnaires are unreliable” and “observational studies are worthless” to completely dismiss a study. There are many reasons for this. One of them is that, when people misreport certain diet and lifestyle patterns, but do that consistently (i.e., everybody underreports food intake), the biasing effect on coefficients of association is minor. Measurement errors may remain for this or other reasons, but regression methods (linear and nonlinear) assume the existence of such errors, and are designed to yield robust coefficients in their presence. Besides, for me to use these types of arguments would be hypocritical, since I myself have done several analyses on the China Study data (), and built what I think are valid arguments based on those analyses.
My approach is: Let us look at the data, any data, carefully, using appropriate analysis tools, and see what it tells us; maybe we will find evidence of measurement errors distorting the results and leading to mistaken conclusions, or maybe not. With this in mind, let us take a look at the top part of Table 3 of the most recent (published online in March 2012) study looking at the relationship between red meat consumption and mortality, authored by Pan et al. (Frank B. Hu is the senior author) and published in the prestigious Archives of Internal Medicine (). This is a prominent journal, with an average of over 270 citations per article according to Google Scholar. The study has received much media attention recently.
Take a look at the area highlighted in red, focusing on data from the Health Professionals sample. That is the multivariate-adjusted cardiovascular mortality rate, listed as a normalized percentage, in the highest quintile (Q5) of red meat consumption from the Health Professionals sample. The non-adjusted percentages are 1.4 percent mortality in Q5 and 1.13 in Q1 (from Table 1 of the same article); so the multivariate adjustment-normalization changed the values of the percentages somewhat, but not much. The highlighted 1.35 number suggests that for each group of 100 people who consumed a lot of red meat (Q5), when compared with a group of 100 people who consumed little red meat (Q1), there were on average 0.35 more deaths over the same period of time (more than 20 years).
The heavy red meat eaters in Q5 consumed 972.73 percent more red meat than those in Q1. This is calculated with data from Table 1 of the same article, as: (2.36-0.22)/0.22. In Q5, the 2.36 number refers to the number of servings of red meat per day, with each serving being approximately 84 g. So the heavy red meat eaters ate approximately 198 g per day (a bit less than 0.5 lb), while the light red meat eaters ate about 18 g per day. In other words, the heavy red meat eaters ate 9.7273 times more, or 972.73 percent more, red meat.
So, just to be clear, even though the folks in Q5 consumed 972.73 percent more red meat than the folks in Q1, in each matched group of 100 you would not find a single additional death over the same time period. If you looked at matched groups of 1,000 individuals, you would find 3 to 4 more deaths among the heavy red meat eaters. The same general pattern, of a minute difference, repeats itself throughout Table 3. As you can see, all of the reported mortality ratios are 1-point-something. In fact, this same pattern repeats itself in all mortality tables (all-cause, cardiovascular, cancer). This is all based on a multivariate analysis that according to the authors controlled for a large number of variables, including baseline history of diabetes.
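The arithmetic above is easy to check with a few lines of Python. This is only a sketch using the figures quoted in this post (servings per day, grams per serving, and the Q5 figure from Table 3); the function names are mine, and the "extra deaths" calculation follows this post's reading of the Table 3 number as a normalized percentage.

```python
# Figures quoted in the post (Tables 1 and 3 of the article).
SERVING_G = 84       # approximate grams per serving
q1_servings = 0.22   # servings/day in the lowest quintile (Q1)
q5_servings = 2.36   # servings/day in the highest quintile (Q5)
q5_figure = 1.35     # multivariate-adjusted figure for Q5, with Q1 set to 1


def percent_more(high, low):
    """Relative difference of `high` over `low`, in percent."""
    return (high - low) / low * 100


def extra_deaths(figure, group_size):
    """Extra deaths in a group, reading `figure` as deaths per 100 people
    (the interpretation used in this post, not a claim about the article)."""
    return (figure - 1.0) * group_size / 100


intake_gap = percent_more(q5_servings, q1_servings)
q1_grams = q1_servings * SERVING_G
q5_grams = q5_servings * SERVING_G

print(f"Q5 vs. Q1 red meat intake: {intake_gap:.2f} percent more")
print(f"Q1: about {q1_grams:.0f} g/d; Q5: about {q5_grams:.0f} g/d")
print(f"Extra deaths per 100: {extra_deaths(q5_figure, 100):.2f}")
print(f"Extra deaths per 1,000: {extra_deaths(q5_figure, 1000):.1f}")
```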
Interestingly, looking at data from the same sample (Health Professionals), the incidence of diabetes is 75 percent higher in Q5 than in Q1. The same is true for the second sample (Nurses Health), where the Q5-Q1 difference in incidence of diabetes is even greater - 81 percent. This caught my eye, diabetes being such a prototypical “disease of affluence”. So I entered the whole data reported in the article into HCE () and WarpPLS (), and conducted some analyses. The graphs below are from HCE. The data includes both samples – Health Professionals and Nurses Health.
HCE calculates bivariate correlations, and so does WarpPLS. But WarpPLS stores numbers with a higher level of precision, so I used WarpPLS for calculating coefficients of association, including correlations. I also double-checked the numbers with other software, just in case (e.g., SPSS and MATLAB). Here are the correlations calculated by WarpPLS, which refer to the graphs above: 0.030 for red meat intake and mortality; 0.607 for diabetes and mortality; and 0.910 for food intake and diabetes. Yes, you read it right, the correlation between red meat intake and mortality is a very low and non-significant 0.030 in this dataset. Not a big surprise when you look at the related HCE graph, with the line going up and down almost at random. Note that I included the quintiles data from both the Health Professionals and Nurses Health samples in one dataset.
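For readers who want to see what a bivariate correlation does with a line that goes "up and down almost at random", here is a minimal sketch of the Pearson coefficient in plain Python. The `toy_x` and `toy_y` values are made-up placeholders chosen to zigzag; they are not the quintile data from the article, and this is not the code used by HCE or WarpPLS.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values (NOT from the article): y zigzags as x rises.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 1, 3, 1, 2]

print("zigzag pattern:", round(pearson_r(toy_x, toy_y), 3))
print("steady increase:", round(pearson_r(toy_x, [2, 4, 6, 8, 10]), 3))
```

A pattern with no overall trend yields a coefficient near zero, while a steady increase yields a coefficient near one.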
Those folks in Q5 had a much higher incidence of diabetes, and yet the increase in mortality for them was significantly lower, in percentage terms. A key difference between Q5 and Q1 being what? The Q5 folks ate a lot more red meat. This looks suspiciously suggestive of a finding that I came across before, based on an analysis of the China Study II data (). The finding was that animal food consumption (and red meat is an animal food) was protective, actually reducing the negative effect of wheat flour consumption on mortality. That analysis actually suggested that wheat flour consumption may not be so bad if you eat 221 g or more of animal food daily.
So, I built the model below in WarpPLS, where red meat intake (RedMeat) is hypothesized to moderate the relationship between diabetes incidence (Diabetes) and mortality (Mort). Below I am also including the graphs for the direct and moderating effects; the data is standardized, which reduces estimation error, particularly in moderating effects estimation. I used a standard linear algorithm for the calculation of the path coefficients (betas next to the arrows) and jackknifing for the calculation of the P values (confidence = 1 – P value). Jackknifing is a resampling technique that does not require multivariate normality and that tends to work well with small samples, as is the case with nonparametric techniques in general.
The direct effect of diabetes on mortality is positive (0.68) and almost statistically significant at the P < 0.05 level (confidence of 94 percent), which is noteworthy because the sample size here is so small – only 10 data points, 5 quintiles from the Health Professionals sample and 5 from the Nurses Health sample. The moderating effect is negative (-0.11), but not statistically significant (confidence of 61 percent). In the moderating effect graphs (shown side-by-side), this negative moderation is indicated by a slightly less steep inclination of the regression line for the graph on the right, which refers to high red meat intake. A less steep inclination means a less strong relationship between diabetes and mortality – among the folks who ate the most red meat.
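The general idea of a moderating effect estimated on standardized data, with jackknifing used to gauge the stability of the coefficients, can be sketched with NumPy as below. This is an illustration under several assumptions of mine, not the WarpPLS algorithm: it uses ordinary least squares with an interaction term, it includes the moderator's own direct effect (a common regression convention that may differ from the model above), and it runs on synthetic data generated in the script rather than on the article's quintile data.

```python
import numpy as np


def standardize(values):
    """Convert a variable to z-scores (mean 0, sample standard deviation 1)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


def moderation_betas(x, moderator, y):
    """Least-squares fit of y on x, moderator and x*moderator, all standardized.

    Returns [beta_x, beta_moderator, beta_interaction].
    """
    zx, zm, zy = standardize(x), standardize(moderator), standardize(y)
    design = np.column_stack([np.ones_like(zx), zx, zm, zx * zm])
    coefs, *_ = np.linalg.lstsq(design, zy, rcond=None)
    return coefs[1:]  # drop the intercept


def jackknife_se(x, moderator, y):
    """Leave-one-out jackknife standard errors for the three betas."""
    x, moderator, y = (np.asarray(a, dtype=float) for a in (x, moderator, y))
    n = len(y)
    # Re-estimate the betas n times, each time leaving one data point out.
    reps = np.array([
        moderation_betas(np.delete(x, i), np.delete(moderator, i), np.delete(y, i))
        for i in range(n)
    ])
    spread = ((reps - reps.mean(axis=0)) ** 2).sum(axis=0)
    return np.sqrt((n - 1) / n * spread)


# Synthetic data only (NOT the article's data): a positive direct effect
# that is weakened when the moderator is high.
rng = np.random.default_rng(0)
diabetes = rng.normal(size=50)
red_meat = rng.normal(size=50)
mortality = 0.7 * diabetes - 0.3 * diabetes * red_meat + 0.1 * rng.normal(size=50)

betas = moderation_betas(diabetes, red_meat, mortality)
errors = jackknife_se(diabetes, red_meat, mortality)
print("betas (direct, moderator, interaction):", betas.round(3))
print("jackknife standard errors:", errors.round(3))
```

A negative interaction beta corresponds to the pattern described above: the slope relating the predictor to the outcome is less steep when the moderator is high.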
Not too surprisingly, at least to me, the results above suggest that red meat per se may well be protective, although we should consider at least two other possibilities. One is that red meat intake is a marker for consumption of some other things, possibly present in animal foods, that are protective - e.g., choline and vitamin K2. The other possibility is that red meat is protective in part by displacing other less healthy foods. Perhaps what we are seeing here is a combination of these.
Whatever the reason may be, red meat consumption seems to actually lessen the effect of diabetes on mortality in this sample. That is, according to this data, the more red meat is consumed, the fewer people die from diabetes. The protective effect might have been stronger if the participants had eaten more red meat, or more animal foods containing the protective factors; recall that the threshold for protection in the China Study II data was consumption of 221 g or more of animal food daily (). Having said that, it is also important to note that, if you eat excess calories to the point of becoming obese, from red meat or any other sources, your risk of developing diabetes will go up – as the earlier HCE graph relating food intake and diabetes implies.
Please keep in mind that this post is the result of a quick analysis of secondary data reported in a journal article, and its conclusions may be wrong, even though I did my best not to make any mistakes (e.g., mistyping data from the article). The authors likely spent months, if not more, on their study, and have the support of one of the premier research universities in the world. Still, this post raises serious questions. I say this respectfully, as the authors did seem to try their best to control for all possible confounders.
I should also say that the moderating effect I uncovered is admittedly a fairly weak effect on this small sample and not statistically significant. But its magnitude is apparently greater than the reported effects of red meat on mortality, which are not only minute but may well be statistical artifacts. The Cox proportional hazards analysis employed in the study, which is commonly used in epidemiology, is nothing more than a sophisticated ANCOVA; it is a semi-parametric version of a special case of the broader analysis method automated by WarpPLS.
Finally, I could not control for confounders because, given the small sample, inclusion of confounders (e.g., smoking) leads to massive collinearity. WarpPLS calculates collinearity estimates automatically, and is particularly thorough at doing that (calculating them at multiple levels), so there is no way to ignore them. Collinearity can severely distort results, as pointed out in a YouTube video on WarpPLS (). Collinearity can even lead to changes in the signs of coefficients of association, in the context of multivariate analyses - e.g., a positive association appears to be negative. The authors have the original data – a much, much larger sample - which makes it much easier to deal with collinearity.
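One widely used collinearity measure is the variance inflation factor (VIF), which can be read off the diagonal of the inverse of the predictors' correlation matrix. The sketch below shows that calculation with NumPy on synthetic predictors; it is not necessarily how WarpPLS computes its own collinearity estimates, and the variable names are hypothetical.

```python
import numpy as np


def vifs(predictors):
    """Variance inflation factor for each column of `predictors`.

    Uses the identity VIF_j = j-th diagonal element of the inverse of the
    correlation matrix of the predictors.
    """
    corr = np.corrcoef(np.asarray(predictors, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic predictors (NOT from the article): `b` is almost a copy of `a`,
# while `c` is generated independently of both.
rng = np.random.default_rng(1)
a = rng.normal(size=30)
b = a + 0.05 * rng.normal(size=30)
c = rng.normal(size=30)

vif_values = vifs(np.column_stack([a, b, c]))
print("VIFs for a, b, c:", vif_values.round(1))
```

The two nearly redundant predictors get very large VIFs, while the independent one stays close to 1. With only a handful of data points, even unrelated variables can look redundant, which is the problem described above.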
Moderating effects analyses () – we need more of those in epidemiological research, eh?
Monday, March 19, 2012
Monday, January 16, 2012
The China Study II: Wheat’s total effect on mortality is significant, complex, and highlights the negative effects of low animal fat diets
The graph below shows the results of a multivariate nonlinear WarpPLS () analysis including the variables listed below. Each row in the dataset refers to a county in China, from the publicly available China Study II dataset (). As always, I thank Dr. Campbell and his collaborators for making the data publicly available. Other analyses based on the same dataset are also available ().
- Wheat: wheat flour consumption in g/d.
- Aprot: animal protein consumption in g/d.
- PProt: plant protein consumption in g/d.
- %FatCal: percentage of calories coming from fat.
- Mor35_69: number of deaths per 1,000 people in the 35-69 age range.
- Mor70_79: number of deaths per 1,000 people in the 70-79 age range.
Below are the total effects of wheat flour consumption, along with the number of paths used to calculate them, and the respective P values (i.e., probabilities that the effects are due to chance). Total effects are calculated by considering all of the paths connecting two variables. Identifying each path is a bit like solving a maze puzzle; you have to follow the arrows connecting the two variables. Version 3.0 of WarpPLS (soon to be released) does that automatically, and also calculates the corresponding P values.
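To make the "maze puzzle" concrete: a total effect is the sum, over every path connecting two variables, of the product of the path coefficients along that path. Below is a minimal sketch in Python. The links and coefficients in `hypothetical_model` are invented for illustration (they are not the betas from the model discussed here, and "Mort" stands in for either mortality variable), and the function is my own, not part of WarpPLS.

```python
def total_effect(model, source, target):
    """Return (total effect, number of paths) from `source` to `target`.

    `model` maps each variable to a dict of {downstream variable: path
    coefficient}. The model is assumed to have no loops.
    """
    if source == target:
        return 1.0, 1
    total, count = 0.0, 0
    for downstream, beta in model.get(source, {}).items():
        sub_total, sub_count = total_effect(model, downstream, target)
        total += beta * sub_total  # multiply along a path, add across paths
        count += sub_count
    return total, count


# Hypothetical links and coefficients, for illustration only.
hypothetical_model = {
    "Wheat": {"PProt": 0.5, "Mort": 0.2},
    "PProt": {"Aprot": -0.4, "Mort": 0.1},
    "Aprot": {"FatCal": 0.6, "Mort": -0.3},
    "FatCal": {"Mort": -0.1},
}

effect, n_paths = total_effect(hypothetical_model, "Wheat", "Mort")
print(f"Total effect of Wheat on Mort: {effect:.3f}, using {n_paths} paths")
```

In this toy model four paths lead from Wheat to Mort: the direct link plus three indirect routes through the mediating variables.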
To the best of my knowledge, this is the first time that total effects have been calculated for this dataset. As you can see, the total effects of wheat flour consumption on mortality in the 35-69 and 70-79 age ranges are both significant, and fairly complex in this model, each relying on 7 paths. The P value for mortality in the 35-69 age range is 0.038; in other words, the probability that the effect is “real”, and thus not due to chance, is 96.2 percent (100-3.8=96.2). The P value for mortality in the 70-79 age range is 0.024; a 97.6 percent probability that the effect is “real”.
Note that in the model the effects of wheat flour consumption on mortality in both age ranges are hypothesized to be mediated by animal protein consumption, plant protein consumption, and fat consumption. These mediating effects have been suggested by previous analyses discussed on this blog (). The strongest individual paths are between wheat flour consumption and plant protein consumption, plant protein consumption and animal protein consumption, as well as animal protein consumption and fat consumption.
So wheat flour consumption contributes to plant protein consumption, probably by being a main source of plant protein (through gluten). Plant protein consumption in turn decreases animal protein consumption, which significantly decreases fat consumption. From this latter connection we can tell that most of the fat consumed likely came from animal sources.
How much fat and protein are we talking about? The graphs below tell us how much, and these graphs are quite interesting. They suggest that, in this dataset, daily protein consumption tended to be on average 60 g, whatever the source. If more protein came from plant foods, the proportion from animal foods went down, and vice-versa.
The more animal protein consumed, the more fat is also consumed in this dataset. And that is animal fat, which comes mostly in the form of saturated and monounsaturated fats, in roughly equal amounts. How do I know that it is animal fat? Because of the strong association with animal protein. By the way, with a few exceptions (e.g., some species of fatty fish) animal foods in general provide only small amounts of polyunsaturated fats – omega-3 and omega-6.
Individually, animal protein and wheat flour consumption have the strongest direct effects on mortality in both age ranges. Animal protein consumption is protective, and wheat flour consumption detrimental.
Does the connection between animal protein, animal fat, and longevity mean that a diet high in saturated and monounsaturated fats is healthy for most people? Not necessarily, at least without extrapolation, although the results do not suggest otherwise. Look at the amounts of fat consumed per day. They range from a little less than 20 g/d to a little over 90 g/d. By comparison, one steak of top sirloin (about 380 g of meat, cooked) trimmed to almost no visible fat gives you about 37 g of fat.
These results do suggest that consumption of animal fats, primarily saturated and monounsaturated fats, is likely to be particularly healthy in the context of a low fat diet. Or, said in a different way, these results suggest that longevity is decreased by diets that are low in animal fats.
How much fat should one eat? In this dataset, the more fat was consumed together with animal protein (i.e., the more animal fat was consumed), the better in terms of longevity. In other words, in this dataset the lowest levels of mortality were associated with the highest levels of animal fat consumption. The highest level of fat consumption in the dataset was a little over 90 g/d.
What about higher fat intake contexts? Well, we know that men on a high fat diet such as a variation of the Optimal Diet can consume on average a little over 170 g/d of animal fat (130 g/d for women), and their health markers remain generally good ().
One of the critical limiting factors, in terms of health, seems to be the amount of animal fat that one can eat and still remain relatively lean. Dietary saturated and monounsaturated fats are healthy. But when accumulated as excess body fat, beyond a certain level, they become pro-inflammatory.