Selected recent comments
- Laurate Permeates the Paracellular Pathway for Small Molecules in the Intestinal Epithelial Cell Model HT-29/B6 via Opening the Tight Junctions by Reversible Relocation of Claudin-5.
Pharm Res. 2014. 1 comment
Salah Amasheh 2014 Oct 08 06:06 a.m. (14 hours ago)
A typographical error occurred in the title. The correct title would start with "Laurate Permeabilizes..."
- Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies.
Am J Psychiatry. 2014. 16 comments
Igor Zwir 2014 Oct 07 6:15 p.m. (yesterday) 8 of 8 people found this helpful
Rationale of the method applied in Arnedo et al.
Clustering methods have long been applied to pattern recognition problems in a variety of fields and are embedded in most of our devices, from a simple refrigerator to a Boeing engine. In the biomedical sciences there is a tendency to assume that only a few classes of clustering exist (e.g., hierarchical clustering or K-means). However, data mining, knowledge discovery, and machine learning are fields of computer science with a rich variety of theoretical and empirical methods. Much of the power of these methods comes from combining them so as to take advantage of the strengths of each.
The methodology and framework proposed in Arnedo et al. encode a generalized factorization method (GFM) that was designed to identify structural patterns or clusters (substructures) that characterize complex objects (1-8) embedded in databases (6, 7, 9, 10). Problems of this complexity cannot be solved by a single clustering method; they require combining the advantages of several methods, as implemented in the GFM and summarized below.
First, we utilized the notion of Model-based clusters (5, 11, 12), where the centroid/prototype of a cluster is not a point but a model (e.g., line segments or ellipsoids (5, 13, 14); see also the C-varieties algorithm (12)), or a family of models (8), defined as a relation between a cohesive subset of points that meet certain constraints (e.g., Relational clustering (12), undirected graph (12)). When the underlying equations of the models are unknown, they can be estimated from the data by identifying relations between subsets of observations that show similar patterns under a specific subset of relevant descriptive features (attributes) (4, 5, 15). Here we proposed that in a purely data-driven analysis these unknown models (grey boxes) can be learned as local relationships between subsets of features and observations, which are termed biclusters. The biclusters are often represented as sub-matrices (4, 5, 15). Other models proposed in this work (black boxes) consist of those that are encoded by a particular method devoted to recognizing certain classes of patterns.
Second, to identify a family of models (8) or biclusters we planned the use of the NMF (16) factorization algorithms (tensor analysis), which can uncover cohesive data represented as sub-matrices (factors). When sorted and thresholded, each sub-matrix represents a bicluster. Notably, biclustering produces several local models of the data (17), each one capturing intrinsic relationships between subsets of observations that share a subset of the features rather than all of them. Diverse biclusters, rather than a uniform explanation of the data, provide a better understanding of the described object; that understanding is diluted when all features are used together in a single global model of the data, as clustering methods typically do (17).
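To make the "sorted and thresholded" step concrete, here is a minimal sketch in Python. It assumes the two non-negative factor matrices have already been computed by some NMF implementation; the names `W`, `H`, `biclusters_from_factors` and both threshold values are illustrative assumptions, not the procedure or parameters used in Arnedo et al.

```python
def biclusters_from_factors(W, H, row_thresh=0.5, col_thresh=0.5):
    """Read one bicluster off each NMF factor by thresholding.

    W: n_rows x k matrix (list of lists), loadings of observations on factors.
    H: k x n_cols matrix (list of lists), loadings of features on factors.
    Thresholds are fractions of the largest loading in each factor
    (an assumed, illustrative rule).
    """
    result = []
    for f in range(len(H)):
        w_col = [row[f] for row in W]
        h_row = H[f]
        w_max = max(w_col) or 1.0  # avoid division by zero for empty factors
        h_max = max(h_row) or 1.0
        rows = [i for i, v in enumerate(w_col) if v / w_max >= row_thresh]
        cols = [j for j, v in enumerate(h_row) if v / h_max >= col_thresh]
        result.append((rows, cols))
    return result


# Toy factors: 4 observations, 3 features, 2 factors (made-up numbers).
W = [[0.9, 0.0], [0.8, 0.1], [0.1, 0.7], [0.0, 0.0]]
H = [[1.0, 0.9, 0.0], [0.0, 0.6, 0.8]]
found = biclusters_from_factors(W, H)
print(found)  # [([0, 1], [0, 1]), ([2], [1, 2])]
```

In this toy case feature 1 belongs to both biclusters and observation 3 belongs to none, so a single thresholding rule can yield both overlapping and non-exhaustive assignments.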
Third, we proposed in this work that biclusters can be fuzzy (overlapping; see Fuzzy clustering (18)), where voxels and/or subjects can belong to more than one bicluster, as well as possibilistic (non-exhaustive partitions; see Possibilistic clustering (12)), where certain voxels and/or subjects may not be assigned to any bicluster, so that outliers can be detected.
Fourth, we demonstrated that biclusters could be identified without choosing a priori a particular number of clusters (11, 13, 14, 19). Setting this parameter to a small number may generate a few large data partitions, here defined as general biclusters, that may underfit the data. In contrast, using a large number of clusters will generate many small partitions (i.e., more granular partitions), here defined as specific biclusters, that may overfit the data (1, 5, 11, 13, 14, 19-21). Although there are many different validity indices (12, 18) that suggest the best number of clusters for a given dataset, they often produce contradictory results (11, 13, 14, 19). This problem is exacerbated when Centroid-based (e.g., k-means, NMF) or Distribution-based clustering (e.g., EM) is utilized, because two successive numbers of clusters may generate completely different cluster arrangements (see (12, 18) for a review). Instead of fighting the losing battle of uncovering a single optimal number of clusters, biclusters were calculated from partitions generated by every number of clusters between 2 and √n (12), where n is the number of observations (subjects). We postponed the selection of the best clusters until all partitions were examined, in order to select a set of clusters that together provide an optimal description of the sample. These clusters could be chosen from different partitions that were generated by different numbers of clusters (see Consensus clustering (1, 5, 11, 13, 14, 19-21)).
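The idea of sweeping every cluster count from 2 to √n and deferring selection can be sketched as follows. This is only an illustration: rounding √n down to an integer is an assumption, and `partition_fn` and `toy_partition` are hypothetical stand-ins for whatever clustering routine is actually used.

```python
import math


def candidate_cluster_counts(n_subjects):
    """Return every cluster count k with 2 <= k <= floor(sqrt(n_subjects))."""
    return list(range(2, math.isqrt(n_subjects) + 1))


def pool_partitions(n_subjects, partition_fn):
    """Collect the clusters produced for every candidate k into one pool.

    partition_fn is a hypothetical callback: partition_fn(k) must return a
    list of clusters, each given as a set of subject indices.
    """
    pool = []
    for k in candidate_cluster_counts(n_subjects):
        for cluster in partition_fn(k):
            pool.append((k, frozenset(cluster)))
    return pool


def toy_partition(k, n_subjects=16):
    """Toy stand-in for a real clustering routine: round-robin split."""
    return [set(range(i, n_subjects, k)) for i in range(k)]


pool = pool_partitions(16, toy_partition)
print(candidate_cluster_counts(16))  # [2, 3, 4]
print(len(pool))  # 2 + 3 + 4 = 9 candidate clusters
```

Keeping the originating k with each cluster lets a later selection step draw its final set from several different partitions at once.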
Fifth, formulating the bicluster identification problem over different partitions may produce redundant, excessively specific and/or excessively general biclusters (4, 5, 15, 22, 23). To select optimal biclusters, we proposed a GFM that performs the following Consensus clustering strategy: (i) It eliminates all redundant biclusters by comparing the similarity between their underlying sub-matrices using Hypergeometric statistics (see Methods and Material in Supplement 1), and selecting only one representative bicluster from each set of equivalent sub-matrices. Biclusters harboring <10% of the total number of subjects were not considered, to avoid a trend toward singleton biclusters. (ii) From the non-redundant biclusters, the GFM selects all biclusters that are optimal in terms of specificity, generality and diversity. To do so, it selects all non-dominated biclusters, that is, those for which no other bicluster is better in both objectives, specificity and generality (4, 5, 15, 22, 23). Moreover, to ensure diversity, the GFM applies the non-dominance metric only among biclusters within a neighborhood (22, 23) (i.e., biclusters with a relatively high overlap of subjects and voxels). This Multiobjective and Multimodal optimization process is analogous to Minimum Description-Length methods (24) and was carried out as described in (1-8). (iii) The remaining optimal biclusters are hierarchically organized by inclusion of their subjects into connected or disjoint hierarchies (i.e., subgraphs or subnetworks).
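The non-dominance criterion in step (ii) can be illustrated with a short sketch. It assumes each bicluster has already been scored as a (specificity, generality) pair with higher values being better; the scores are made up, and the neighborhood restriction described above is deliberately omitted, so this is not the full selection procedure.

```python
def dominates(a, b):
    """True if score pair a is at least as good as b in both objectives
    and strictly better in at least one (higher is better)."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def non_dominated(scores):
    """Return the indices of score pairs that no other pair dominates."""
    keep = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            keep.append(i)
    return keep


# Hypothetical (specificity, generality) scores for four biclusters.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
print(non_dominated(scores))  # [0, 1, 2]
```

The last bicluster is dropped because the second is better in both objectives; the first three survive because each trades one objective against the other.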
Sixth, after the first step of identifying interesting patterns describing objects embedded in the data, we went one step further by also finding characteristic descriptions for each group, where each group represents a concept or class, using Conceptual clustering. In this work we incorporated factors such as the generality and simplicity of the derived bicluster descriptions by (1) validating the relational clustering against independent data-driven descriptions obtained in other domains of knowledge, and (2) incorporating the previously structured knowledge into predictive multiclassifiers that mix machine learning and multiobjective optimization techniques.
Igor Zwir 2014 Oct 07 6:15 p.m. (yesterday) 4 of 4 people found this helpful
References
- Cordon O, Herrera F, Zwir I (2002): Linguistic modeling by hierarchical systems of linguistic rules. IEEE T Fuzzy Syst. 10:2-20.
- Ruspini EH, Zwir I (2002): Automated generation of qualitative representations of complex objects by hybrid soft-computing methods. In: Pal SK, Pal A, editors. Pattern recognition : from classical to modern approaches. New Jersey.: World Scientific, pp 454-474.
- Cordon O, Herrera F, Zwir I (2003): A hierarchical knowledge-based environment for linguistic modeling: models and iterative methodology. Fuzzy Set Syst. 138:307-341.
- Zwir I, Shin D, Kato A, Nishino K, Latifi T, Solomon F, et al. (2005): Dissecting the PhoP regulatory network of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci U S A. 102:2862-2867.
- Zwir I, Huang H, Groisman EA (2005): Analysis of Differentially-Regulated Genes within a Regulatory Network by GPS Genome Navigation. Bioinformatics. 21:4073-4083.
- Romero-Zaliz R, C. Rubio R, Cordón O, Cobb P, Herrera F, Zwir I (2008): A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: a case of study on the gene ontology database. IEEE Transactions on Evolutionary Computation. 12:6:679-701.
- Romero-Zaliz R, Del Val C, Cobb JP, Zwir I (2008): Onto-CC: a web server for identifying Gene Ontology conceptual clusters. Nucleic acids research. 36:W352-357.
- Harari O, Park SY, Huang H, Groisman EA, Zwir I (2010): Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria. PLoS computational biology. 6:e1000862.
- Cheeseman P, Stanford University. (1990): Bayesian learning. [Palo Alto, Calif.]: Board of Trustees of the Leland Stanford Junior University.
- Cook DJ, Holder LB, Su S, Maglothin R, Jonyer I (2001): Structural mining of molecular biology data. IEEE Eng Med Biol Mag. 20:67-74.
- Fraley C, Raftery AE (1998): How many clusters? Which clustering method? Answers via model-based cluster analysis. The computer journal. 41:578-588.
- Bezdek JC (1998): Pattern Analysis. In: Pedrycz W, Bonissone PP, Ruspini EH, editors. Handbook of Fuzzy Computation. Bristol: Institute of Physics, pp F6.1.1-F6.6.20.
- Bittner T, Smith B (2003): A theory of granular partitions. Foundations of geographic information science.117-151.
- Fred AL, Jain AK (2005): Combining multiple clusterings using evidence accumulation. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 27:835-850.
- Arnedo J, del Val C, de Erausquin GA, Romero-Zaliz R, Svrakic D, Cloninger CR, et al. (2013): PGMRA : a web server for ( phenotype x genotype ) many-to-many relation analysis in GWAS. Nucleic Acid Research. 75.
- Pascual-Montano A, Carazo JM, Kochi K, Lehmann D, Pascual-Marqui RD (2006): Nonsmooth nonnegative matrix factorization (nsNMF). IEEE transactions on pattern analysis and machine intelligence. 28:403-415.
- Madeira SC, Oliveira AL (2004): Biclustering algorithms for biological data analysis: a survey. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 1:24-45.
- Bezdek JC, Pal SK, IEEE Neural Networks Council (1992): Fuzzy models for pattern recognition : methods that search for structures in data. New York: IEEE Press.
- Latorre Carmona P, Sánchez JS, Fred ALN (2013): Mathematical Methodologies in Pattern Recognition and Machine Learning: Contributions from the International Conference on Pattern Recognition Applications and Methods, 2012. Springer Proceedings in Mathematics & Statistics. New York, NY: Springer.
- Kim EY, Kim SY, Ashlock D, Nam D (2009): MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering. BMC bioinformatics. 10:260.
- Saeed F, Salim N, Abdo A (2012): Voting-based consensus clustering for combining multiple clusterings of chemical structures. Journal of cheminformatics. 4:37.
- Deb K (2001): Nonlinear goal programming using multi-objective genetic algorithms. J Oper Res Soc. 52:291-302.
- Deb K (2001): Multi-objective optimization using evolutionary algorithms. 1st ed. Chichester ; New York: John Wiley & Sons.
- Rissanen J (1989): Stochastic complexity in statistical inquiry. Singapore: World Scientific.
In reply to a comment by Gerome Breen 2014 Sep 28 6:23 p.m.
C. Robert Cloninger 2014 Oct 06 12:42 p.m. (2 days ago) 11 of 11 people found this helpful
The Facts about Replication and Significance Testing of SNP Selection
B-S expressed concern about our replication process. In traditional GWAS, replication has always been a serious problem, which is the rationale for the PGC to carry out meta-analysis of large collections of samples despite their heterogeneity and limited phenotypic description. It was most challenging for us to identify samples with adequate clinical description to apply our novel approach, but the reward was in identifying strong effects that replicated consistently across three samples, including the Portuguese Islands study that used the same diagnostic instrument in a specific ethnic sample. The samples were independently recruited and independently analyzed, as we stated clearly in the published report. SNP sets, phenotypic sets and associations were calculated separately for the three samples to avoid weighted or biased aggregations. Then, we used a well-known co-clustering test based on the hypergeometric distribution to establish the replicability of results from one sample in the others. This test has been used widely in molecular biology (15, 16, 25), and as a general strategy for validating clusters. For example, it has also been implemented in software packages such as TIBCO/Spotfire. Thus we feel that the concerns expressed by B-S about replication are overstated and empirically unfounded. Again, the strength of this new approach is that it allows us to avoid some of the major problems that plague traditional GWAS approaches.
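For readers unfamiliar with hypergeometric overlap tests, the general form of such a test can be sketched as below. This is a generic textbook version, not the exact implementation used in the paper; the function name and all parameter names are illustrative assumptions.

```python
from math import comb


def hypergeom_overlap_pvalue(n_total, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability of an overlap this large.

    n_total: number of items in the universe (e.g., SNPs or subjects).
    size_a, size_b: sizes of the two sets being compared.
    overlap: observed number of items shared by the two sets.
    Returns P(X >= overlap) when size_b items are drawn at random
    without replacement from a universe containing size_a marked items.
    """
    upper = min(size_a, size_b)
    total = comb(n_total, size_b)
    tail = sum(
        comb(size_a, x) * comb(n_total - size_a, size_b - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: two sets of 5 items out of 10 that coincide completely.
p_value = hypergeom_overlap_pvalue(10, 5, 5, 5)
print(p_value)  # 1/252, roughly 0.004
```

A small value indicates that the two sets share more members than random sets of the same sizes would be expected to share.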
B-S also expressed their concern about the use of a permutation test, claiming that "because SNP sets differ in allele frequency between cases and controls, this procedure does not generate a valid null distribution". The permutation test was used not to establish the significance of the SNP sets, which was evaluated by the SKAT method (14), but rather to test the validity (and approximate probability) of the association between SNP sets and symptom sets. Controls were not used in this test at all, as they have no symptoms of psychosis. Moreover, these symptoms were not even evaluated in the reported inventories. The misunderstanding of B-S is probably due to a lack of familiarity with this new statistical procedure, which highlights the previously discussed difficulties people have when first trying to understand a novel approach.
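The general shape of a case-only permutation test of association between two sets can be sketched as follows. The details of the published procedure are not given here, so everything in this example is an assumption for illustration: the overlap statistic, the reshuffling scheme, and the names `permutation_pvalue`, `snp_members` and `symptom_members`.

```python
import random


def permutation_pvalue(snp_members, symptom_members, case_ids,
                       n_perm=1000, seed=0):
    """Estimate how often a random symptom set of the same size overlaps
    the SNP set at least as much as observed. Only cases are used.

    snp_members, symptom_members: sets of case identifiers.
    case_ids: all case identifiers (controls are not involved).
    """
    rng = random.Random(seed)
    observed = len(snp_members & symptom_members)
    cases = list(case_ids)
    size = len(symptom_members)
    hits = 0
    for _ in range(n_perm):
        # Reassign symptom-set membership at random among the cases.
        shuffled = set(rng.sample(cases, size))
        if len(snp_members & shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


cases = range(20)
p_same = permutation_pvalue(set(range(10)), set(range(10)), cases)
p_disjoint = permutation_pvalue(set(range(10)), set(range(10, 20)), cases)
print(p_same < 0.05, p_disjoint)
```

Because membership is reshuffled only among cases, the null distribution in this sketch does not depend on allele-frequency differences between cases and controls.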
C. Robert Cloninger 2014 Oct 06 12:44 p.m. (2 days ago) edited 11 of 11 people found this helpful
Conclusion
We appreciate the opportunity to clarify the fundamental differences between the assumptions and goals of traditional GWAS and our novel approach that addresses the complexity of common disorders with sophisticated and well-validated machine-learning and data-mining methods. We hope that the profound differences between the approaches with which Breen and colleagues are familiar and those developed by us will stimulate greater understanding of the challenges faced by the fields of psychiatric and medical genetics. We recognize that this new approach will cause a period of reexamination of standard methodology in this field, but every major advance in genetics, and in all of science for that matter, has required flexibility and creative thinking. There are always things that we can improve upon in any method, and we recognize that many incremental improvements are essential for the advance of science.
We have put forth a new data-driven method that allows the uncovering of complex genotypic-phenotypic relations when they are present, without imposing this as an a priori assumption. The relationships we uncovered are in fact highly complex, which allowed us to identify individuals at high risk and to associate specific SNP clusters with specific clinical syndromes despite the presence of extensive pleiotropy and heterogeneity. This approach, like all those that have preceded it, is undoubtedly imperfect, will require refinement, and may ultimately give way to yet another approach that will explain more. Such methodological evolution is nothing more than the typical course of advancement in science. We hope that these exciting developments will lead to new ways to push the boundaries of accepted science, and help us to question a priori assumptions that restrict our understanding of all the information embedded in data.
If this discussion has shown us nothing else, it is that this process of questioning and reflection has already begun. Ultimately, beyond all of the technical issues, our main goal is to help those in need. With schizophrenia, we know the need is great from the tremendous outpouring of requests for guidance and help that we have received, and we know that there are many people with other diseases who may benefit from our new approach. We can all be comforted knowing that our debate can bring us closer to doing what we are really here to do – that is, helping those suffering from debilitating diseases and finding ways to promote their health and well-being. Whatever path leads us there is worth considering. So let us not permit our philosophical or scientific differences to prevent us from allowing for a sufficient diversity in our tactics, because we never know what path will lead us towards our common goals of improving health and reducing the burden of disease.
C. Robert Cloninger 2014 Oct 06 1:05 p.m. (2 days ago) 5 of 5 people found this helpful
References
- Arnedo J, Svrakic DM, Del Val C, Romero-Zaliz R, Hernandez-Cuervo H, Fanous AH, et al. Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. The American journal of psychiatry. 2014.
- Arnedo J, del Val C, de Erausquin GA, Romero-Zaliz R, Svrakic D, Cloninger CR, et al. PGMRA: A web server for (Phenotype X Genotype) many-to-many relation analysis in GWAS. Nucleic Acids Res. 2013(Web Server issue). PMCID: 2447763.
- Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. American journal of human genetics. 1990;46(2):222-8. PMCID: 1684987.
- Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511(7510):421-7. PMCID: 4112379.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics. 2007;81(3):559-75. PMCID: 1950838.
- Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, et al. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009;460(7256):753-7. PMCID: 2775422.
- Graves JA. Review: Sex chromosome evolution and the expression of sex-specific genes in the placenta. Placenta. 2010;31 Suppl:S27-32.
- Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, et al. Linkage disequilibrium in the human genome. Nature. 2001;411(6834):199-204.
- Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet. 2008;9(6):477-85.
- Wiehe T, Slatkin M. Epistatic selection in a multi-locus Levene model and implications for linkage disequilibrium. Theor Popul Biol. 1998;53(1):75-84.
- Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001;60(3):227-37.
- Koch E, Ristroph M, Kirkpatrick M. Long range linkage disequilibrium across the human genome. PLoS One. 2013;8(12):e80754. PMCID: 3861250.
- Wright S. The shifting balance theory and macroevolution. Annual review of genetics. 1982;16:1-19.
- Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, et al. Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet. 2010;86(6):929-42. PMCID: 3032061.
- Zwir I, Shin D, Kato A, Nishino K, Latifi T, Solomon F, et al. Dissecting the PhoP regulatory network of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci U S A. 2005;102(8):2862-7.
- Zwir I, Huang H, Groisman EA. Analysis of Differentially-Regulated Genes within a Regulatory Network by GPS Genome Navigation. Bioinformatics. 2005;21(22):4073-83.
- Romero-Zaliz R, Del Val C, Cobb JP, Zwir I. Onto-CC: a web server for identifying Gene Ontology conceptual clusters. Nucleic Acids Res. 2008;36(Web Server issue):W352-7.
- Romero-Zaliz R, C. Rubio R, Cordón O, Cobb P, Herrera F, Zwir I. A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: a case of study on the gene ontology database. IEEE Transactions on Evolutionary Computation. 2008;12:6:679-701.
- Harari O, Park SY, Huang H, Groisman EA, Zwir I. Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria. PLoS Comput Biol. 2010;6(7):e1000862. PMCID: 2908699.
- Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401(6755):788-91.
- Mejia-Roa E, Carmona-Saez P, Nogales R, Vicente C, Vazquez M, Yang XY, et al. bioNMF: a web-based tool for nonnegative matrix factorization in biology. Nucleic Acids Res. 2008;36(Web Server issue):W523-8. PMCID: 2447803.
- Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD. bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics. 2006;7:366. PMCID: 1550731.
- Tamayo P, Scanfeld D, Ebert BL, Gillette MA, Roberts CW, Mesirov JP. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proc Natl Acad Sci U S A. 2007;104(14):5959-64. PMCID: 1838404.
- Cichocki A. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blinded separation. Chichester, U.K.: John Wiley; 2009.
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999;22(3):281-5.
AYMAN FANOUS 2014 Oct 06 2:50 p.m. (2 days ago) edited 10 of 10 people found this helpful
Reply to Breen et al. and Cloninger et al.
On behalf of: Ayman H. Fanous, M.D. (Washington DC VA Medical Center, Washington, DC, USA); Fabio Macciardi, M.D., Ph.D. (Department of Psychiatry, University of California at Irvine, Irvine, CA); Michele T. Pato, M.D. (Department of Psychiatry, University of Southern California, Los Angeles, CA, USA); Carlos N. Pato, M.D., Ph.D. (Department of Psychiatry, University of Southern California, Los Angeles, CA, USA).
The paper by Arnedo et al., of which three of us (AHF, CNP, and MTP) are co-authors representing the Portuguese Island schizophrenia study, has generated considerable controversy with respect to the analytic methodology employed, as detailed in the commentary by Breen et al. The commentary has now been answered by Dr. Cloninger and colleagues. We feel it would have been better to have this debate in a different forum, and preferably, with peer review. However, in light of the controversy, we feel it is important to state the following points about our positions.
First, we support freedom of inquiry and the important place of exploratory analyses and novel methodologies in the advancement of science. While the utmost scrutiny should be given to the implementation and results of such methods, it is not beneficial for such scrutiny to have a chilling effect on further exploration.
Second, approaches such as the ones employed by Arnedo et al. could be valuable to the field and they are commonplace in other areas of biomedical research. However, while they have heuristic value, they have only begun to be tested and debated in psychiatric genetics, and their results must not be viewed as definitive. GWAS therefore remains the current method of choice to detect the effects of common variants on categorical and continuous phenotypes. The efforts of hundreds of individuals and institutions worldwide in the PGC and other consortia have led to breakthroughs in identifying the genetic variants that contribute to the genetic basis of common diseases. This has opened new and exciting vistas for elucidating their pathophysiological bases, and clues for the development of potentially more effective diagnostic and treatment modalities. Strategies such as those employed by Arnedo et al. must be viewed as complementary to, but not replacing, traditional GWAS.
Third, this freedom of inquiry carries with it the responsibility to make all methods and results completely transparent, and to facilitate all reasonable efforts to independently validate them. We applaud Drs. Zwir, Cloninger, and colleagues making their SNP lists and all pertinent data available to qualified investigators who seek to test the reproducibility of their results.
Fourth, we hope that this debate will spur generative discussions about the place of exploratory analyses in psychiatric genetics, as well as optimal approaches to constructively debating such methods.
Leonid Teytelman 2014 Oct 06 6:47 p.m. (2 days ago) edited 3 of 3 people found this helpful
Dear Authors,
We have published an analysis in S. cerevisiae, showing expression-dependent artifactual ChIP enrichment at highly expressed loci (Teytelman L, 2013 "Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins"). As you know, our finding raises the question of whether HOT regions may also be influenced by the same artifact.
It is great that you have considered our work and have thoughtfully responded to our analysis. Below, I would like to continue this discussion in an effort to better understand the artifact, its causes, and whether it may be contributing to the enrichment at the HOT loci.
1. “we have demonstrated that there is no correlation between our non-specific binding controls (IgG) and our measured transcription factor occupancy;”
Considering our results with no-tag control experiments, an IgG may fail to control for the artifact. It would be great if you could instead perform a GFP ChIP-Seq, similarly to what we have done in yeast.
2. The regions determined in ref. 41 have very low enrichment (twofold or less) of non-specific immunoprecipitation in anti-GFP antibody controls over input DNA evaluated using a non-standard sliding-window approach. Importantly, immunoprecipitation/input ratios at this level are typically not considered enriched for binding in modern peak-calling procedures. For example, the median immunoprecipitation/input ratio for our human RNA Pol II experiments is 20-fold, and only 0.033% of human RNA Pol II peaks contain an immunoprecipitation/input ratio ≤ twofold.
The mean is low, but in both anti-GFP experiments, there are loci with 3-5x enrichment (figure 4D). Most importantly, while the anti-GFP enrichment at the hyper-ChIPable loci is low, please note that the level of enrichment is variable from protein to protein (2-5X for Sir proteins, but often >10X for Cse4).
3. Thus, it is essential to note that the term ‘hyper-ChIPable’, coined by ref. 41, is quite misleading, as a correctly performed ChIP experiment will evaluate statistically enriched regions, with higher immunoprecipitation/input ratios. The so-called hyper-ChIPable regions in ref. 41 are not binding regions as determined under ChIP-seq best practices. Hence, when statistical peak-calling was performed in ref. 41 (using the established MACS peak-caller) to evaluate signals only at significantly enriched regions (Supplementary Table 1) only 17 (<7.5%) of the 238 claimed ‘hyper-ChIPable’ regions were called significant by all three Sir proteins. In fact, 68% of their 238 regions do not contain a binding site for any Sir protein as determined by MACS, despite even very liberal settings used (P < 10^-5, no fold enrichment cut-off). Thus, the data of ref. 41 contradict its own major claim that all three Sir proteins showed enrichment at the 238 sites.
By reporting the 238 sites with >2fold enrichment of Sir2, Sir3, and Sir4, we are in fact being extra-demanding in terms of the threshold. We are stringently requiring all three proteins to be enriched above a threshold at the locus. So a target with 5x enrichment of Sir2 and 1.8X enrichment of Sir3 would not pass this cutoff. A typical ChIP study will focus on a single factor at a time. Had we done that, we would have many more artifactual targets for each silencing protein, with many at 5x or higher enrichment. Furthermore, the level of the artifactual signal varies from protein to protein or experiment to experiment. For example, the Cse4 signal at highly-expressed loci can give 10x or higher enrichment.
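The difference between the joint cutoff and a single-factor cutoff can be shown in a few lines. This is only an illustration of the logic described above; the function names are hypothetical, and the Sir4 value is made up to complete the example.

```python
def passes_joint_cutoff(enrichment, threshold=2.0):
    """A locus passes only if every protein exceeds the fold-enrichment
    threshold (the stringent, all-three-proteins rule)."""
    return all(value > threshold for value in enrichment.values())


def passes_single_factor(enrichment, protein, threshold=2.0):
    """A locus passes if the one protein under study exceeds the threshold."""
    return enrichment[protein] > threshold


# 5x Sir2 and 1.8x Sir3 as in the text; the Sir4 value is hypothetical.
locus = {"Sir2": 5.0, "Sir3": 1.8, "Sir4": 3.0}
print(passes_joint_cutoff(locus))           # False
print(passes_single_factor(locus, "Sir2"))  # True
```

The same locus is excluded under the joint rule but would be reported as a target in a study of Sir2 alone.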
4. Furthermore, as indicated in Supplementary Table 3 of ref. 41, the Sir2, Sir3 and Sir4 ChIP-seq experiments were performed only once each, which raises the question as to whether enrichment of Sir proteins at the 238 sites is reproducible. More rigorously, even for the remaining 17 genomic loci, their status as hyper-ChIPable is questionable as each region would first have to be established as a reproducible binding site in replicate experiments for each individual Sir protein. If you consider that Sir2, Sir3 and Sir4 ChIP-seq constitutes three replicates of Sir proteins, their data show that most of their claimed sites were not reproducibly enriched.
Most of our artifact-cause analysis focuses on genome-wide data, not on the 238 sites. The 238 Sir-enriched euchromatic loci were a launching point for the analysis, but most of the paper looks comprehensively at the link between expression and ChIP levels. Figures 3, 4, and 5 are all on genome-wide correlations between Pol II/III and ChIP.
As for reproducibility, we see the same peaks, with often 10x enrichment, in Ste12, Cse4, two distinct GFP experiments, and each of the three Sir ChIP-Seq datasets. The same exact loci come up in the Sir3 paper from Oliver Rando’s group (Radman-Livaja M, 2011).
5. In addition to the analytical differences outlined above, other potential sources for the marked differences between our data and the Sir-enriched regions of ref. 41 are deviations from a typical ChIP protocol. In particular, ref. 41 employed a significantly longer cross-link time (1 h as opposed to the typical 10–20 min). This might contribute to formation of large non-specific protein–DNA complexes, which can in turn increase non-specific immunoprecipitation.
Though not discussed in the manuscript, we have in fact performed experiments to investigate whether the crosslinking concentration contributed to the misleading signal. We performed ChIP with 1 hour of crosslinking at room temperature at the following formaldehyde concentrations: 0.0625%, 0.125%, 0.25%, 0.5% and 1%, but did not find a proportionate decrease in the hyper-ChIPable signal with decreasing formaldehyde concentration. Moreover, the presence of hyper-ChIPability in the Snyder datasets (Cse4, Ste12), ours (Sir2, 3, 4, GFP), and Rando (Sir3) makes it clear that the problem is not in some unusual protocol steps in our hands.
We also note that we initially performed the Sir ChIP-Seq experiments because of our interest in the Sir protein biology. Because the Sir proteins do not directly interact with the DNA, we used longer crosslinking times. This is not unique to our work.
In summary, much more work is needed to pinpoint the cause of the artifact and to evaluate whether some or all of the signal at highly expressed genes in many other reported ChIP studies could be artifactual. Much more work is necessary to develop the best controls and corrections for the artifact. However, the artifact we report is not minor and is not a consequence of the methodological details of our manuscript.
Also, please note the following papers, published almost in parallel with ours, on this topic:
Park D, 2013 "Widespread Misinterpretable ChIP-seq Bias in Yeast" (Different analysis methods but the same conclusions in S. cerevisiae, analyzing an entirely different set of factors with ChIP-Seq experiments.)
Kasinathan S, 2014 "High-resolution mapping of transcription factor binding sites on native chromatin" (Questions specificity of standard ChIP in S. cerevisiae and at HOT regions of Drosophila. This work possibly provides a solution to the artifact with a modification of the ChIP technique.)
Also, the following discussion of our work on PubPeer may be useful.
- Inferences of clinical diagnostic reasoning and diagnostic error.
J Biomed Inform. 2011.1 comment
Anders von Heijne2014 Oct 06 3:40 p.m. (2 days ago) 3 of 3 people found this helpful
Wonderful! This is a must-read for all diagnosticians. But don't miss Stanley DE, Campos DG. The logic of medical diagnosis. Perspect Biol Med. 2013 Spring;56(2):300-15, which takes the argument one step further.
- The logic of medical diagnosis.
Perspect Biol Med. 2013.1 comment
Anders von Heijne2014 Oct 06 3:37 p.m. (2 days ago) 1 of 1 people found this helpful
Yes, the abductive mode of thinking that Peirce described is crucial in all forms of diagnostics. Depending on the problem at hand, we clearly use different logical strategies. This is a must-read, together with Lawson AE, Daniel ES. Inferences of clinical diagnostic reasoning and diagnostic error. J Biomed Inform. 2011 Jun;44(3):402-12. Sadly, this sort of analysis seldom appears in clinical speciality journals. If you are lucky you can find articles on Dual System theory and Bayesian theory, but without abduction there is no place to start using these techniques. We all need a dose of metacognitive musings from time to time.
- The Genotype-Tissue Expression (GTEx) project.
Nat Genet. 2013.2 comments
In reply to a comment by Raha Pazoki2014 Sep 20 11:46 a.m.
Raha Pazoki2014 Oct 06 07:51 a.m. (2 days ago)
Addendum to previous comment "Tissue-specific online eQTL databases"
In the previous analysis, I compared heart-specific eQTLs from the GTEx consortium with eQTLs from the study by Koopman and co-workers. In that analysis, I considered only SNP-gene pairs that appear with exactly the same names in the two databases. However, the two databases may also contain SNPs in high linkage disequilibrium (LD) with one another, and genes listed under different annotations. It would be interesting to put in a little more effort and search for such SNPs and genes, to come up with additional SNPs that consistently change cardiac expression of specific genes.
Raha Pazoki (twitter:@rahap)
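The comparison described in this addendum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the SNP and gene labels, and the `ld_proxy` lookup are all hypothetical placeholders, not data or code from GTEx or from the study by Koopman and co-workers. The first function keeps pairs whose names match exactly; the second first maps each SNP to a chosen representative of its LD block, so that pairs tagged by different SNPs in high LD can also match.

```python
def shared_pairs(pairs_a, pairs_b):
    """Return SNP-gene pairs that appear with identical names in both sets."""
    return set(pairs_a) & set(pairs_b)


def shared_pairs_with_proxies(pairs_a, pairs_b, ld_proxy):
    """Match pairs after replacing each SNP by its LD-block representative.

    ld_proxy maps a SNP name to the representative SNP of its LD block;
    SNPs missing from the mapping are kept unchanged.
    """
    def canonical(pairs):
        return {(ld_proxy.get(snp, snp), gene) for snp, gene in pairs}

    return canonical(pairs_a) & canonical(pairs_b)


# Placeholder labels only (hypothetical, for illustration).
database_a = {("snp1", "GENE_A"), ("snp2", "GENE_B")}
database_b = {("snp1", "GENE_A"), ("snp3", "GENE_B")}
ld_proxy = {"snp3": "snp2"}  # assume snp3 is in high LD with snp2

print(shared_pairs(database_a, database_b))
print(shared_pairs_with_proxies(database_a, database_b, ld_proxy))
```

Harmonising gene annotations would need an analogous lookup from alternative gene names to one canonical identifier; it is left out here to keep the sketch short.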
- Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions.
J Am Osteopath Assoc. 2014.2 commentsLane Rasberry also commented
Paul Vaucher2014 Oct 06 03:45 a.m. (2 days ago)edited
False conclusions drawn about the level of evidence of factual statements drawn from Wikipedia
Paul Vaucher, PhD, DiO1, Jean Gabriel Jeannot, MD2, Reto Auer, MD, MAS2
1 University Center of Legal Medicine Lausanne-Geneva, University Hospital of Lausanne (CHUV), Lausanne, Switzerland
2 Department of Ambulatory Care and Community Medicine, University of Lausanne, Lausanne, Switzerland
We believe Hasty et al.'s study (Hasty RT, 2014) to be misleading and that their paper should not be made available to the scientific community without serious revision. Apparently, it has passed unnoticed that the methodology and statistical analysis they used had little to do with their stated hypothesis and that their interpretation of the statistics was erroneous. The authors concluded that Wikipedia is an unreliable source of health information, whereas an alternative interpretation of the presented results would, on the contrary, point towards considering Wikipedia a trusted source of information, subject to rigorous reanalysis and reinterpretation. Their conclusions are therefore in contradiction with other studies on the subject (Archambault PM, 2013, Kräenbring J, 2014), which tend to show that for topics to which health workers contribute, such as drugs, Wikipedia's information is as trustworthy as that from textbooks. These discrepancies can be explained by major statistical and methodological errors in Hasty et al.'s publication.
The correct interpretation of the McNemar statistic suggests that the concordance for diabetes and back pain is significantly better than for concussion, and not the reverse, as stated by the authors. Using data provided in Table 3, for factual statements identified by both reviewers (first two columns of Table 3), for diabetes mellitus and back pain, the authors found that up to 94% of assertions on Wikipedia were verified (respectively 72 out of 75 statements and 63 out of 67, p<0.001 for the McNemar statistic). In contrast, for concussion, only 65% were verified (66 out of 98, p=0.888 for the McNemar statistic). The McNemar statistic tests whether the proportion of factual statements from both reviewers, classified as concordant or discordant by the authors, is above what would be expected by chance alone (i.e. 50%). The interpretation for diabetes mellitus, if McNemar's test should be used at all, is that we would fail to reject the null hypothesis of a proportion of concordance of 50% in < 1% of the cases. The McNemar statistic thus suggests that the concordance for diabetes and back pain is significantly better than for concussion, and not the reverse, as stated by the authors. The calculation is based on the number of discordant results on the diagonals (i.e. 34 vs 1 for diabetes mellitus). It by no means tests the discordance between Wikipedia statements and existing guidelines. To test their hypothesis, it would appear more relevant to simply report the pooled average proportion of correct statements with its confidence interval. On a technical note, the McNemar statistic should not be used when the reviewers assessed different numbers of assertions. McNemar's test is also known to be highly dependent on the number of factual statements within each article. Articles with a higher number of statements would reach the level of significance with a higher proportion of ungrounded factual statements.
Second, the article falsely leads readers to believe that all factual statements (assertions) from 10 Wikipedia articles were identified and independently assessed by two internists to see whether they were concordant or not. Using two reviewers is a recognized method for increasing the precision of a measure. However, the authors did not provide statistics allowing readers to assess the between-reviewer variability in the identification of assertions. Table 3 suggests that over a third of assertions were reported by only one of the two reviewers. The authors did not describe a method for resolving these dissimilarities (to use their definition in Table 2) and for clearly defining which assertions were factual statements and which were not. Nor did they define a method of agreement for deciding which assertions were supported by evidence and which were not. The analysed results therefore only tend to show that, for certain topics, internists have difficulties in detecting factual statements from Wikipedia and knowing whether they are grounded or not.
Hasty et al.'s findings suggest that while there might be some discrepancies in the quality of articles between topics, some appear to be of very high quality, such as diabetes mellitus and back pain. Given the unnoticed errors included in their article and their importance for the interpretation of the results, Hasty et al.'s published article reveals that peer-reviewed misleading information can also be made available to the public.
Conflicts of interest: Reto Auer and Jean-Gabriel Jeannot are advocates of the use of Wikipedia as a means of communication to inform the population on health issues. Paul Vaucher is an important contributor to the French Wikipedia page dedicated to osteopathic medicine.
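To make the arithmetic behind this point concrete, here is a minimal Python sketch of the two quantities discussed above. It is an illustration under stated assumptions, not a reconstruction of the original authors' analysis: the function names are hypothetical, the exact binomial p-value and the Wilson interval are one common choice among several, and the only inputs taken from this comment are the discordant counts (34 vs 1) and the verified proportion (72 out of 75) quoted for diabetes mellitus.

```python
import math


def mcnemar(b, c):
    """McNemar's test from the two discordant counts of a paired 2x2 table.

    Returns the uncorrected chi-square statistic and an exact two-sided
    binomial p-value (null hypothesis: a discordant pair is equally likely
    to fall in either off-diagonal cell).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / n
    # Probability of a split at least as extreme as min(b, c) under p = 0.5.
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return chi2, min(1.0, 2 * tail)


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Counts quoted in this comment for diabetes mellitus.
chi2, p_value = mcnemar(34, 1)
low, high = wilson_interval(72, 75)
print(f"McNemar chi-square = {chi2:.2f}, exact p = {p_value:.2g}")
print(f"Proportion verified = {72 / 75:.2f}, interval = ({low:.2f}, {high:.2f})")
```

The sketch shows that the test uses only the two discordant cells, so it says nothing directly about agreement between Wikipedia and the guidelines, whereas a proportion with its interval addresses that question directly.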
- An overview of FDA-approved new molecular entities: 1827-2013.
Drug Discov Today. 2014.1 comment
Christopher Southan2014 Oct 05 03:59 a.m. (3 days ago)edited 1 of 1 people found this helpful
A good overview, but it would be so much more useful if the authors could list the 1,453 (figshare?). Even better if they surfaced a resolution of these to molecular entities via PubChem CIDs. Note they could also submit not only this set but the previous seven therapeutic splits and get more traffic for the set of papers.
- Goal-Directed Resuscitation for Patients with Early Septic Shock.
N Engl J Med. 2014.1 comment
Ryan Radecki2014 Oct 04 2:48 p.m. (4 days ago) 1 of 1 people found this helpful
Post-publication commentary:
"ARISE, and Cast Off the Shackles of EGDT"
The sound you hear is a sigh of relief from Emergency Physicians and intensivists regarding the outcomes of the Australasian Resuscitation in Sepsis Evaluation (ARISE).
As ProCESS suggested, and as many have suspected all along, it seemed the critical intervention from Early Goal-Directed Therapy was the early part – and less the SCO2 monitoring and active management of physiologic parameters using dobutamine and blood transfusion. Now, we have a second study, in addition to ProCESS, supporting the same general conclusions....
http://www.emlitofnote.com/2014/10/arise-and-cast-off-shackles-of-egdt.html
- Clinical practice. Care of the asplenic patient.
N Engl J Med. 2014.1 comment
Quentin Hill2014 Oct 04 07:32 a.m. (4 days ago)edited 3 of 3 people found this helpful
In the section on guidelines, the authors have stated that the current British Committee for Standards in Haematology guidelines (Davies JM, 2011) recommend lifelong penicillin prophylaxis for all persons with surgical asplenia. Although this was true in previous guidelines (Working Party of the British Committee for Standards in Haematology Clinical Haematology Task Force, 1996), Davies et al recommend a risk-adapted approach to antibiotic prophylaxis. They recommend antibiotic prophylaxis "at least" in the immediate post-operative period following splenectomy for trauma. Otherwise, long-term prophylaxis is recommended for those considered to be at high risk of pneumococcal infection: age <16 years or >50 years, inadequate serological response to pneumococcal vaccination, previous invasive pneumococcal disease and splenectomy for underlying haematological malignancy.
Additionally, the authors do not discuss the meningitis serogroup B vaccine (MenB). A MenB vaccine was licensed by the European Medicines Agency in January 2013. Although not yet FDA licensed, it has been used to control several outbreaks in the United States http://www.cdc.gov/meningococcal/outbreaks/vaccine-serogroupb.html. In England, over 80% of Neisseria meningitidis cases are due to capsular group B strains, and the vaccine has been recommended there for children and adults with asplenia, splenic dysfunction and complement disorders https://www.gov.uk/government/collections/immunisation-against-infectious-disease-the-green-book.
- A bicistronic MAVS transcript highlights a class of truncated variants in antiviral immunity.
Cell. 2014.2 comments
In reply to a comment by Pavel Baranov2014 Aug 06 12:16 p.m.
Jonathan C Kagan2014 Oct 03 11:54 a.m. (5 days ago)edited 2 of 2 people found this helpful
In response to the comment by Baranov, we would first like to thank you for your interest in our work. We agree that alternatively translated eukaryotic proteins most commonly share a reading frame, as our study mentioned. In fact, we discuss this point in the analysis of our ribosomal profiling data. For instance, we observed that truncations are more common than internal out-of-frame translation products. Regardless of whether the products share a reading frame, however, a point of interest is the regulation allowing ribosomes to initiate translation at more than one location on a transcript. We chose to use the terms polycistronic and bicistronic for two reasons. First, bicistronic mRNAs are operationally defined as transcripts that produce two stable proteins of distinct functions, regardless of whether the two proteins share sequence similarity. Our study clearly demonstrated that this is the case with the MAVS transcript. Whether these functionally distinct proteins share coding sequence with each other is functionally irrelevant. Second, a transcript producing a truncated protein (such as MAVS) conforms to generally accepted definitions of a polycistronic transcript, such as you provided: “…protein products of distinct coding ORFs are translated from the same transcript.” While the two MAVS proteins are encoded in the same reading frame, their production is initiated at unique start sites; thus, they have distinct coding sequences. In short, we consider them to be distinct ORFs that share a reading frame and therefore produce two proteins from a bicistronic mRNA.
- Aβ promotes VDAC1 channel dephosphorylation in neuronal lipid rafts. Relevance to the mechanisms of neurotoxicity in Alzheimer's disease.
Neuroscience. 2014.1 comment
Friedrich Thinnes2014 Oct 03 05:17 a.m. (5 days ago)
To finalize three-dimensional VDAC structure: Focus on Native VDAC will be indispensable
This study, from my point of view, marks a great moment of VDAC research:
1) It points once again to the relevance of cell membrane-standing VDAC-1 for the pathogenesis of Alzheimer's Disease via apoptosis.
2) It gives strong support to the cell membrane expression, more precisely the plasmalemmal lipid raft integration, of vertebrate VDAC-1. Furthermore, the data concerning the phosphorylation of VDAC-1 in correlation with the regulation of channel opening/closing argue in favor of its involvement in cell volume regulation and thus apoptosis. They are in line with much evidence indicating that plasmalemmal VDAC-1 forms the channel part of a volume regulated anion channel complex (VRAC/VSOAC).
3) From here, VDAC-1 in the plasmalemma must be “fully closed” = collapsed = N-terminus accessible outside the barrel = closed for anions and cations, and there is evidence that the N-terminal part of native VDAC-1 can be reached by antibodies even in detergent solutions (Benz et al., 1992; Thinnes and Burckhardt, 2012).
4) Applying canonical incorporation into black membranes, detergent-solubilized native phosphorylated mammalian VDAC-1 as well as recombinant channel preparations from E. coli inclusion bodies show only “open” = anion-selective = N-terminal stretch inside the barrel and “closed” = cation-selective = semi-collapsed = N-terminal stretch inside the barrel channel phenotypes (Teijido et al., 2012). “Fully closed” = collapsed = N-terminus accessible outside the barrel VDAC-1 states thus cannot be studied by this approach. The same holds true for more recent crystallization-based approaches. However, upcoming laser-based approaches may work directly on native VDAC-1 in solution; thus, improving detergent-solubilized native VDAC preparations may be worth keeping on the agenda.
--Benz R, Maier E, Thinnes FP, Götz H, Hilschmann N (1992) Studies on human porin. VII. The channel properties of the human B-lymphocyte membrane-derived “Porin 31HL” are similar to those of mitochondrial porins. Biol Chem Hoppe Seyler 373: 295–303.
--Thinnes FP, Burckhardt G (2012) On a fully closed state of native human type-1 VDAC enriched in Nonidet P40. Mol Genet Metab 107: 632–633.
--Teijido O, Ujwal R, Hillerdal CO, Kullman L, Rostovtseva TK, Abramson J (2012) Affixing N-terminal α-helix to the wall of the voltage-dependent anion channel does not prevent its voltage gating. J Biol Chem 287: 11437–11445.
- An Epistemological Perspective on the Value of Gain-of-Function Experiments Involving Pathogens with Pandemic Potential.
MBio. 2014.3 comments
In reply to a comment by Joshua L Cherry2014 Sep 23 5:56 p.m.
Joshua L Cherry2014 Oct 02 12:06 p.m. (6 days ago) 2 of 2 people found this helpful
I thank the author(s) for responding. My original comment applies to much of the response, but a few points should be clarified.
We have all encountered writing that refers to science or its subfields and uses scientific language but lacks scientific content. The editorial does something similar with respect to philosophy. It defines epistemology and makes use of its vocabulary, but the arguments that it makes are not epistemological. The only general principle about knowledge that it invokes is the Faberite notion that knowledge is good. This is not an epistemological proposition or a philosophical insight, nor a revelation to scientists.
The benefits of GOF experiments commonly discussed are indeed benefits of the resulting knowledge. (What else would they be? The economic benefits of creating employment for laboratory personnel?) These have mostly been supposed direct practical applications of this knowledge for preventing human H5N1 infections. The response above suggests that the authors intended to emphasize “the many other possible future uses of that knowledge, some of them practical, but some of them purely theoretical”. As I stated, most or all scientific knowledge has many possible future uses, and it is common knowledge, not a philosophical insight, that science works this way.
Exceptional risks demand exceptional benefits. Satisfaction of normative standards does not imply exceptional benefits. The authors indeed make a logical leap in concluding that GOF experiments must be powerful because they share certain formal properties with experiments that proved to be powerful. The literature is filled with results of experiments that meet these normative standards, with importance ranging from great to nearly nil. Thus, we must consider the value of these particular experiments, using scientific reasoning and judgment. That is exactly what most of the debate has been concerned with.
- Less is more: selective advantages can explain the prevalent loss of biosynthetic genes in bacteria.
Evolution. 2014.1 comment
Morgan Price2014 Oct 01 10:43 p.m. (6 days ago) 1 of 1 people found this helpful
The authors predict that 36% of bacteria cannot synthesize phenylalanine, but many (most?) of these predictions are erroneous. I looked over the predictions, which were graciously provided by Dr. Kost, and noticed the following errors. First, many cyanobacteria are predicted auxotrophs (i.e., Synechococcus elongatus PCC 6301, and members of the genera Prochlorococcus, Anabaena, Cyanothece, Nostoc, and Gloeobacter). However, cyanobacteria are normally grown in a mineral medium with no organic carbon. Second, our group studies the sulfate-reducing bacteria Desulfovibrio alaskensis G20 and D. vulgaris Miyazaki F, which grow in a defined minimal medium without any amino acids, yet these are predicted to be phenylalanine auxotrophs. Third, Caulobacter crescentus CB15 is predicted to be a phenylalanine auxotroph, but it can grow at very low nutrient levels, including in defined mineral media with a small amount of sugar added.
- Instruments for assessing risk of bias and other methodological criteria of animal studies: omission of well-established methods.
Environ Health Perspect. 2014.1 comment
Anthony Tweedale2014 Oct 01 2:04 p.m. (7 days ago)edited 1 of 1 people found this helpful
PubMed should soon be adding the published Erratum to this Letter to the Editor (which responds to academic research important to risk assessment (Krauth D, 2013)). As I prompted the Erratum, I wish to add that, concerning the declared interests (DoI) of two of these industry and industry-affiliated authors (Profs. Leist and Boobis), it even now fails to say 'industry' or a synonym, whereas my investigations revealed that these two also work regularly for industries.

Top comments now - more about this
Arnedo J.Am J Psychiatry. 2014.16 commentsGerome Breen, C. Robert Cloninger and 1 other also commented
Igor Zwir2014 Oct 07 6:16 p.m. (yesterday) 7 of 7 people found this helpful
Interdisciplinary work not so fast … Geneticists with a statistics background vs. engineers/computer scientists with a genetics background
Unfortunately, interdisciplinary work has been stimulated and frozen at the same time. Much of this contradiction is due to the lack of universal reviewers who know about everything. Science is specialized, in terms of education and consequently in terms of results. For example, most studies of complex disorders focus on single sources of data (genetics, neuroimages), eliminating the possibility of having complementary perspectives on the patients. In addition, communication between biomedical researchers and mathematics or computer science investigators is disrupted. Biologists and physicians usually search for articles in PubMed (http://www.ncbi.nlm.nih.gov/pubmed), whereas most engineers look to the Institute of Electrical and Electronics Engineers (IEEE) publications (https://www.ieee.org), and many of them ignore PubMed. Conversely, many molecular biologists do not know that IEEE exists or what it represents. Few IEEE publications are available in PubMed, and most of those are old. However, many of those that are not in PubMed are much more rigorous than any method proposed in the best biomedical journals. This comment encourages both PubMed and IEEE to resolve their differences, which in turn will help provide more, and better informed, reviewers for novel techniques in the post-genomic era.
Zhou Y.BioData Min. 2014.2 comments
Elizabeth Moylan2014 Oct 08 07:25 a.m. (13 hours ago) 3 of 3 people found this helpful
Thank you for raising this, we are investigating in accordance with COPE guidelines (http://publicationethics.org/).
Elizabeth Moylan, Biology Editor, BioMed Central.
Neil Saunders2014 Oct 07 10:24 p.m. (22 hours ago)edited 4 of 4 people found this helpful
A substantial fraction of the text in this article has been copied verbatim from an earlier article which it cites. This can be demonstrated by entering the article URLs - http://genomebiology.com/2010/11/3/r25 and http://www.biodatamining.org/content/7/1/15 - into this online tool. Also, the first 10 or so references are identical and in the same order.
Discussion at this Twitter thread includes images which make the similarity very apparent.