I've calculated the Q-value for enrichment in GO categories (using Storey's method) and plotted the enrichment at each attachment point. Each of these pages also plots the LAR scores for a GO category, restricted to the genes that are attached. Clicking on a plot leads to the full LAR scores for every E-gene in the category, not just the attached E-genes.
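A minimal sketch of Storey's q-value computation, assuming a single fixed tuning parameter lambda = 0.5 for the null-proportion estimate pi0 (the full method, as in the R `qvalue` package, can instead smooth pi0 over a grid of lambda values). The function name `storey_qvalues` is hypothetical; this is not the code used to produce these pages.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values using one fixed lambda for the pi0 estimate (a simplification)."""
    m = len(pvalues)
    if m == 0:
        return []
    # pi0: estimated proportion of true nulls, from the p-values above lambda.
    # Clamped to at most 1; note it can be 0 if no p-value exceeds lambda.
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam)))
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q
```

When pi0 is estimated as 1, this reduces to the Benjamini-Hochberg adjusted p-values.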


Predicted Network Interactions. A. Inferred S-gene network and Frontier. Nodes represent S-genes (ovals), E-genes (gray boxes), and Gene Ontology categories (white boxes). Arrows indicate activation, and tees indicate repression. Mixed arrow/tee line endings indicate GO set enrichment among both activated and inhibited E-genes. B. Expression values of selected E-genes. Each row shows the log-ratio expression of a single E-gene under various shRNA knockdowns, relative to a GFP shRNA knockdown control. C. S-gene interaction confidence. Each pixel in the heatmap corresponds to an S-gene interaction’s bootstrap confidence. For each interaction, the parent S-gene is labeled on the right, and the child S-gene is labeled on the bottom. Note that although NEMs include all transitive interactions, these are not displayed in (A) for simplicity. Therefore, a row shows the bootstrap confidence of an S-gene being upstream of other genes, and a column shows the bootstrap confidence of a gene being downstream of other genes.
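One way to read the heatmap is as the fraction of bootstrap networks in which one S-gene is upstream of another once transitive edges are included. A sketch under that assumption: each bootstrap replicate is taken to be a list of direct (parent, child) edges, and `transitive_closure` and `bootstrap_confidence` are hypothetical helpers, not the code behind the figure.

```python
def transitive_closure(edges, nodes):
    """All (upstream, downstream) pairs connected by a directed path (Warshall's algorithm)."""
    reach = set(edges)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def bootstrap_confidence(bootstrap_networks, nodes):
    """Fraction of bootstrap networks in which each parent is upstream of each child."""
    counts = {(a, b): 0 for a in nodes for b in nodes if a != b}
    for edges in bootstrap_networks:
        closed = transitive_closure(edges, nodes)
        for pair in counts:
            if pair in closed:
                counts[pair] += 1
    n = len(bootstrap_networks)
    # Row = parent (upstream) S-gene, column = child (downstream) S-gene.
    return {pair: c / n for pair, c in counts.items()}
```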


I collect all frontier genes connected to an attachment point (e.g. the genes negatively attached to SCN5A), calculate their intersection with every GO category, and then calculate a p-value for each intersection using the hypergeometric distribution (no multiple-testing correction).
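The hypergeometric test for one attachment set and one GO category can be sketched as below. This assumes the gene universe is the full set of E-genes considered, that the test is one-sided for enrichment, P(X >= k), and that genes are identified by plain strings; `hypergeom_enrichment_p` and `enrichment_pvalues` are hypothetical names. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail.

```python
from math import comb


def hypergeom_enrichment_p(attached, category, universe):
    """One-sided hypergeometric p-value P(X >= k) for the attached/category overlap."""
    universe = set(universe)
    attached = set(attached) & universe
    category = set(category) & universe
    N = len(universe)              # population size (all E-genes)
    K = len(category)              # genes in the GO category
    n = len(attached)              # genes in the attachment set
    k = len(attached & category)   # observed overlap
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enrichment_pvalues(attached, go_categories, universe):
    """Uncorrected p-value for every GO category (dict: category name -> gene list)."""
    return {name: hypergeom_enrichment_p(attached, genes, universe)
            for name, genes in go_categories.items()}
```

The resulting p-values, taken across all categories, are what a q-value procedure would then be applied to.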


There are two methods of gathering the set of frontier genes for an attachment point:


  • Multiple attachment - a frontier gene is in an attachment set if its likelihood of attaching there is greater than its likelihood of being unattached
  • Single attachment - a frontier gene is placed only in its most likely attachment set, and only if it is more likely attached than unattached
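The two attachment rules can be sketched as follows, assuming each E-gene has a log-likelihood for every attachment point plus one for staying unattached, stored under the hypothetical key `"unattached"`. The attachment-point labels (e.g. `"A:up"`) and the function names are illustrative only.

```python
from collections import defaultdict

UNATTACHED = "unattached"  # hypothetical key for the "not attached anywhere" likelihood


def multiple_attachment(loglik):
    """A gene joins every attachment set whose likelihood beats its unattached likelihood."""
    sets = defaultdict(set)
    for gene, scores in loglik.items():
        null = scores[UNATTACHED]
        for point, ll in scores.items():
            if point != UNATTACHED and ll > null:
                sets[point].add(gene)
    return dict(sets)


def single_attachment(loglik):
    """A gene joins only its best attachment set, and only if that beats being unattached."""
    sets = defaultdict(set)
    for gene, scores in loglik.items():
        candidates = {p: ll for p, ll in scores.items() if p != UNATTACHED}
        if not candidates:
            continue
        best = max(candidates, key=candidates.get)
        if candidates[best] > scores[UNATTACHED]:
            sets[best].add(gene)
    return dict(sets)
```

Single attachment yields disjoint sets, so a gene's GO annotations are counted at only one attachment point; multiple attachment lets ambiguous genes contribute to several.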


The GFP controls clustered in one of two ways: either independently of the knockdowns (tier1 and tier3a) or by replicate set (tier2 and tier3). An independent GFP cluster suggests that we should subtract the mean GFP level from all replicates (MeanGFPControl). GFP replicates mixing into each replicate set's cluster indicates that we should use a separate GFP control for each replicate set (ReplicateSetGFPControl).
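The two normalization options can be sketched as below, assuming each array is a (knockdown, replicate set, per-gene log-expression values) tuple with GFP controls labeled `"GFP"`; working on the log scale makes the subtraction produce log-ratios. The function names echo the MeanGFPControl and ReplicateSetGFPControl pages but are hypothetical.

```python
from statistics import fmean


def _gfp_baseline(gfp_arrays):
    """Per-gene mean across a list of GFP control arrays."""
    if not gfp_arrays:
        raise ValueError("no GFP control arrays available for this baseline")
    return [fmean(column) for column in zip(*gfp_arrays)]


def mean_gfp_control(samples):
    """Subtract one mean GFP baseline (over all replicate sets) from every knockdown."""
    baseline = _gfp_baseline([vals for kd, _, vals in samples if kd == "GFP"])
    return [(kd, rs, [v - b for v, b in zip(vals, baseline)])
            for kd, rs, vals in samples if kd != "GFP"]


def replicate_set_gfp_control(samples):
    """Subtract the GFP baseline of each knockdown's own replicate set."""
    baselines = {
        rs: _gfp_baseline([vals for kd, r, vals in samples if kd == "GFP" and r == rs])
        for rs in {rs for _, rs, _ in samples}
    }
    return [(kd, rs, [v - b for v, b in zip(vals, baselines[rs])])
            for kd, rs, vals in samples if kd != "GFP"]
```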


Also, SCN5A is not yet being handled correctly. Its expression log-ratios are very close to zero compared with the other arrays, so I should probably estimate separate differential-expression parameters for it. See the MeanGFPControl boxplot.


