Input File Format Specification
File Format
- Accepted formats: CSV, TSV, or TXT
- Maximum file size: 10 MB
- Must be tabular data with column headers
Required Columns
Your data file must contain at least three columns (column names don't matter — the tool auto-detects them):
- Gene Identifiers: HGNC symbols (e.g., TP53), Ensembl IDs (e.g., ENSG00000141510), or Entrez IDs
- Fold Change: log₂ fold change values from differential expression analysis
- P-values: Statistical significance values (raw or adjusted p-values)
Example Data Format
Data Preprocessing Tips
- Remove duplicate gene entries or aggregate them before upload
- Ensure p-values are between 0 and 1
- log₂FC values can be any real number (typically between -10 and +10)
- Rows with a missing value (NA, NaN) in the gene ID column are skipped
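The checks above can be run locally before upload. The following is a minimal sketch using only the Python standard library; the function name `clean_rows`, the column names, and the sample values are illustrative assumptions, not part of the tool.

```python
import csv
import io
import math

# Tokens treated as a missing gene ID (assumed spellings).
MISSING = {"", "NA", "NaN", "nan"}

def clean_rows(text, gene_col, fc_col, p_col, delimiter=","):
    """Return (kept, problems) for a delimited table passed as a string.

    kept     -> list of (gene, log2fc, pvalue) tuples that passed all checks
    problems -> list of (line_number, message) for rows with invalid numbers
    """
    kept, problems = [], []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        gene = (row.get(gene_col) or "").strip()
        if gene in MISSING:
            continue  # rows without a gene ID are silently skipped
        try:
            fc = float(row[fc_col])
            p = float(row[p_col])
        except (KeyError, TypeError, ValueError):
            problems.append((line_no, "non-numeric fold change or p-value"))
            continue
        if math.isnan(fc) or math.isnan(p) or not 0.0 <= p <= 1.0:
            problems.append((line_no, "p-value outside [0, 1] or NaN"))
            continue
        kept.append((gene, fc, p))
    return kept, problems

# Illustrative input only; these are not real measurements.
sample = (
    "gene,log2FC,pvalue\n"
    "TP53,1.5,0.01\n"
    "NA,2.0,0.02\n"
    "BRCA1,0.3,1.7\n"
    "CYP3A4,-2.1,0.004\n"
)
kept, problems = clean_rows(sample, "gene", "log2FC", "pvalue")
```

For a TSV file, pass `delimiter="\t"`.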
Statistical Methods
Fisher's Exact Test
The enrichment analysis uses Fisher's exact test to determine whether a Key Event (KE) gene set is over-represented among your significant genes. For each KE, a 2×2 contingency table is constructed that cross-classifies every background gene by KE membership (in the KE or not) and by significance (significant or not).
Fisher's exact test calculates the probability of observing this distribution (or a more extreme one) under the null hypothesis that KE membership and significance are independent.
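As an illustration of the calculation, here is a minimal standard-library sketch of a one-sided Fisher's exact test for over-representation. The cell layout (a, b, c, d), the function name, and the choice of a one-sided test are assumptions made for this example; the tool's own implementation may differ.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for over-representation.

    Assumed cell layout:
        a = significant genes in the KE
        b = significant genes not in the KE
        c = non-significant genes in the KE
        d = non-significant genes not in the KE
    """
    n_sig = a + b            # total significant genes
    n_ke = a + c             # total KE genes in the background
    total = a + b + c + d    # size of the background universe
    denom = comb(total, n_sig)
    upper = min(n_sig, n_ke)
    # Hypergeometric upper tail: probability of an overlap of at least `a`.
    tail = sum(
        comb(n_ke, k) * comb(total - n_ke, n_sig - k)
        for k in range(a, upper + 1)
    )
    return tail / denom

p = fisher_exact_greater(3, 1, 1, 3)  # equals 17/70
```

Summing only the upper tail matches the "over-represented" wording; a two-sided test would also count equally unlikely tables in the depletion direction.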
False Discovery Rate (FDR) Correction
Because multiple KEs are tested simultaneously, we apply Benjamini-Hochberg FDR correction to control the false discovery rate. This adjusts p-values to account for multiple comparisons, reducing false positives.
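The Benjamini-Hochberg adjustment can be sketched in a few lines of standard-library Python. The function name is an assumption for illustration: each p-value is multiplied by m / rank, then a running minimum is taken from the largest p-value downward so the adjusted values stay monotone.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With four tests, the two smallest p-values (0.005 and 0.01) both adjust to 0.02, and the two largest both adjust to 0.04.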
Odds Ratio
The odds ratio quantifies the strength of association between KE membership and significance:
- OR = 1: No association
- OR > 1: Positive association (enrichment)
- OR < 1: Negative association (depletion)
For example, OR = 3.5 means the odds of being significant are 3.5 times higher for genes in this KE than for genes outside it.
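A short sketch of the sample odds ratio for a 2×2 table follows. The cell layout, the function name, and the handling of zero cells are assumptions for illustration; the counts are made up to reproduce OR = 3.5.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio for a 2x2 table.

    Assumed cell layout:
        a = significant genes in the KE,     b = significant genes not in the KE
        c = non-significant genes in the KE, d = non-significant genes not in the KE
    """
    if b * c == 0:
        # Division by zero: infinite if there is any overlap signal, otherwise undefined.
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)

# Odds inside the KE: 7 / 20 = 0.35; odds outside the KE: 3 / 30 = 0.10.
ratio = odds_ratio(7, 3, 20, 30)  # 0.35 / 0.10 = 3.5
```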
Background Universe
The enrichment analysis uses all genes in your uploaded dataset as the background universe. This ensures the statistical test accounts for which genes were actually measured in your experiment.
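To make the role of the background concrete, the sketch below derives the four contingency counts from gene sets, restricting both the KE gene set and the significant genes to the genes actually measured. The function name, cell layout, and gene lists are illustrative assumptions.

```python
def contingency_counts(measured_genes, significant_genes, ke_genes):
    """Return (a, b, c, d) using the measured genes as the background universe."""
    universe = set(measured_genes)
    sig = set(significant_genes) & universe
    ke = set(ke_genes) & universe   # KE genes that were not measured are dropped
    a = len(sig & ke)               # significant and in the KE
    b = len(sig - ke)               # significant, not in the KE
    c = len(ke - sig)               # in the KE, not significant
    d = len(universe - sig - ke)    # neither
    return a, b, c, d

measured = {"TP53", "BRCA1", "CYP3A4", "ATM", "MDM2", "EGFR"}
significant = {"TP53", "ATM", "EGFR"}
ke_set = {"TP53", "ATM", "MDM2", "CHEK2"}  # CHEK2 is not in the measured set

counts = contingency_counts(measured, significant, ke_set)
```

Because CHEK2 was not measured, it does not contribute to any cell, and the four counts always sum to the number of measured genes.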
Interpreting Results
Volcano Plot
The volcano plot visualizes the magnitude (log₂FC) and significance (-log₁₀ p-value) of gene expression changes:
- Red points: Significantly upregulated genes (above FC threshold, p < 0.05)
- Blue points: Significantly downregulated genes (below -FC threshold, p < 0.05)
- Green points: Statistically significant but below FC threshold
- Gray points: Not statistically significant
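The colour rules above amount to a small decision function. This is a sketch only: the function name and the default log₂FC threshold of 1.0 are assumptions, while the 0.05 p-value cutoff comes from the legend.

```python
def volcano_category(log2fc, pvalue, fc_threshold=1.0, p_cutoff=0.05):
    """Return the legend colour for one gene on the volcano plot."""
    if pvalue >= p_cutoff:
        return "gray"    # not statistically significant
    if log2fc > fc_threshold:
        return "red"     # significantly upregulated
    if log2fc < -fc_threshold:
        return "blue"    # significantly downregulated
    return "green"       # significant, but below the FC threshold

colours = [
    volcano_category(2.3, 0.001),
    volcano_category(-1.8, 0.01),
    volcano_category(0.4, 0.02),
    volcano_category(3.0, 0.2),
]
```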
Enrichment Table
The enrichment results table shows which Key Events are over-represented in your significant genes:
- Key Event Title: Name of the biological process or event
- # Overlap: Number of your significant genes associated with this KE
- % Enrichment: Percentage of KE genes that are significant in your dataset
- P-value: Statistical significance from Fisher's exact test
- FDR: False Discovery Rate (adjusted p-value) using Benjamini-Hochberg correction
- Odds Ratio: Magnitude of enrichment (>1 indicates over-representation)
AOP Network Visualization
The interactive network shows how Key Events connect within the selected AOP:
- Node colors:
  - Light green = Molecular Initiating Event (MIE)
  - Light orange = Intermediate Key Event
  - Light red = Adverse Outcome (AO)
- Node borders:
  - Red border = Significantly enriched KE (FDR < 0.05)
  - Green border = Significantly affected gene
- Gene nodes: Colored by expression (blue = downregulated, red = upregulated)
- Edges: Gray lines show KE-KE relationships; thin gray lines show KE-gene associations
Network Controls
- + Add Gene Nodes: Display genes associated with each KE
- Toggle Gene Visibility: Show/hide gene nodes
- Reset View: Return to original layout and zoom
- Download PNG: Export network visualization
- Download Network: Export Cytoscape JSON file for further analysis
Batch Analysis Tutorial
Batch analysis lets you analyse multiple gene expression datasets in a single session, then compare enrichment results across conditions. This is useful for dose–response or time-course experiments.
Step 1: Upload Files
Click the Batch Analysis tab on the home page. You can add files in two ways:
- Upload your own: Drag and drop up to 10 CSV/TSV/TXT files onto the drop zone, or click to browse
- Use demo datasets: Expand the "Select Cisplatin Demo Datasets" panel and tick the files you want to include
Each uploaded file shows a preview of its first few rows so you can verify the data looks correct.
Step 2: Tag Conditions
Assign metadata to each file so results can be grouped and compared. For each file you can set:
- Condition label: A short name for the experimental condition (e.g., "10 uM", "24 hr")
- Timepoint: Exposure duration (e.g., "4hr", "24hr", "72hr")
- Dose: Concentration (e.g., "0.1uM", "50uM")
For cisplatin demo files, these fields are auto-filled from the filename.
Step 3: Analysis Settings
Configure shared settings that apply to all files:
- AOP selection: Search for an AOP by name or ID using the typeahead search
- Gene ID column / FC column / P-value column: Select which columns to use (applied to all files)
- log₂FC threshold: Minimum fold change for significance
- P-value cutoff: Maximum p-value for significance (default 0.05)
- Experiment metadata: Dataset ID, stressor name, owner, and description for reports
Running the Analysis
Click Run Batch Analysis to start. A progress modal shows the status of each file as it is processed. Once complete, you are taken to the batch summary page where you can view individual results or proceed to the comparison view.
Comparison Feature Guide
After completing a batch analysis, use the comparison view to identify patterns across conditions.
Heatmap View
The heatmap displays KE enrichment significance (FDR values) across all analysed conditions. Rows represent Key Events and columns represent conditions. Cells are coloured by significance level:
- Darker colours indicate stronger enrichment (lower FDR)
- Hover over a cell to see the exact FDR value, overlap count, and odds ratio
- Rows and columns can be sorted to highlight patterns
Table View
The comparison table provides a detailed numeric view of enrichment results across conditions. Each row is a Key Event, and you can compare overlap counts, p-values, FDR, and odds ratios side by side.
Network Overlay
The network comparison overlays enrichment results from multiple conditions onto the same AOP network. KE nodes show aggregated significance across the selected conditions, making it easy to see which parts of the pathway are consistently affected.
Delta Mode
Delta mode highlights the differences between two selected conditions. It shows which Key Events become more or less enriched as conditions change (e.g., from low to high dose), helping identify dose–response transitions.
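One way to express such a difference numerically is the change in -log₁₀(FDR) between two conditions. The sketch below is an assumption about a reasonable metric, not a description of how the tool computes its delta; the function name, the `floor` guard, and the KE labels are hypothetical.

```python
import math

def delta_enrichment(fdr_a, fdr_b, floor=1e-300):
    """Change in -log10(FDR) from condition A to condition B, per shared KE.

    Positive values mean the KE is more strongly enriched in B than in A.
    `floor` guards against log10(0) when an FDR is reported as exactly zero.
    """
    shared = sorted(set(fdr_a) & set(fdr_b))
    return {
        ke: math.log10(max(fdr_a[ke], floor)) - math.log10(max(fdr_b[ke], floor))
        for ke in shared
    }

# Hypothetical FDR values keyed by placeholder KE labels.
low_dose = {"KE 1": 0.1, "KE 2": 0.001}
high_dose = {"KE 1": 0.001, "KE 2": 0.1, "KE 3": 0.5}

delta = delta_enrichment(low_dose, high_dose)
```

Only KEs present in both conditions are compared, so "KE 3" is omitted from the result.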
Frequently Asked Questions
What file formats are accepted?
CSV (comma-separated), TSV (tab-separated), and TXT files up to 10 MB. The file must contain column headers in the first row and at least three columns: gene identifiers, log₂ fold change, and p-values.
Which gene identifier types are supported?
The tool accepts:
- HGNC gene symbols (e.g., TP53, BRCA1, CYP3A4)
- Ensembl gene IDs (e.g., ENSG00000141510)
- Entrez gene IDs (e.g., 7157)
The identifier type is auto-detected. For best results, use a single identifier type consistently throughout your file.
How are duplicate genes handled?
If the same gene appears multiple times (e.g., from probe-level data or “///”-separated symbols), the tool combines them automatically:
- P-values: Combined using Fisher’s method
- log₂FC values: Averaged (arithmetic mean)
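The two rules above can be sketched with the standard library alone. Fisher's method sums -2·ln(p) over the k duplicates and compares the result to a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. The function names and sample rows are illustrative assumptions, and the splitting of "///"-separated symbols is not covered here.

```python
import math
from collections import defaultdict

def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (chi-square with 2k d.o.f.)."""
    if any(p == 0.0 for p in pvalues):
        return 0.0
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the chi-square statistic
    # Upper tail for even degrees of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)

def aggregate_duplicates(rows):
    """Map gene -> (mean log2FC, combined p-value) from (gene, log2fc, p) rows."""
    grouped = defaultdict(list)
    for gene, fc, p in rows:
        grouped[gene].append((fc, p))
    result = {}
    for gene, values in grouped.items():
        fcs = [fc for fc, _ in values]
        ps = [p for _, p in values]
        result[gene] = (sum(fcs) / len(fcs), fisher_combine(ps))
    return result

# Illustrative rows only: two probes for TP53, one for BRCA1.
rows = [("TP53", 1.0, 0.05), ("TP53", 2.0, 0.05), ("BRCA1", -0.5, 0.2)]
merged = aggregate_duplicates(rows)
```

A gene that appears only once keeps its original p-value, since Fisher's method with a single input returns that input unchanged.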
What is the background universe?
The enrichment analysis uses all genes in your uploaded dataset as the background, not the entire human genome. This makes the test more appropriate for platform-specific data (e.g., RNA-seq panels that measure a subset of genes).
How is FDR calculated?
False Discovery Rate is calculated using the Benjamini–Hochberg procedure. It adjusts p-values to account for multiple comparisons (one test per KE). Calling every KE with FDR < 0.05 significant means that, on average, fewer than 5% of those calls are expected to be false positives.
How does batch harmonisation work?
In batch mode, all files are processed with the same analysis settings (AOP, thresholds, column mappings). Gene identifiers are normalised to the same format across files, and the same reference KE gene sets are used for all enrichment tests. This ensures results are directly comparable across conditions.
Can I export my results?
Yes, several export options are available:
- Reports: PDF or HTML reports with full analysis documentation
- Tables: Enrichment results as CSV or Excel files
- Network: Cytoscape JSON file for import into Cytoscape desktop
- Visualizations: Network as PNG image
Is my data stored on the server?
Uploaded files are stored temporarily during your session and cleaned up automatically. Experiment metadata and analysis parameters are saved in the database for reproducibility, but your raw gene expression data is not permanently stored.