Typical workflow for finding mutations in patient DNA (Genomics)
In practical courses and in later work you will very likely end up working with some kind of sequencing data. Whether the topic is transcriptomics, sequence analysis, building networks of gene interactions, finding genes associated with diseases, or one of many others, everything starts with genomics. At the moment the most popular sequencing method for measuring RNA expression is RNA-seq; to study interactions of DNA with DNA or with proteins, the usual approach is immunoprecipitation followed by sequencing (e.g. ChIP-seq). In this exercise we want to find genomic variations in people with a certain disease compared to healthy people.
Sort the following steps of the workflow into the right order.
Raw Data Generation
Variant Calling
Whole Genome Mapping
Variant Annotation
Raw Data Analysis
Usually wet-lab biologists generate the raw data, but we should still know how it was done. In the raw data analysis we check the quality of the data and compute some general statistics. Next we map our reads to the genome and compare the genomic regions of disease samples with a healthy reference to identify variants that are possible causes of the disease. Finally we annotate these variants: do they overlap an already known gene or regulatory region, and what is the function of that region? Thus, the right order of working steps is as follows:
- Raw Data Generation
- Raw Data Analysis
- Whole Genome Mapping
- Variant Calling
- Variant Annotation
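To make the mapping and variant calling steps more concrete, here is a minimal toy sketch in Python. It assumes the reads have already been mapped without gaps and are given as (start, sequence) pairs; the function name `call_variants` and the thresholds `min_depth` and `min_fraction` are made up for illustration. Real variant callers use far more elaborate statistical models.

```python
from collections import Counter

def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.8):
    """Toy variant caller (illustrative assumption, not a real tool).

    aligned_reads: list of (start, sequence) with 0-based start positions
    and gapless alignments. Returns a list of (position, ref_base, alt_base).
    """
    # Count which bases the reads show at every reference position (a "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything about this position
        base, n = counts.most_common(1)[0]
        # Report the position if most reads disagree with the reference.
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base))
    return variants

reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]
print(call_variants(reference, reads))  # [(5, 'C', 'T')]
```

In this example three reads cover position 5 and all of them show T where the reference has C, so the position is reported as a candidate variant.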
When working with sequencing data we should know what it can be used for, which biological components are used in the experiments, how the experiments are conducted, and thus which mistakes, artefacts or biases can occur. Raw data should always be checked for quality and normalized appropriately. If possible, pool replicates or compare between them.
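As a small illustration of a raw data quality check, the following sketch computes the mean base quality per read from FASTQ text. It assumes four lines per record and the common Phred+33 quality encoding; the helper names `read_fastq` and `mean_phred` are hypothetical. In practice one would rather use an established quality control program.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' of the header

def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoded qualities."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

# Two tiny example records: 'I' encodes a score of 40, '!' a score of 0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
for name, seq, qual in read_fastq(example):
    print(name, len(seq), mean_phred(qual))
```

Reads with a low mean quality, such as the second one here, are candidates for trimming or filtering before mapping.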
Familiarize yourself with existing programs for processing and evaluating data, as well as with the file formats that have been developed along with them. Two collections of programs worth knowing are samtools and bedtools.
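To get a feeling for one of these file formats, the sketch below parses a single alignment line of a SAM file by hand. It relies only on the tab-separated mandatory fields of the SAM format and on two of its flag bits (0x4 for unmapped reads, 0x10 for reads on the reverse strand); the class and function names are made up for illustration, and for real work a program such as samtools is the better choice.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name, e.g. a chromosome
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    seq: str     # read sequence

def parse_sam_line(line):
    """Parse one alignment line (not a '@' header line) of a SAM file."""
    fields = line.rstrip("\n").split("\t")
    return Alignment(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )

def is_unmapped(aln):
    return bool(aln.flag & 0x4)

def is_reverse_strand(aln):
    return bool(aln.flag & 0x10)

# Hypothetical example line: a read mapped to the reverse strand of chr1.
line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
aln = parse_sam_line(line)
print(aln.rname, aln.pos, is_unmapped(aln), is_reverse_strand(aln))
```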
Associate the following input types with each of the processes.
VCF files
Intensity files
Fastq files
Prepared samples
SAM files