Contents
- Description
- Gene Selection Options
- Data Filtering Options
- Gene Filtering Options
- Viewing Clustering Results
- Clustering and Image Generation Options
- Browsing, Viewing, and Downloading Clustered Data
Related Help Documents
- Data Selection: Explanation of the program used to select hybridizations (arrays) for viewing or analyzing data
- Analysis Methods: Information about the algorithms used for hierarchical clustering and Self-Organizing Maps (SOMs)
- File Formats: Information about preclustering (.pcl), clustered data table (.cdt), gene tree (.gtr) and array tree (.atr) files generated in the process of clustering data
Data Selection for Analysis is split into three large steps:
This section lets you first specify which genes are of interest to you, then decide how to collapse your data, how to identify genes in your output file, which biological annotation to include, and how to label the arrays you're using.
By default, all active filters are connected with the AND operator.
If you want your filters combined differently, enter a filter string
in the FilterString box that dictates how they should be joined.
For instance, the filter string:
1 AND (2 OR 3)
means that you want datapoints that pass filter 1 and either
filter 2 OR filter 3. (Note: filters 1, 2, and 3 must all
be active for this to work.)
You may also use more complex queries, such as:
(1 AND ((2 OR 3) AND (4 OR 5))) OR 6
The filtering will abort with an error message if the parentheses
don't match or if the string is not
syntactically correct.
You can use the Rank filter to select only those genes whose retrieved values are in the top Nth percentile. You can decide what the percentile must be and the number of arrays for which a gene must be in your percentile. If you elect to show the percentiles in your preclustering file (for more information, see the File Format Help page), you will be unable to cluster your data with our tools.
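The Rank filter can be pictured with the following Python sketch. It is an illustration only, not this tool's implementation: the function names are hypothetical, missing values are represented as None, and the percentile rank of a value is assumed to be the percentage of non-missing values on the same array that are less than or equal to it.

```python
def percentile_ranks(column):
    """Percentile rank (0-100) of each value within one array; None stays None."""
    present = [v for v in column if v is not None]
    ranks = []
    for v in column:
        if v is None:
            ranks.append(None)
        else:
            at_or_below = sum(1 for p in present if p <= v)
            ranks.append(100.0 * at_or_below / len(present))
    return ranks

def rank_filter(data, percentile, min_arrays):
    """Return genes ranked at or above `percentile` on at least `min_arrays` arrays.

    `data` maps a gene name to a list of values, one per array.
    """
    genes = list(data)
    n_arrays = len(next(iter(data.values())))
    hits = {g: 0 for g in genes}
    for j in range(n_arrays):
        ranks = percentile_ranks([data[g][j] for g in genes])
        for g, r in zip(genes, ranks):
            if r is not None and r >= percentile:
                hits[g] += 1
    return [g for g in genes if hits[g] >= min_arrays]
```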
You can use the Deviations filters to select only those genes with a retrieved value different from the mean (for a single array) by more than a selected multiple of the standard deviation (for that array). You can decide what that multiple is and over how many arrays it must be true.
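A minimal Python sketch of the Deviations filter follows. It is a reading of the description above rather than this tool's code: the function name is hypothetical, missing values are represented as None, and the sample standard deviation (statistics.stdev) is an assumption, since the text does not say which estimator is used.

```python
import statistics

def deviation_filter(data, multiple, min_arrays):
    """Return genes that differ from an array's mean by more than
    `multiple` standard deviations on at least `min_arrays` arrays.

    `data` maps a gene name to a list of values, one per array.
    """
    genes = list(data)
    n_arrays = len(next(iter(data.values())))
    hits = {g: 0 for g in genes}
    for j in range(n_arrays):
        present = [data[g][j] for g in genes if data[g][j] is not None]
        if len(present) < 2:
            continue  # a standard deviation needs at least two values
        mean = statistics.mean(present)
        sd = statistics.stdev(present)  # assumption: sample standard deviation
        for g in genes:
            v = data[g][j]
            if v is not None and abs(v - mean) > multiple * sd:
                hits[g] += 1
    return [g for g in genes if hits[g] >= min_arrays]
```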
When you choose to center both by gene and by array, you can decide whether or not to iterate the operation. Centering the arrays can shift the values of genes that were already centered, particularly when there are missing values or when centering by medians. Iterating repeats the centering on both genes and arrays until the values stop changing, at the cost of a longer calculation time. Iteration continues until the maximum change to any array is less than 0.01 (in units of log-ratio), up to a maximum of ten iterations.
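The iteration can be sketched in Python as follows. This is an illustration under stated assumptions, not this tool's code: the function names are hypothetical, missing values are represented as None, and "maximum change to any array" is interpreted as the largest mean (or median) subtracted from any array in a pass.

```python
import statistics

def center(values, use_median=False):
    """Subtract the mean (or median) of the non-missing values.

    Returns the centered list and the amount that was subtracted.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values), 0.0
    shift = statistics.median(present) if use_median else statistics.mean(present)
    return [None if v is None else v - shift for v in values], shift

def center_genes_and_arrays(matrix, use_median=False, tolerance=0.01,
                            max_iterations=10):
    """Alternately center genes (rows) and arrays (columns).

    Stops when the largest shift applied to any array in a pass is below
    `tolerance`, or after `max_iterations` passes.
    """
    rows = [list(r) for r in matrix]
    iterations = 0
    for _ in range(max_iterations):
        iterations += 1
        rows = [center(r, use_median)[0] for r in rows]
        largest_shift = 0.0
        columns = []
        for col in zip(*rows):
            centered, shift = center(col, use_median)
            columns.append(centered)
            largest_shift = max(largest_shift, abs(shift))
        rows = [list(r) for r in zip(*columns)]
        if largest_shift < tolerance:
            break
    return rows, iterations
```

With complete data and centering by means, the second pass changes nothing and the loop stops early; with missing values or medians, several passes may be needed.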
If you choose to zero-transform your data, you must indicate one or more arrays that represent the zero-time point, and a method for averaging their values (mean or median) if you select more than one.
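As a sketch of what a zero-transformation could look like for one gene, the following Python function subtracts the averaged zero-time value from every value for that gene. The function name and arguments are hypothetical, missing values are represented as None, and the assumption is that the transformation is a per-gene subtraction of the zero-time average.

```python
import statistics

def zero_transform(row, zero_indices, method="mean"):
    """Subtract a gene's averaged zero-time value from all of its values.

    `row` holds one gene's values, one per array; `zero_indices` lists the
    positions of the arrays chosen as the zero-time point.
    """
    baseline_values = [row[i] for i in zero_indices if row[i] is not None]
    if not baseline_values:
        return [None] * len(row)  # no usable zero-time value for this gene
    average = statistics.mean if method == "mean" else statistics.median
    baseline = average(baseline_values)
    return [None if v is None else v - baseline for v in row]
```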
If you are retrieving log-transformed ratio data, you can also select only those genes whose distance in result-space exceeds a given value. The log-transformed data for a given gene across the selected experiments constitute a vector, and this filter determines whether the length of this vector is greater than the specified minimum.
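The vector-length test can be sketched in Python as below. The function names are hypothetical, and skipping missing values (None) is an assumption, since the text does not say how they are handled.

```python
import math

def vector_length(values):
    """Euclidean length of a gene's log-ratio vector, skipping missing values."""
    return math.sqrt(sum(v * v for v in values if v is not None))

def length_filter(data, minimum):
    """Return the genes whose vector length is greater than `minimum`.

    `data` maps a gene name to a list of log-ratios, one per experiment.
    """
    return [g for g, values in data.items() if vector_length(values) > minimum]
```

For example, a gene with log-ratios 3 and 4 across two experiments has a vector length of 5, so it passes a minimum of 4 but not a minimum of 5.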
You have to define the following options when hierarchically clustering:
The centered vs non-centered metric only applies if you are using the Pearson Correlation (see below). It will not make a difference if using the Euclidean distance.
The same considerations apply for experiments as described for genes above.
These are distance metrics that are used for measuring the similarity of expression between genes.
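To make the difference between the metrics concrete, here is a small Python sketch of the centered and non-centered Pearson correlation and of the Euclidean distance. The function names are hypothetical and complete data (no missing values) is assumed; this is not the code used by this tool.

```python
import math

def pearson(x, y, centered=True):
    """Pearson correlation of two expression vectors of equal length.

    With centered=False the means are taken to be zero, so two vectors
    with the same shape but different offsets correlate less than 1.
    """
    mean_x = sum(x) / len(x) if centered else 0.0
    mean_y = sum(y) / len(y) if centered else 0.0
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                            sum((b - mean_y) ** 2 for b in y))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator

def euclidean(x, y):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For two genes with values (1, 2, 3) and (11, 12, 13), the centered correlation is 1 because the profiles have the same shape, whereas the non-centered correlation is below 1 because the offset is taken into account. The Euclidean distance does not use the means in this way, which is why the centered versus non-centered choice has no effect on it.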
If you choose 'Self Organizing Map Cluster', be sure to specify x and y dimensions. Your settings for hierarchical clustering described above will still be used when each partition of the SOM is clustered.