Skip to content

Commit

Permalink
Updated broken links and fixed mkdocs build
Browse files Browse the repository at this point in the history
  • Loading branch information
Acribbs committed Jan 1, 2025
1 parent c891bff commit 9869f45
Show file tree
Hide file tree
Showing 6 changed files with 50 additions and 51 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/cgatcore_python.yml
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@ jobs:
- name: Build and Deploy MkDocs Site
run: |
mkdocs build --strict
mkdocs build
mkdocs gh-deploy --force --clean
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
26 changes: 13 additions & 13 deletions docs/getting_started/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -228,13 +228,13 @@ When running the pipeline, make sure to specify `--no-cluster` as a command line
- **Logs**: Check the log files generated during the pipeline run for detailed error messages.
- **Support**: For further assistance, refer to the [CGAT-core documentation](https://cgat-developers.github.io/cgat-core/) or raise an issue on our [GitHub repository](https://github.com/cgat-developers/cgat-core/issues).

## CGAT-core Examples
## CGAT-core Examples {#cgat-core-examples}

This guide provides practical examples of CGAT-core pipelines for various use cases, from basic file processing to complex genomics workflows.

## Quick Start Examples
## Quick Start Examples {#quick-start-examples}

### Hello World Pipeline
### Hello World Pipeline {#hello-world-pipeline}

```python
"""hello_world.py - Simple CGAT pipeline example
Expand Down Expand Up @@ -271,7 +271,7 @@ if __name__ == "__main__":
sys.exit(P.main(sys.argv))
```

### Configuration Example
### Configuration Example {#configuration-example}

```yaml
# pipeline.yml
Expand All @@ -287,9 +287,9 @@ cluster:
memory_default: 1G
```
## Real-World Examples
## Real-World Examples {#real-world-examples}
### 1. Genomics Pipeline
### 1. Genomics Pipeline {#genomics-pipeline}
This example demonstrates a typical RNA-seq analysis pipeline:
Expand Down Expand Up @@ -380,7 +380,7 @@ if __name__ == "__main__":
sys.exit(P.main(sys.argv))
```

### 2. Data Processing Pipeline
### 2. Data Processing Pipeline {#data-processing-pipeline}

Example of a data processing pipeline with S3 integration:

Expand Down Expand Up @@ -455,7 +455,7 @@ if __name__ == "__main__":
sys.exit(P.main(sys.argv))
```

### 3. Image Processing Pipeline
### 3. Image Processing Pipeline {#image-processing-pipeline}

Example of an image processing pipeline:

Expand Down Expand Up @@ -522,9 +522,9 @@ if __name__ == "__main__":
sys.exit(P.main(sys.argv))
```

## Best Practices
## Best Practices {#best-practices}

### 1. Resource Management
### 1. Resource Management {#resource-management}

```python
@transform("*.bam", suffix(".bam"), ".sorted.bam")
Expand All @@ -550,7 +550,7 @@ def sort_bam(infile, outfile):
P.run(statement)
```

### 2. Error Handling
### 2. Error Handling {#error-handling}

```python
@transform("*.txt", suffix(".txt"), ".processed")
Expand All @@ -571,7 +571,7 @@ def robust_processing(infile, outfile):
P.cleanup_tmpdir()
```

### 3. Configuration Management
### 3. Configuration Management {#configuration-management}

```yaml
# pipeline.yml - Example configuration
Expand Down Expand Up @@ -611,7 +611,7 @@ s3:
max_concurrency: 10
```
## Running the Examples
## Running the Examples {#running-the-examples}
1. **Setup Configuration**
```bash
Expand Down
18 changes: 9 additions & 9 deletions docs/getting_started/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

The following sections describe how to install the `cgatcore` framework.

## Conda installation
## Conda installation {#conda-installation}

The preferred method of installation is using Conda. If you do not have Conda installed, you can install it using [Miniconda](https://conda.io/miniconda.html) or [Anaconda](https://www.anaconda.com/download/#macos).

Expand All @@ -12,21 +12,21 @@ The preferred method of installation is using Conda. If you do not have Conda in
conda install -c conda-forge -c bioconda cgatcore
```

### Prerequisites
### Prerequisites {#prerequisites}

Before installing `cgatcore`, ensure that you have the following prerequisites:

- **Operating System**: Linux or macOS
- **Python**: Version 3.6 or higher
- **Conda**: Recommended for dependency management

### Troubleshooting
### Troubleshooting {#troubleshooting}

- **Conda Issues**: If you encounter issues with Conda, ensure that the Bioconda and Conda-Forge channels are added and prioritized correctly.
- **Pip Dependencies**: When using pip, manually install any missing dependencies listed in the error messages.
- **Script Errors**: If the installation script fails, check the script's output for error messages and ensure all prerequisites are met.

### Verification
### Verification {#verification}

After installation, verify the installation by running:

Expand All @@ -41,15 +41,15 @@ print(cgatcore.__version__)

This should display the installed version of `cgatcore`.

## Pip installation
## Pip installation {#pip-installation}

We recommend installation through Conda because it manages dependencies automatically. However, `cgatcore` is generally lightweight and can also be installed using the `pip` package manager. Note that you may need to manually install other dependencies as needed:

```bash
pip install cgatcore
```

## Automated installation
## Automated installation {#automated-installation}

The preferred method to install `cgatcore` is using Conda. However, we have also created a Bash installation script, which uses [Conda](https://conda.io/docs/) under the hood.

Expand Down Expand Up @@ -78,7 +78,7 @@ conda activate cgat-c

The installation script will place everything under the specified location. The aim of the script is to provide a portable installation that does not interfere with existing software environments. As a result, you will have a dedicated Conda environment that can be activated as needed to work with `cgatcore`.

## Manual installation
## Manual installation {#manual-installation}

To obtain the latest code, check it out from the public Git repository and activate it:

Expand All @@ -94,7 +94,7 @@ To update to the latest version, simply pull the latest changes:
git pull
```

## Installing additional software
## Installing additional software {#installing-additional-software}

When building your own workflows, we recommend using Conda to install software into your environment where possible. This ensures compatibility and ease of installation.

Expand All @@ -105,7 +105,7 @@ conda search <package>
conda install <package>
```

## Accessing the libdrmaa shared library
## Accessing the libdrmaa shared library {#accessing-libdrmaa}

You may also need access to the `libdrmaa.so.1.0` C library, which can often be installed as part of the `libdrmaa-dev` package on most Unix systems. Once installed, you may need to specify the location of the DRMAA library if it is not in a default library path. Set the `DRMAA_LIBRARY_PATH` environment variable to point to the library location.

Expand Down
48 changes: 24 additions & 24 deletions docs/getting_started/tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,9 @@ The aim of this pipeline is to perform pseudoalignment using `kallisto`. The pip

The `cgat-showcase` pipeline highlights some of the functionality of `cgat-core`. Additionally, more advanced workflows for next-generation sequencing analysis are available in the [cgat-flow repository](https://github.com/cgat-developers/cgat-flow).

## Tutorial start
## Tutorial start {#tutorial-start}

### Step 1: Download the tutorial data
### Step 1: Download the tutorial data {#download-data}

Create a new directory, navigate to it, and download the test data:

Expand All @@ -21,7 +21,7 @@ wget https://www.cgat.org/downloads/public/showcase/showcase_test_data.tar.gz
tar -zxvf showcase_test_data.tar.gz
```

### Step 2: Generate a configuration YAML file
### Step 2: Generate a configuration YAML file {#generate-config}

Navigate to the test data directory and generate a configuration file for the pipeline:

Expand All @@ -38,7 +38,7 @@ python /path/to/file/pipeline_transdiffexpres.py config

This will generate a `pipeline.yml` file containing configuration parameters that can be used to modify the pipeline output. For this tutorial, you do not need to modify the parameters to run the pipeline. In the [Modify Config](#modify-config) section below, you will find details on how to adjust the config file to change the pipeline's output.

### Step 3: Run the pipeline
### Step 3: Run the pipeline {#run-pipeline}

To run the pipeline, execute the following command in the directory containing the `pipeline.yml` file:

Expand All @@ -56,17 +56,17 @@ cgatshowcase --help

This will start the pipeline execution. Monitor the output for any errors or warnings.

### Step 4: Review Results
### Step 4: Review Results {#review-results}

Once the pipeline completes, review the output files generated in the `showcase_test_data` directory. These files contain the results of the pseudoalignment.

### Troubleshooting
### Troubleshooting {#troubleshooting}

- **Common Issues**: If you encounter errors during execution, ensure that all dependencies are installed and paths are correctly set.
- **Logs**: Check the log files generated during the pipeline run for detailed error messages.
- **Support**: For further assistance, refer to the [CGAT-core documentation](https://cgat-core.readthedocs.io/en/latest/) or raise an issue on our [GitHub repository](https://github.com/cgat-developers/cgat-core/issues).

### Step 5: Generate a report
### Step 5: Generate a report {#generate-report}

The final step is to generate a report to display the output of the pipeline. We recommend using `MultiQC` for generating reports from commonly used bioinformatics tools (such as mappers and pseudoaligners) and `Rmarkdown` for generating custom reports.

Expand All @@ -78,17 +78,17 @@ cgatshowcase transdiffexprs make build_report -v 5 --no-cluster

This will generate a `MultiQC` report in the folder `MultiQC_report.dir/` and an `Rmarkdown` report in `R_report.dir/`.

## Core Concepts
## Core Concepts {#core-concepts}

### Pipeline Structure
### Pipeline Structure {#pipeline-structure}

A CGAT pipeline typically consists of:
1. **Tasks**: Individual processing steps
2. **Dependencies**: Relationships between tasks
3. **Configuration**: Pipeline settings
4. **Execution**: Running the pipeline

### Task Types
### Task Types {#task-types}

1. **@transform**: One-to-one file transformation
```python
Expand All @@ -111,7 +111,7 @@ def split_file(infile, outfiles):
pass
```

### Resource Management
### Resource Management {#resource-management}

Control resource allocation:
```python
Expand All @@ -126,7 +126,7 @@ def sort_bam(infile, outfile):
P.run(statement)
```

### Error Handling
### Error Handling {#error-handling}

Implement robust error handling:
```python
Expand All @@ -137,9 +137,9 @@ except P.PipelineError as e:
raise
```

## Advanced Topics
## Advanced Topics {#advanced-topics}

### 1. Pipeline Parameters
### 1. Pipeline Parameters {#pipeline-parameters}

Access configuration parameters:
```python
Expand All @@ -150,7 +150,7 @@ threads = PARAMS.get("threads", 1)
input_dir = PARAMS["input_dir"]
```

### 2. Logging
### 2. Logging {#logging}

Use the logging system:
```python
Expand All @@ -164,7 +164,7 @@ L.warning("Low memory condition")
L.error("Task failed: %s" % e)
```

### 3. Temporary Files
### 3. Temporary Files {#temporary-files}

Manage temporary files:
```python
Expand All @@ -180,9 +180,9 @@ def sort_bam(infile, outfile):
P.run(statement)
```

## Best Practices
## Best Practices {#best-practices}

### Code Organization
### Code Organization {#code-organization}

#### 1. Task Structure
- Use meaningful task names
Expand All @@ -202,7 +202,7 @@ def sort_bam(infile, outfile):
- Include usage examples
- Maintain a clear README

### Resource Management
### Resource Management {#resource-management-best-practices}

#### 1. Memory Usage
- Set appropriate memory limits
Expand Down Expand Up @@ -263,7 +263,7 @@ def sort_with_temp(infile, outfile):
P.cleanup_tmpdir()
```

### Error Handling
### Error Handling {#error-handling-best-practices}

#### 1. Task Failures
- Implement proper error checking
Expand Down Expand Up @@ -334,7 +334,7 @@ def process_with_logging(infile, outfile):
raise
```

### Pipeline Configuration
### Pipeline Configuration {#pipeline-configuration}

#### 1. Parameter Management
- Use configuration files
Expand Down Expand Up @@ -398,7 +398,7 @@ def test_pipeline():
assert check_output_validity("expected_output.txt")
```

### Troubleshooting
### Troubleshooting {#troubleshooting-best-practices}

If you encounter issues:

Expand All @@ -425,14 +425,14 @@ For more detailed information, see:
- [Cluster Configuration](../pipeline_modules/cluster.md)
- [Error Handling](../pipeline_modules/execution.md)

## Next Steps
## Next Steps {#next-steps}

- Review the [Examples](examples.md) section
- Learn about [Cluster Configuration](../pipeline_modules/cluster.md)
- Explore [Cloud Integration](../s3_integration/configuring_s3.md)

For more advanced topics, see the [Pipeline Modules](../pipeline_modules/overview.md) documentation.

## Conclusion
## Conclusion {#conclusion}

This completes the tutorial for running the `transdiffexprs` pipeline for `cgat-showcase`. We hope you find it as useful as we do for writing workflows in Python.
2 changes: 1 addition & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ By leveraging these modules and decorators, you can build powerful, scalable, an
## Quick Links

- [Getting Started](getting_started/installation.md)
- [Building a Workflow](defining_workflow/writing_workflow.md)
- [Building a Workflow](defining_workflow/writing_workflows.md)
- [Pipeline Modules Overview](pipeline_modules/overview.md)
- [S3 Integration](s3_integration/s3_pipeline.md)
- [Working with Remote Files](remote/s3.md)
Expand Down
5 changes: 2 additions & 3 deletions docs/s3_integration/configuring_s3.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,6 @@ logging.getLogger('botocore').setLevel(logging.DEBUG)
- Use bucket policies
- Enable access logging

For more information, see:
For more examples of using S3 in your pipelines, see the [S3 Pipeline Examples](s3_pipeline.md#examples) section.
- [AWS S3 Documentation](https://docs.aws.amazon.com/s3/)
- [Boto3 Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
- [CGAT Pipeline Examples](examples.md)
- [Boto3 Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)

0 comments on commit 9869f45

Please sign in to comment.