squash most of the typos
DmitryLitvintsev committed Nov 8, 2023
1 parent 71ec5b9 commit 54dfe99
Showing 7 changed files with 21 additions and 21 deletions.
6 changes: 3 additions & 3 deletions docs/cta_schema.rst
@@ -8,7 +8,7 @@ Here is CTA schema:
Unlike Enstore schema, the CTA db schema is more normalized and specifically
separates out the concepts of logical libraries, virtual organization, storage class into corresponding tables. Therefore, the names of ``virtual_organization``, ``logical_library`` and ``storage_class`` have to be defined by admin in advance before any file can be written.

-Additonally CTA has a concept of ``tape_pool`` that represents logical grouping
+Additionally CTA has a concept of ``tape_pool`` that represents logical grouping
of tapes. Each tape belongs to exactly one tape pool. Tape pools are used to keep data belonging to different VOs, storage_class (via ``archive_route``).

The ``archive_route`` table connects ``storage_class`` to ``tape_pool`` and specifies how many copies a file must have.
@@ -19,9 +19,9 @@ File table
CTA separates the concept of `file` into an abstract ``archive_file`` that may
have multiple corresponding ``tape_file`` entries. The ``archive_file`` table stores file size; adler32 checksum; dis instance name; ``disk_file_id`` - an inode number on storage front end; unique file id (``archive_file_id``); user UID/GID and a deleted flag.

-A ``tape_file`` references ``archive_file`` and contains infromation that ties it to the tape - like volume id (``vid``); location on the tape and copy number.
+A ``tape_file`` references ``archive_file`` and contains information that ties it to the tape - like volume id (``vid``); location on the tape and copy number.

Storage class table
-------------------

-The storage class concept is somewhat similar to file family concept of Enstore. Besides unique name ``stotrage_class_name`` it specifies how many copes a file must have. And it has a reference to ``virtual_orhanization``.
+The storage class concept is somewhat similar to file family concept of Enstore. Besides unique name ``storage_class_name`` it specifies how many copes a file must have. And it has a reference to ``virtual_organization``.
4 changes: 2 additions & 2 deletions docs/dcache_setup.rst
@@ -5,7 +5,7 @@ dCache setup with CTA
Pool
----

-Deploy dcache-cta driver on pool node::
+Deploy dCache-CTA driver on pool node::

wget https://download.dcache.org/nexus/repository/dcache-cta/dcache-cta-0.8.0-1.noarch.rpm
rpm -Uvh --force dcache-cta-0.8.0-1.noarch.rpm
@@ -15,7 +15,7 @@ Define hsm on pool::

hsm create cta cta dcache-cta -cta-user=adm -cta-group=eosusers -cta-instance-name=eosdev -cta-frontend-addr=ctahost:17017 -io-port=1094

-Each pool on the pool node has to havea dedicated port.
+Each pool on the pool node has to have dedicated port.

Define queue on pool::

4 changes: 2 additions & 2 deletions docs/dcache_sfa.rst
@@ -2,7 +2,7 @@ SFA files
=========

One of the issues that has been identified - CTA does not have
-functinality corresponding to Enstore SFA (Small File Aggregation).
+functionality corresponding to Enstore SFA (Small File Aggregation).
In the nutshell the SFA system is as extension of Enstore system that
manages intermediate disk storage on the side (intermediate between
dCache and Enstore). Depending on policies based on ``file_family``,
@@ -24,6 +24,6 @@ We need to translate::
-> dcache://dcache/?store=vo&group=file_family&bfid={child_pnfsid}:package_pnfsid

I.e. the child/package relation exists as location in ``t_locationinfo`` Chimera
-table. As long as these locations exist dCache can read thise files from CTA using an hsm script. T.e. SAPPHIRE system is not need for reading of SFA files.
+table. As long as these locations exist dCache can read these files from CTA using an hsm script. T.e. SAPPHIRE system is not need for reading of SFA files.

This can be populated out of band.
2 changes: 1 addition & 1 deletion docs/enstore2cta_config.rst
@@ -2,6 +2,6 @@
Configuration
--------------

-Script expects configuration file ``enstore2cta.yaml`` in the current directory or pointed to by environment variable ``MIGRATION_CONFIG``. The yaml file has to have "0600" permission bits and has to have the following parameters defned:
+Script expects configuration file ``enstore2cta.yaml`` in the current directory or pointed to by environment variable ``MIGRATION_CONFIG``. The yaml file has to have "0600" permission bits and has to have the following parameters defined:

.. literalinclude:: ../etc/enstore2cta.yaml
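
The configuration lookup and the "0600" requirement described in this hunk could be checked roughly as follows. This is a minimal sketch, not the code in ``enstore2cta.py``: the helper names ``find_config`` and ``has_0600_permissions`` are hypothetical, and giving ``MIGRATION_CONFIG`` precedence over the current directory is an assumption, since the text does not state the order.

```python
import os
import stat


def find_config(environ=None, default="enstore2cta.yaml"):
    """Return the config path.

    Assumption: MIGRATION_CONFIG, when set, wins over the file in the
    current directory; the documentation does not specify the order.
    """
    if environ is None:
        environ = os.environ
    return environ.get("MIGRATION_CONFIG", default)


def has_0600_permissions(path):
    """True if the permission bits of `path` are exactly rw------- (0600)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600
```

A caller might refuse to start when ``has_0600_permissions(find_config())`` is false, which matches the stated intent of protecting the file's contents.
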
10 changes: 5 additions & 5 deletions docs/enstore2cta_script.rst
@@ -2,16 +2,16 @@ enstore2cta - Enstore to CTA migration script
=============================================

The script ``enstore2cta.py``, located in ``enstore2cta/scripts``, implements
-database migration from Enstore DB tro CTA DB. Both databases must be
+database migration from Enstore DB to CTA DB. Both databases must be
`PostgreSQL` databases. The script has various steering options (see below).
-It spawns multuiple processes, each process processing a unique Enstore volume.
+It spawns multiple processes, each process processing a unique Enstore volume.


Requirements
------------


-The scrit works both with python2 and python3 and requires ``psycopg2`` module be installed (using ``pip`` or ``yum install python-psycopg2``).
+The script works both with python2 and python3 and requires ``psycopg2`` module be installed (using ``pip`` or ``yum install python-psycopg2``).


Invocation
@@ -49,7 +49,7 @@ Look for example in ``enstore2cta/etc``. It must have "0600" permission (to prot
single volume to existing system using --add option
(default: None)
--cpu_count CPU_COUNT
-override cpu count - number of simulateously processed
+override cpu count - number of simultaneously processed
labels (default: 8)
single volume to existing system using --add option

@@ -58,5 +58,5 @@ Look for example in ``enstore2cta/etc``. It must have "0600" permission (to prot

The script can work with individual label(s) passed as comma separated values to ``--label`` option. Or it can be invoked with ``--all`` switch to migrate all labels. The migration is done by label.

-Additionally, on an existing CTA systenm one can use
+Additionally, on an existing CTA system one can use
``--add`` option to add a volume also specifying its ``--storage_class`` (e.g. "cms.foo") and ``--vo`` (e.g. "cms").
14 changes: 7 additions & 7 deletions docs/enstore_schema.rst
@@ -14,14 +14,14 @@ File table
----------

Each file copy in Enstore is uniquely identified by a BFID - bit file id,
-which is a string obtained by adding a three letter `brand` (which is the same for all files in a given Enstore instance), the Unix epoch, multiplied by 100000 and a counter which is reserved to resolve collisions. BFID is generated in the code base and is iserted into ``file`` table where it has unique contraint. If insert fails, the counter is incremented and the record insertion is tried again. And so on until it succeeds.
+which is a string obtained by adding a three letter `brand` (which is the same for all files in a given Enstore instance), the Unix epoch, multiplied by 100000 and a counter which is reserved to resolve collisions. BFID is generated in the code base and is inserted into ``file`` table where it has unique constraint. If insert fails, the counter is incremented and the record insertion is tried again. And so on until it succeeds.

.. code-block:: python
bfid = "CDMS" + str(time.time()*100000)
Each file record contains PNFSID (dCache inode identifier) that ties it back to
-the front end storage system; adler32 checksum; a reference to the file package for small files in SFA (Small File Aggregation) equal BFID of the package or ``null`` for `direct` files; file size; original file name; UID/GID of user who creted the file; tape location and a ``deleted`` flag that indicates whether or not the file
+the front end storage system; adler32 checksum; a reference to the file package for small files in SFA (Small File Aggregation) equal BFID of the package or ``null`` for `direct` files; file size; original file name; UID/GID of user who created the file; tape location and a ``deleted`` flag that indicates whether or not the file
has been removed from namespace.
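
The generate-then-retry scheme for BFIDs described in this section can be sketched as below. This is an illustration under stated assumptions, not Enstore's code: an in-memory SQLite table stands in for the PostgreSQL ``file`` table, the two-column layout and the helper names ``make_bfid`` and ``insert_file`` are hypothetical, and the counter is assumed to be added to the scaled epoch before the brand is prepended.

```python
import sqlite3

BRAND = "CDMS"  # brand prefix as used in the snippet above


def make_bfid(epoch, counter=0):
    """Brand + (epoch * 100000 + counter); assumed way of combining the parts."""
    return BRAND + str(int(epoch * 100000) + counter)


def insert_file(conn, epoch, pnfsid):
    """Insert a record, incrementing the counter until the unique constraint passes."""
    counter = 0
    while True:
        bfid = make_bfid(epoch, counter)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO file (bfid, pnfsid) VALUES (?, ?)",
                    (bfid, pnfsid),
                )
            return bfid
        except sqlite3.IntegrityError:
            # Collision on the unique BFID: bump the counter and retry.
            counter += 1


# Hypothetical stand-in for the Enstore ``file`` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (bfid TEXT UNIQUE, pnfsid TEXT)")
```

Two inserts made with the same timestamp then receive distinct BFIDs, because the second attempt collides once and is retried with the counter incremented.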

Volume table
@@ -31,16 +31,16 @@ Every tape in Enstore is stored in the ``volume`` table.
The many to one ``file`` to ``volume`` relation is done on integer ``volume.id`` primary key via ``file.volume`` foreign key.

Each volume record tracks how many active/deleted/total files and bytes exist
-on the volume (via DB trigger on insert/update/delete). It has a volume label; total/remaining bytes; number of mounts; number of read and write accesses; severla status fields that allow to classify tapes (e.g. ``full``, ``NOACCESS``, ``NOTALLOWED``, ``migrated``, ``migrating``). The values of status fields are arbitrary strings.
+on the volume (via DB trigger on insert/update/delete). It has a volume label; total/remaining bytes; number of mounts; number of read and write accesses; several status fields that allow to classify tapes (e.g. ``full``, ``NOACCESS``, ``NOTALLOWED``, ``migrated``, ``migrating``). The values of status fields are arbitrary strings.

-The Enstore system has a concept of virtual library, so called libary manager (LM). The LM
+The Enstore system has a concept of virtual library, so called library manager (LM). The LM
manages a set of movers that have SCSI tape drives attached. LM (and movers) are Enstore servers and are configured based in Enstore instance configuration and are not captured in database schema. Each LM has a unique name and draws specific tapes allocated for it. This relation is captured in ``volume.library`` field.

-Since Enstroe LMs map to actual physical tape libraries, the volumes have to be pre-allocated to specific LMs.
+Since Enstore LMs map to actual physical tape libraries, the volumes have to be pre-allocated to specific LMs.

-Accounting and data steering aspects of Enstore operations use ``volume.storage_group`` field (usually corresponding to a VO name); ``volume.file_family`` a string field that tells enstore to use the same set of tapes to write data having this attribute. ``volume.file_family_width`` an integer taht specified how many tape deives can be used simultaneously to write data with speciric ``file_family``.
+Accounting and data steering aspects of Enstore operations use ``volume.storage_group`` field (usually corresponding to a VO name); ``volume.file_family`` a string field that tells Enstore to use the same set of tapes to write data having this attribute. ``volume.file_family_width`` an integer that specified how many tape drives can be used simultaneously to write data with specific ``file_family``.

-Enstore does not have pre-defined ``library``, ``storage_group`` and ``file_family`` concepts. When files are written to Enstore it receives the instruction fo what (``library``, ``file_family``, ``file_family_width``) to use from Enstore command line client ``encp``. When invoked, the ``encp`` client takes these parameters from directory tags of the destination directory or they can be passed as options to encp. File family value can be completelty arbitrary and user defined. Specifying random ``library`` string results in failure to write if Enstore does not actually have a running LM with matching name.
+Enstore does not have pre-defined ``library``, ``storage_group`` and ``file_family`` concepts. When files are written to Enstore it receives the instruction of what (``library``, ``file_family``, ``file_family_width``) to use from Enstore command line client ``encp``. When invoked, the ``encp`` client takes these parameters from directory tags of the destination directory or they can be passed as options to encp. File family value can be completely arbitrary and user defined. Specifying random ``library`` string results in failure to write if Enstore does not actually have a running LM with matching name.

File copies
-----------
2 changes: 1 addition & 1 deletion enstore2cta/scripts/enstore2cta.py
@@ -1197,7 +1197,7 @@ def main():
action = "store",
type = int,
default = multiprocessing.cpu_count(),
-help="override cpu count - number of simulateously processed labels")
+help="override cpu count - number of simultaneously processed labels")


args = parser.parse_args()