Trade Standard for Intellectual Property of the People's Republic of China

Standard for The Presentation of Nucleotide and/or Amino Acid Sequence Listings and Electronic File of Sequence Listings

Promulgated by the State Intellectual Property Office of the People's Republic of China on November 1, 2001
Enforced as of November 1, 2001

Standard for The Presentation of Nucleotide and/or Amino Acid Sequence Listings and Electronic File of Sequence Listings

1. General Provisions

In accordance with Rule 18, Paragraph 4 of the Implementing Regulations of the Chinese Patent Law, where a patent application for invention covers one or more nucleotide or amino acid sequences, the description shall contain a sequence listing that complies with the requirements of the Patent Office of the State Intellectual Property Office (SIPO). The applicant shall also submit a computer-readable copy of the sequence listing in the form prescribed by the Patent Office of the SIPO.

This Standard has been drawn up to standardize the presentation of nucleotide and amino acid sequence listings, both on paper and in the electronic file containing the sequence listings in computer-readable form, and so to simplify filing for applicants. It is intended to facilitate the entry of sequence data in electronic form into the computer database of the Patent Office of the SIPO and to allow sequence data to be exchanged with other sequence search databases, for the benefit of public searching. It is also intended to help examiners of the Patent Office expedite examination and provide better service to applicants.

2. Scope of Applicability

This Standard applies to any patent application filed at the Patent Office of the SIPO that contains nucleotide or amino acid sequences. Specifically, it applies both to the sequence listing of the application presented on paper and to the electronic file containing the sequence listing in computer-readable form.

3. Terms and Definitions

The terms and definitions listed below are introduced into this Standard.

1) The expression "sequence listing" means a part of the description of the application presented on paper file, which gives a detailed disclosure of the nucleotide and/or amino acid sequences and other available information. Sequences in the sequence listing are any unbranched nucleotide sequences of ten or more nucleotides or unbranched amino acid sequences of four or more amino acids. Branched sequences, sequences with fewer than four specifically defined nucleotides or amino acids as well as sequences comprising nucleotides or amino acids other than those listed in Appendix 1, Tables 1, 2, 3 and 4, are specifically excluded from the definition of sequences.

2) "Electronic file of sequence listing" is a plain text file containing nucleotide and/or amino acid sequence listings in computer-readable form.

3) "Nucleotides" embrace only those nucleotides that can be represented using the symbols set forth in Appendix 1, Table 1. Modifications, for example, methylated bases, may be described as set forth in Appendix 1, Table 2, but shall not be shown explicitly in the nucleotide sequence. For the representation of modifications in the nucleotide sequence, see Section 4.4.7(1) and Section 4.4.5 of this Standard.

4) "Amino acids" embrace only those L-amino acids that are found in naturally occurring proteins and listed in Appendix 1, Table 3; D-amino acids are not included. Modifications, for example, hydroxylations or glycosylations, may be described as set forth in Appendix 1, Table 4, but shall not be shown explicitly in the amino acid sequence. For the representation of modifications in the amino acid sequence, see Section 4.4.7(2) and Section 4.4.5 of this Standard.

5) "Sequence identifier" is a unique positive integer that corresponds to the SEQ ID NO assigned to each sequence in the listing.

6) "Numeric identifier" is a three-digit number bracketed between "< >", which represents a specific data element.
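As an illustration of definition 6), a numeric identifier at the start of a line can be recognized with a simple pattern. The following Python sketch is not part of the Standard; the function name split_identifier is hypothetical, and the optional space between the identifier and its information is an assumption.

```python
import re

# A numeric identifier is a three-digit number bracketed between "<" and ">".
IDENTIFIER = re.compile(r"^<(\d{3})>\s*(.*)$")


def split_identifier(line):
    """Return (identifier, information) for a line such as '<211> 389'.

    Returns None when the line does not start with a numeric identifier,
    for example a continuation line or a line of sequence data.
    """
    match = IDENTIFIER.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For example, split_identifier("<211> 389") yields the pair (211, "389"), while a line of sequence data yields None.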

4. Numeric identifiers, their information and formats in sequence listings and electronic file of sequence listings.

Numeric identifiers as defined in the Standard shall be used for the presentation of the items of information in nucleotide and amino acid sequence listings and the electronic file of sequence listings. The corresponding information shall follow immediately after the numeric identifier (i.e. the information shall be given to the right of the numeric identifier, continuing on the lines below it if necessary) and shall conform to the formats prescribed in the Standard. Numeric identifiers, their information and formats are illustrated by the specimen sequence listing shown in Appendix 2.

Numeric identifiers, their corresponding information and formats in sequence listings and electronic file of sequence listings are explained in detail as follows:

4.1. Items of information in sequence listings and electronic file of sequence listings:

The following information mentioned in Sections 4.1.1-4.1.7 shall be identical to the corresponding information indicated in the Request of the patent application.

4.1.1. Applicant name: numeric identifier <110>.

The names of all applicants of the patent application shall follow after numeric identifier <110>.

A foreign applicant shall give its name in English, enclosed in parentheses, after the Chinese translation of the name.

4.1.2. Title of invention: numeric identifier <120>.

The title of the patent application for invention shall follow after numeric identifier <120>.

4.1.3. File reference: numeric identifier <130>

The file reference of the patent application shall follow after numeric identifier <130>. This data element is omitted where the patent application has no file reference.

4.1.4. Patent application number: numeric identifier <140>

This data element is not required in a sequence listing filed together with the patent application. Where a sequence listing or a relevant amendment is filed after the application number has been assigned, the patent application number shall follow after numeric identifier <140>.

4.1.5. Filing date: numeric identifier <141>

This data element is not required in a sequence listing filed together with the patent application. Where a sequence listing or a relevant amendment is filed after the filing date has been assigned, the filing date of the patent application shall follow after numeric identifier <141> in the format YYYY-MM-DD, e.g. 2002-01-18.

4.1.6. Priority number: numeric identifier <150>

Where a patent application claims priority, the priority number shall follow after numeric identifier <150>; this data element is omitted where no priority is claimed. The format of the data element is as follows: the country code, district code or governmental organization code indicated in accordance with WIPO Standard ST.3, followed by the priority number, e.g. CN93112388.7.

4.1.7. Priority date: numeric identifier <151>

Where a patent application claims priority, the priority date shall follow after numeric identifier <151>; this data element is omitted where no priority is claimed. The format of the data element is YYYY-MM-DD, e.g. 2001-09-20.
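The date and priority number formats described in Sections 4.1.5-4.1.7 can be checked mechanically. The following Python sketch is an illustration only: the function names are hypothetical, and the assumption that the code prefix is exactly two capital letters comes from WIPO Standard ST.3, not from this Standard.

```python
import re
from datetime import date


def parse_date(text):
    """Parse a date written as YYYY-MM-DD; raise ValueError otherwise."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError(f"not in YYYY-MM-DD format: {text!r}")
    year, month, day = (int(part) for part in text.split("-"))
    # date() also rejects impossible dates such as 2001-02-30.
    return date(year, month, day)


def split_priority_number(text):
    """Split a value such as 'CN93112388.7' into (code, number).

    Assumption: the leading code is two capital letters, as in WIPO ST.3.
    """
    match = re.fullmatch(r"([A-Z]{2})(.+)", text.strip())
    if match is None:
        raise ValueError(f"no two-letter code prefix: {text!r}")
    return match.group(1), match.group(2)
```

With these helpers, the example value CN93112388.7 splits into the code "CN" and the number "93112388.7", and a date such as 2002-1-18 is rejected because the month is not written with two digits.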

4.2. Version information of the software used in electronic file of sequence listings: numeric identifier <170>

The name and version number of the software shall follow after numeric identifier <170> if the electronic file of the nucleotide and/or amino acid sequence listing was produced with software provided by the Patent Office of the State Intellectual Property Office of China or by another patent organization (e.g. the European Patent Office). This data element is omitted from sequence listings that were not produced with such software.

4.3. Number of sequences in sequence listings: numeric identifier <160>

The total number of sequences in sequence listings shall follow after numeric identifier <160>, which is a positive integer corresponding to the largest SEQ ID NO.

4.4. Items of information in sequences:

4.4.1. Sequence identifier: numeric identifier <210>

In the sequence listing, each sequence shall be assigned a separate and unique sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. The sequence identifier represents the SEQ ID NO assigned to each sequence in the sequence listing.

The sequence identifier corresponding to each sequence shall follow after numeric identifier <210>.

The detailed information of a sequence, as set out in Sections 4.4.2-4.4.7 below, shall be listed between the corresponding sequence identifier and the next sequence identifier.

Where there are several sequences in the sequence listing, the information of each sequence shall be listed successively in the numeric order of sequence identifiers from small to large.
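The numbering rules in Sections 4.3 and 4.4.1 lend themselves to an automatic consistency check. The following Python sketch is one possible way to do this, not a prescribed procedure; the function name and the wording of the messages are hypothetical.

```python
def check_sequence_identifiers(seq_ids, declared_total):
    """Check the <210> values of a listing against the <160> value.

    seq_ids: the <210> values in the order they appear in the listing.
    declared_total: the value given after <160>.
    Returns a list of problems; an empty list means the numbering is consistent.
    """
    problems = []
    expected = list(range(1, len(seq_ids) + 1))
    if list(seq_ids) != expected:
        problems.append("sequence identifiers must run 1, 2, 3, ... in listing order")
    if seq_ids and declared_total != max(seq_ids):
        problems.append("<160> must equal the largest SEQ ID NO")
    return problems
```

A listing with identifiers 1, 2, 3 and <160> equal to 3 produces no problems, whereas identifiers 1, 3 are reported because the numbering skips 2.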

4.4.2. Sequence length: numeric identifier <211>.

The sequence length expressed in number of bases or amino acids shall follow after numeric identifier <211>.

4.4.3. Sequence type: numeric identifier <212>

The molecular type of the sequence shall follow after numeric identifier <212>, either DNA, RNA or PRT. If a nucleotide sequence contains both DNA and RNA fragments, the value shall be "DNA"; in addition, the combined DNA/RNA molecule shall be further described in the <220> to <223> feature section.

4.4.4. Organism: numeric identifier <213>.

The biological source of the sequence shall follow after numeric identifier <213>: either the scientific name (genus and species), given in both Chinese characters and the Latin alphabet, or "Artificial Sequence", or "Unknown".

Please note that the name in the Latin alphabet shall be written after the name in Chinese characters and enclosed in parentheses, e.g. 草履虫 (Paramecium sp.).

4.4.5. Items of information of feature sections in the sequence: numeric identifiers <220>-<223>

This Section describes the items of information concerning the features of a sequence.

If "n" or a modified base is used in the nucleotide sequence (numeric identifier <400>) (see Section 4.4.7(1) of the Standard), or if "Xaa" or a modified/unusual L-amino acid is used in the amino acid sequence (numeric identifier <400>) (see Section 4.4.7(2) of the Standard), the following data elements (1)-(4) are mandatory in the sequence listing.

If the organism (numeric identifier <213>) is "Artificial Sequence" or "Unknown", the following data elements (1)-(4) are mandatory in the sequence listing.

Where there are several features in one sequence, the features shall be described successively in order of appearance of the features in the sequence.
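The two conditions above under which the feature data elements become mandatory can be expressed as a small predicate. The following Python sketch is illustrative only; the function name and argument layout are hypothetical. It can detect only the placeholders "n" and "Xaa": a modified base or amino acid written as its unmodified counterpart cannot be recognized from the sequence alone, so the applicant still has to flag such cases.

```python
def feature_section_required(organism, seq_type, residues):
    """Return True when data elements <220>-<223> are mandatory.

    organism: the value given after <213>.
    seq_type: the value given after <212> ("DNA", "RNA" or "PRT").
    residues: the symbols of the sequence, e.g. ['a', 't', 'n'] or ['Met', 'Xaa'].
    """
    if organism in ("Artificial Sequence", "Unknown"):
        return True
    # "Xaa" marks an unknown or modified amino acid, "n" an unknown or modified base.
    placeholder = "Xaa" if seq_type == "PRT" else "n"
    return placeholder in residues
```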

Items of information of feature sections in the sequence and numeric identifiers are listed as follows in detail:

(1) Feature: numeric identifier <220>

Leave blank after numeric identifier <220>.

(2) Name/key: numeric identifier <221>

Name of the feature or key word shall follow after numeric identifier <221>. Only those keys as described in Table 5 or 6 of Appendix 1 shall be used to present the feature.

(3) Location: numeric identifier <222>

The location of the feature shall follow after numeric identifier <222>. The format to indicate the location is shown below:

The number of the first base/amino acid in the feature and the number of the last base/amino acid in the feature are each enclosed in parentheses and separated by suspension points "…", e.g. (279)…(389);

If several "n"'s or "Xaa"'s are used in the sequence, all their locations shall be indicated, e.g. (80, 100, 112). See Specimen Sequence Listing of Appendix 2.

(4) Other information: numeric identifier <223>.

Any other information relevant to the feature in the sequence shall follow after numeric identifier <223>. Where any modified base or modified amino acid is described in the sequence, the symbol associated with that base or amino acid from Tables 2 and 4 of Appendix 1 should be used.
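The two location formats described under numeric identifier <222> (a first-to-last range and a list of individual positions) can be parsed as follows. This Python sketch is an illustration under stated assumptions: the function name is hypothetical, and because the exact character used for the suspension points may vary between copies of this text, the sketch accepts either an ellipsis character or two or more full stops.

```python
import re

# Range form: "(first)" + suspension points + "(last)".
RANGE = re.compile(r"^\((\d+)\)\s*(?:…|\.{2,})\s*\((\d+)\)$")
# List form: "(80, 100, 112)".
POSITIONS = re.compile(r"^\(\s*\d+(?:\s*,\s*\d+)*\s*\)$")


def parse_location(text):
    """Return the location as a list of (first, last) pairs."""
    text = text.strip()
    match = RANGE.match(text)
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        if first > last:
            raise ValueError("first position exceeds last position")
        return [(first, last)]
    if POSITIONS.match(text):
        # Each individual position is returned as a one-residue range.
        return [(int(n), int(n)) for n in re.findall(r"\d+", text)]
    raise ValueError(f"unrecognized location: {text!r}")
```

Returning every location as a list of pairs keeps the two formats uniform for whatever code consumes them.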

4.4.6. Information of publication: numeric identifiers <300>-<312>.

Publication information is not mandatory. The data elements concerning publication information are optional in sequence listings and in the electronic file of sequence listings.

(1) Publication information: numeric identifier <300>.

Leave blank after numeric identifier <300>.

(2) Authors: numeric identifier <301>.

Authors of the published document shall follow after numeric identifier <301>.

(3) Title: numeric identifier <302>.

Title of publication shall follow after numeric identifier <302>.

(4) Journal: numeric identifier <303>.

Journal name in which the data is published shall follow after numeric identifier <303>.

(5) Volume: numeric identifier <304>.

Journal volume in which the data is published shall follow after numeric identifier <304>.

(6) Issue: numeric identifier <305>.

Journal issue number in which the data is published shall follow after numeric identifier <305>.

(7) Pages: numeric identifier <306>.

The first journal page number and the last journal page number of the published document shall follow after numeric identifier <306>.

(8) Date: numeric identifier <307>.

The date on which the journal publishing the data was issued shall follow after numeric identifier <307> in the format YYYY-MM-DD, e.g. 1999-09-20.

(9) Database accession number: numeric identifier <308>.

If the published document has been accessed in a database, the database accession number of the document shall follow after numeric identifier <308>.

(10) Database entry date: numeric identifier <309>.

If the published document has been accessed in a database, the date of entry in the database shall follow after numeric identifier <309> in the format YYYY-MM-DD, e.g. 1999-09-20.

(11) Publication number: numeric identifier <310>.

If the published document is a patent document, the publication number of the patent shall follow after numeric identifier <310> in the following order: the country code, district code or governmental organization code indicated in accordance with WIPO Standard ST.3, the publication number indicated in accordance with WIPO Standard ST.6, and the kind-of-document code indicated in accordance with WIPO Standard ST.16, e.g. CN1183117A.

(12) Filing date: numeric identifier <311>.

If the published document is a patent document, the filing date of the patent shall follow after numeric identifier <311> in the format YYYY-MM-DD, e.g. 1999-09-20.

(13) Publication date: numeric identifier <312>.

If the published document is a patent document, the publication date of the patent shall follow after numeric identifier <312> in the format YYYY-MM-DD, e.g. 1999-09-20.

4.4.7. Nucleotide sequences and/or amino acid sequences: numeric identifier <400>

The sequence identifier of the corresponding sequence shall follow after numeric identifier <400>; the nucleotide and/or amino acid sequence itself shall begin on the next line.

The sequence may be a pure nucleotide sequence, a pure amino acid sequence or a nucleotide sequence together with its corresponding amino acid sequence.

(1) Pure nucleotide sequences:

A nucleotide sequence shall be presented only by a single strand, in the 5'-end to 3'-end direction from left to right. The terms 3' and 5' shall not be represented in the sequence.

The bases of a nucleotide sequence shall be represented using the one-letter code for nucleotide sequence characters. Only lower case letters in conformity with the list given in Appendix 1, Table 1, shall be used.

In a nucleotide sequence, modified bases shall be represented as the corresponding unmodified bases or as "n" in the sequence itself if the modified base is one of those listed in Appendix 1, Table 2. The symbol "n" is the equivalent of only one unknown or modified nucleotide. The modification shall be further described in the feature section of the sequence (numeric identifiers <220>-<223>), using the codes given in Appendix 1, Table 2 (see Section 4.4.5 of this Standard). The codes listed in Appendix 1, Table 2 may be used in the description or the feature section of the sequence listing but not in the sequence itself.

The enumeration of bases in a nucleotide sequence shall start at the first base of the sequence with number 1 and shall be continuous through the whole sequence in the direction 5' to 3'. The enumeration method for nucleotide sequences set forth above remains applicable to nucleotide sequences that are circular in configuration, with the exception that the designation of the first nucleotide of the sequence may be made at the option of the applicant.

A nucleotide sequence that is made up of one or more non-contiguous segments of a larger sequence or of segments from different sequences shall be numbered as a separate sequence, with a separate sequence identifier. A sequence with a gap or gaps shall be numbered as a plurality of separate sequences with separate sequence identifiers, with the number of separate sequences being equal in number to the number of continuous strings of sequence data.

A nucleotide sequence shall be listed with a maximum of 60 bases per line, with a space between each group of 10 bases. The number of the last base of the line shall be marked at the end of the line.
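The line layout for nucleotide sequences described above (at most 60 bases per line, a space after each group of 10, and the number of the last base at the end of the line) can be produced as follows. This Python sketch is illustrative; the function name is hypothetical, and the two spaces before the trailing number are an assumption, since the Standard does not fix the column in which the number appears.

```python
def format_nucleotides(sequence, per_line=60, group=10):
    """Return the lines of a pure nucleotide sequence in listing layout."""
    sequence = sequence.lower()  # only lower case letters may be used
    lines = []
    for start in range(0, len(sequence), per_line):
        chunk = sequence[start:start + per_line]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        last_base = start + len(chunk)  # number of the last base on this line
        lines.append(" ".join(groups) + f"  {last_base}")
    return lines
```

An 80-base sequence therefore yields two lines, the first ending in 60 and the second, holding the remaining 20 bases in two groups, ending in 80.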

(2) Pure amino acid sequences:

The amino acids in a protein or peptide sequence shall be listed in the amino to carboxy direction from left to right. The amino and carboxy groups shall not be represented in the sequence.

The amino acids shall be represented using the three-letter code with the first letter as a capital and shall conform to the list given in Appendix 1, Table 3. An amino acid sequence that contains a blank or internal terminator symbols (for example, "Ter" or "*" or ".") may not be represented as a single amino acid sequence, but shall be presented as separate amino acid sequences.

In an amino acid sequence, modified and unusual amino acids shall be represented as the corresponding unmodified amino acids or as "Xaa" in the sequence itself if the modified amino acid is one of those listed in Appendix 1, Table 4. The symbol "Xaa" is the equivalent of only one unknown or modified amino acid. The modification shall be further described in the feature section of the sequence (numeric identifiers <220>-<223>), using the codes given in Appendix 1, Table 4 (see Section 4.4.5 of this Standard). The codes listed in Appendix 1, Table 4 may be used in the description or the feature section of the sequence listing but not in the sequence itself.

The enumeration of amino acids shall start at the first amino acid of the sequence with number 1 marked under the amino acid. Then the number of the amino acid shall be marked under the sequence every five amino acids. Optionally, the amino acids preceding the mature protein, for example pre-sequences, pro-sequences, pre-pro-sequences and signal sequences, when present, may have negative numbers, counting backwards starting with the amino acid next to number 1. Zero (0) is not used when the numbering of amino acids uses negative numbers to distinguish the mature protein. The enumeration method for amino acid sequences set forth above remains applicable for amino acid sequences that are circular in configuration, with the exception that the designation of the first amino acid of the sequence may be made at the option of the applicant.
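The optional negative numbering for residues preceding the mature protein, with zero skipped, can be sketched as follows. The Python function below is an illustration only; its name and the mature_start parameter are hypothetical.

```python
def residue_numbers(total, mature_start=1):
    """Return the number assigned to each residue of an amino acid sequence.

    total: number of residues as written in the listing.
    mature_start: 1-based index, within the written sequence, of the first
        residue of the mature protein (1 means there is no pre-sequence).
    Residues before the mature protein count backwards from -1; zero is never used.
    """
    numbers = []
    for index in range(1, total + 1):
        offset = index - mature_start
        numbers.append(offset + 1 if offset >= 0 else offset)
    return numbers
```

For a five-residue sequence whose mature protein starts at the third residue, the numbers assigned are -2, -1, 1, 2, 3.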

An amino acid sequence that is made up of one or more non-contiguous segments of a larger sequence or of segments from different sequences shall be numbered as a separate sequence, with a separate sequence identifier. A sequence with a gap or gaps shall be numbered as a plurality of separate sequences with separate sequence identifiers, with the number of separate sequences being equal in number to the number of continuous strings of sequence data.

An amino acid sequence shall be listed with a maximum of 16 amino acids per line, with a space provided between each amino acid.
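The layout rules for a pure amino acid sequence (three-letter codes, at most 16 per line, a number under the first amino acid and under every fifth one) can be combined in one formatter. This Python sketch is illustrative; the function name is hypothetical, and left-aligning each number under the first letter of its amino acid is an assumption about column placement.

```python
def format_amino_acids(residues, per_line=16):
    """Return alternating sequence lines and numbering lines.

    residues: three-letter codes such as ['Met', 'Ala', 'Gly'].
    Each amino acid occupies a four-column cell (three letters plus a space),
    which keeps the numbers aligned for positions up to 9999.
    """
    lines = []
    for start in range(0, len(residues), per_line):
        chunk = residues[start:start + per_line]
        codes, numbers = [], []
        for offset, code in enumerate(chunk):
            position = start + offset + 1
            label = str(position) if position == 1 or position % 5 == 0 else ""
            codes.append(code.ljust(4))
            numbers.append(label.ljust(4))
        lines.append("".join(codes).rstrip())
        lines.append("".join(numbers).rstrip())
    return lines
```

Each pair of returned lines holds the amino acids and, beneath them, their position numbers; a 17-residue sequence therefore produces four lines.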

(3) Nucleotide sequences together with their corresponding amino acid sequences:

Bases corresponding to the codons in the coding parts of a nucleotide sequence shall be listed as triplets (codons), with a space between each codon. Amino acids corresponding to the codons in the coding parts of a nucleotide sequence shall be immediately under the corresponding codons. The first amino acid of the corresponding amino acid sequence shall be marked number 1 under the amino acid. Then the number of the amino acid shall be marked under the sequence every five amino acids.

For those sequences disclosed in the composite format as mentioned above, the amino acid sequence corresponding to the nucleotide sequence shall also be disclosed separately in the sequence listing as a pure amino acid sequence.

4.5. Format of numeric identifiers and the following information:

In this Section, "numeric identifiers and information" refers to the numeric identifiers together with the corresponding information that follows them.

Numeric identifiers and information shall be listed in the sequence listing in the numeric order of numeric identifiers from small to large.

Generally, a blank line shall be inserted between numeric identifiers. However, when the first two digits in the numeric identifier are identical, such as from <210> to <213>, from <220> to <223>, it is unnecessary to insert a blank line between the numeric identifiers. An exception is that, where there are several features in one sequence, a blank line shall be inserted before each numeric identifier <220> in the description of each feature.

Where there are several sequences in one sequence listing, numeric identifiers and information shall be listed in the numeric order of sequence identifiers from small to large. In each sequence, only numeric identifiers and information relevant to the sequence shall be listed in the numeric order of numeric identifiers from small to large, i.e. numeric identifiers <210>-<400> and information shall be listed.

Where there are several features in one sequence, the numeric identifiers <220>-<223> and information shall be presented successively in order of appearance of the features in the sequence.
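The blank-line rules of this Section can be applied mechanically when a listing is written out. The following Python sketch is one possible reading of those rules, not an official implementation; the function name is hypothetical, and the single space placed between a numeric identifier and its information is an assumption.

```python
def render(entries):
    """Render (identifier, information) pairs as listing lines.

    A blank line separates numeric identifiers unless their first two digits
    are identical; a blank line is also inserted before each further <220>
    when a sequence has several features.
    """
    lines = []
    previous = None
    for identifier, info in entries:
        same_group = previous is not None and identifier // 10 == previous // 10
        new_feature = (identifier == 220 and previous is not None
                       and 220 <= previous <= 223)
        if previous is not None and (not same_group or new_feature):
            lines.append("")
        lines.append(f"<{identifier}> {info}".rstrip())
        previous = identifier
    return lines
```

Integer division by 10 extracts the first two digits of a three-digit identifier, so <210> to <213> fall into one group and <220> to <223> into another.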

5. Format of electronic file of sequence listings:

5.1. The electronic file of sequence listings is a plain text file, which contains numeric identifiers and information mentioned above in Part 4 and accords with the format requirement mentioned in Part 4. The electronic file shall be encoded using Standard of Character Set of Code for Chinese Characters for Information Exchange issued by the People's Republic of China.

5.2. The electronic file of sequence listings shall be recorded on a CD or a 3.5-inch diskette, or shall be submitted in another form provided for by the SIPO. A CD recording the electronic file shall be written in accordance with Standard ISO 9660. A diskette recording the electronic file shall comply with the FAT12 format. The directory structure of the CD or diskette shall be as follows: there shall be one and only one plain text file with ".SEQ" as suffix under the root directory.
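The requirements of Part 5 on the file suffix, the root directory and the character encoding can be checked before a medium is submitted. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the character set named in Section 5.1 is assumed to correspond to the codec Python calls "gb2312", and the root-directory rule is read as "exactly one file with the .SEQ suffix".

```python
from pathlib import Path


def find_sequence_file(root):
    """Return the single .SEQ file in the root directory of the medium."""
    candidates = [entry for entry in Path(root).iterdir()
                  if entry.is_file() and entry.suffix.upper() == ".SEQ"]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one .SEQ file, found {len(candidates)}")
    return candidates[0]


def read_listing(path, encoding="gb2312"):
    """Read the listing as text; raises UnicodeDecodeError if the bytes
    are not valid in the assumed encoding."""
    return Path(path).read_text(encoding=encoding)
```

Reading the file with the expected codec is a cheap way to catch a listing that was saved in a different encoding.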

6. Other items of information

6.1. The sequence listings on electronic file submitted in computer readable form shall be identical to the written sequence listings on paper file.

6.2. When preparing an electronic file of sequence listings that complies with this Standard, the applicant may use the sequence listing editing software provided by the SIPO, software provided by other patent organizations (e.g. PatentIn provided by the European Patent Office) or any software for editing plain text files. Whatever software is used, the resulting electronic file must comply with the provisions of this Standard.

6.3. When the applicant submits the CD or diskette recording the electronic file of sequence listings, the submitted CD or diskette shall have a label permanently affixed thereon which indicates the name of the applicant, the title of the invention, the name of the electronic file and the filing date. Optionally, the corresponding reference number given by the patent agency may be marked on the label if patent agents are authorized to act on behalf of the applicant. When the applicant submits the sequence listing or makes relevant amendments after the assignment of the application number, the application number shall be indicated on the label. Furthermore, words such as "late submission" or "amendment" shall be marked on the label.

Corresponding numeric identifiers and information mentioned in this Standard shall be used to mark the applicant's name and other items of information, e.g. <110>Gene Development Ltd. The filing date shall be marked in the format as YYYY-MM-DD.

Where the electronic file of sequence listings is too large to be recorded on one diskette, it shall be recorded on one CD.

7. Promulgation and Enforcement

This standard is promulgated by the State Intellectual Property Office of the People's Republic of China and shall enter into force as of November 1, 2001.

State Intellectual Property Office of the People's Republic of China

November 1, 2001

Appendices (TIF format)