Intellectual Property (IP)

New Approaches to Indexing and Standardizing Biological Sequence Data


In a recent article titled “‘Google for DNA’ indexes 10% of world’s known genetic sequences,” the journal Science reported promising results from a new tool called MetaGraph. Developed by a biomedical informatics group at ETH Zurich, MetaGraph organizes and compresses publicly available sequence data into a searchable format.

The Growth of Biological Data

The biological sequencing data available in public repositories is increasing exponentially, forming an invaluable resource for research. However, traditional bioinformatics tools struggle to scan such vast amounts of data efficiently. Beyond basic search functionalities, there is an increasing demand for advanced indexing systems that support complex queries, including sequence alignment, experiment assembly, and discovery. The accuracy and reliability of these sequences remain critical, as any errors can significantly impact downstream analyses.

A New Era of Indexing

MetaGraph was developed to index and analyze biological sequence libraries at petabase scale. According to the ETH Zurich group’s bioRxiv preprint, “Indexing All Life’s Known Biological Sequences,” the main results of this tool include:

i. high scalability and efficiency, outperforming other indexing tools in both space efficiency and query time;

ii. cost-effectiveness and portability, fitting petabases of data on a single consumer hard drive;

iii. support for sequence search, alignment, and differential assembly; and

iv. indexes of large RNA-Seq cohorts, microbial samples, and protein sequences, which have been made available.
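To make the idea of a searchable sequence index more concrete, the following is a minimal Python sketch of k-mer indexing, the general technique on which tools of this kind build. It is a toy illustration only and not MetaGraph's implementation, which relies on compressed graph and annotation structures to reach petabase scale. The sample labels, sequences, function names, and the min_fraction threshold are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels that contain it."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def query(index, seq, k, min_fraction=0.8):
    """Return {label: fraction of the query's k-mers found in that sample},
    keeping only labels at or above min_fraction (a hypothetical threshold)."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {label: n / total for label, n in hits.items() if n / total >= min_fraction}


# Hypothetical toy data standing in for sequencing samples.
samples = {
    "sample_A": "ACGTACGTGACCTGA",
    "sample_B": "TTGACCTGAAACGGT",
}
index = build_index(samples, k=5)

# All four 5-mers of this query occur in sample_A; only one occurs in sample_B.
matches = query(index, "ACGTGACC", k=5)
```

A plain dictionary of sets like this grows far too quickly for real repositories, which is why space efficiency is listed among the tool's main results.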

Patent Protection and Biological Sequences

An often-overlooked aspect of biological sequence data management is the patent protection system. Any biological material essential to practicing the invention described in a patent application must be characterized by a biological sequence included in the sequence listing file. In the past, sequence listing files could be filed in different formats and under different standards, depending on the jurisdiction, which complicated the standardization and integration of the data.

WIPO Standards and Their Evolution

To address these issues, the World Intellectual Property Organization (WIPO) first created Standard ST.25 for presenting biological sequences in patent applications. However, Standard ST.25 used a plain-text (.txt) format and had several deficiencies, including inconsistent formatting, inadequate data processing practices, and data loss when sequences were imported into public databases. These shortcomings led to the development of WIPO Standard ST.26.

In 2022, sequence listing files transitioned to WIPO Standard ST.26, which uses Extensible Markup Language (XML) format, a change adopted worldwide. This transition marked a significant step towards harmonizing international practices in biological sequence data management for the patent protection system.

The WIPO Standard ST.26 was established to create a universal format facilitating data sharing and search across different jurisdictions. It enables patent applicants to create a single sequence listing acceptable for both international and national or regional procedures and allows sequence data to be exchanged electronically and integrated into computerized databases. The data presented in the sequence listing are divided into two parts:

  • General information: Bibliographic details that associate the sequence listing with the patent application, such as the application number, the earliest priority application, and the main applicant.
  • Biological sequence information: Sequence elements that include mandatory qualifiers describing the molecule (e.g., DNA, RNA, or amino acid) and feature locations.
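As a rough illustration of this two-part structure, the Python sketch below reads a heavily abbreviated sequence listing with the standard library XML parser. The element names follow our reading of the ST.26 schema and should be checked against the official WIPO documentation; the sample content (office code, application number, applicant, and sequence) is hypothetical, and the fragment is not a valid filing.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical listing; element names are assumed from ST.26.
SAMPLE = """<ST26SequenceListing>
  <ApplicationIdentification>
    <IPOfficeCode>XX</IPOfficeCode>
    <ApplicationNumberText>HYPOTHETICAL-0001</ApplicationNumberText>
  </ApplicationIdentification>
  <ApplicantName languageCode="en">Example Applicant</ApplicantName>
  <SequenceTotalQuantity>1</SequenceTotalQuantity>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>12</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..12</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>other DNA</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier>
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>acgtacgtgacc</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""


def summarize(xml_text):
    """Split a listing into its general part and its per-sequence part."""
    root = ET.fromstring(xml_text)

    # Part 1: bibliographic details tying the listing to the application.
    general = {
        "application_number": root.findtext(
            "ApplicationIdentification/ApplicationNumberText"
        ),
        "applicant": root.findtext("ApplicantName"),
    }

    # Part 2: one record per sequence, with its qualifiers.
    sequences = []
    for data in root.iter("SequenceData"):
        seq = data.find("INSDSeq")
        residues = seq.findtext("INSDSeq_sequence", default="")
        declared = int(seq.findtext("INSDSeq_length"))
        qualifiers = {
            q.findtext("INSDQualifier_name"): q.findtext("INSDQualifier_value")
            for q in seq.iter("INSDQualifier")
        }
        sequences.append({
            "id": data.get("sequenceIDNumber"),
            "moltype": seq.findtext("INSDSeq_moltype"),
            "length_ok": declared == len(residues),  # simple consistency check
            "qualifiers": qualifiers,
        })
    return general, sequences


general, sequences = summarize(SAMPLE)
```

Because every field sits in a named element, a patent office or public database can extract it without the custom text parsing that the older format required.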

WIPO Sequence Software

To assist in the implementation of Standard ST.26, WIPO developed WIPO Sequence software. This intuitive tool allows applicants to create, edit, and verify sequence listing files. Although the transition to ST.26 posed initial challenges, it has ultimately enhanced the accessibility, classification, and preservation of biological information in sequence listings globally.

Takeaway

Overall, initiatives like MetaGraph and the adoption of Standard ST.26 represent a concerted effort to harness global biological sequence data through innovative computational approaches. These developments not only facilitate advanced research and patent prosecution but also ensure the preservation and accessibility of invaluable genetic information.



Millena Lourenco
Millena Pais Lourenço is a Patent Specialist for Chemical and Life Sciences with Daniel Law. She received a Bachelor’s degree in Biology from Universidade Veiga de Almeida (UVA) in 2019 […]


Karoline Coelho
Karoline Coelho is a Patent Specialist at Daniel Law. She has experience in the prosecution of Brazilian patent applications, as well as the preparation of patentability, validity, and infringement reports focused on chemical […]
