A Novel Approach to Compress and Secure Human Genome Sequence
Authors: Garima Mathur, Anjana Pandey and Sachin Goyal
Publishing Date: 15-09-2022
ISBN: 978-81-955020-5-9
Abstract
DNA sequences can be considered a pool of genetic information, used mainly for reproduction, classification, and disease detection. FASTA is a commonly used text-based format for DNA sequences, but FASTA files are often very large, which makes them difficult to store and manage; securing this data is also a major concern. Compression techniques that reduce the size of these DNA data files are the most appropriate solution, since smaller files also need fewer resources for transmission. This work therefore proposes a novel ASCII-based compression algorithm in which DNA characters are first converted into ASCII integers, deltas are then computed over those integers, and the LZW compression technique is applied to the result. To secure the data, a blockchain-based framework is applied after the compression module to make the data immutable. Compression ratios are also compared against methods such as LZW and Huffman coding on Homo sapiens sequences, and the results show that the proposed algorithm achieves a good compression ratio on some randomly selected data sets. Another aim of this paper is to show the benefits of using a blockchain-based framework to secure healthcare data.
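The compression pipeline outlined in the abstract (ASCII conversion, delta computation, then LZW) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the deltas are formed, so the sketch assumes differences between consecutive ASCII codes with the first code kept as-is. It also assumes the LZW dictionary is seeded with the distinct delta values and that this alphabet is stored alongside the codes. All function names are hypothetical.

```python
from typing import List, Tuple


def to_deltas(seq: str) -> List[int]:
    """ASCII-encode each character, then keep the first code and successive differences."""
    codes = [ord(ch) for ch in seq]
    if not codes:
        return []
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def from_deltas(deltas: List[int]) -> str:
    """Invert to_deltas with a running sum."""
    chars, total = [], 0
    for d in deltas:
        total += d
        chars.append(chr(total))
    return "".join(chars)


def lzw_encode(symbols: List[int]) -> Tuple[List[int], List[int]]:
    """LZW over integer symbols; returns (alphabet, codes).

    Assumption: the dictionary is seeded with the distinct symbols of the input.
    """
    alphabet = sorted(set(symbols))
    table = {(s,): i for i, s in enumerate(alphabet)}
    codes: List[int] = []
    current: Tuple[int, ...] = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate          # keep extending the current phrase
        else:
            codes.append(table[current])  # emit the longest known phrase
            table[candidate] = len(table)
            current = (s,)
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decode(alphabet: List[int], codes: List[int]) -> List[int]:
    """Rebuild the symbol stream from the alphabet and the LZW codes."""
    if not codes:
        return []
    table = {i: (s,) for i, s in enumerate(alphabet)}
    previous = table[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Phrase defined by the step that is being decoded right now.
            entry = previous + (previous[0],)
        else:
            raise ValueError("invalid LZW code")
        out.extend(entry)
        table[len(table)] = previous + (entry[0],)
        previous = entry
    return out


if __name__ == "__main__":
    sequence = "ACGTACGTACGTTTGGAACC"  # made-up example, not real genome data
    alphabet, codes = lzw_encode(to_deltas(sequence))
    restored = from_deltas(lzw_decode(alphabet, codes))
    print(len(sequence), len(codes), restored == sequence)
```

The round trip through `lzw_decode` and `from_deltas` checks that the sketch is lossless. The count of emitted codes is only a rough proxy for compressed size, since a real compression ratio also depends on how the codes and the alphabet are serialized.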
Keywords
Genome sequence, Compression, FASTA, Blockchain, Delta computation, Genbank
Cite as
Garima Mathur, Anjana Pandey and Sachin Goyal, "A Novel Approach to Compress and Secure Human Genome Sequence", In: Saroj Hiranwal and Garima Mathur (eds), Artificial Intelligence and Communication Technologies, SCRS, India, 2022, pp. 305-317. https://doi.org/10.52458/978-81-955020-5-9-31