Lecture invitation: Learning to...

The Research Unit for Data Science of the Centre of Excellence for Data Science and Advanced Cooperative Systems, as part of the project "DATACROSS – Advanced Methods and Technologies in Data Science and Cooperative Systems", is organizing a research seminar lecture titled:

"Learning to Untangle Genome Assembly with Graph Convolutional Networks"

to be given by Prof. Mile Šikić, PhD, of the Faculty of Electrical Engineering and Computing. The lecture will take place on Thursday, 30 June 2022, at 11:00, in the Bijela vijećnica (White Hall) of the Faculty of Electrical Engineering and Computing.

More about the lecture can be found in the detailed content of this announcement.

Lecture abstract:

The quest to determine the complete sequence of human DNA from telomere to telomere started three decades ago and was finally completed in 2021. This accomplishment was the result of a tremendous effort by numerous experts who engineered various tools and performed laborious manual inspection to achieve the first gapless genome sequence. However, such a method can hardly be used as a general approach to assembling different genomes, especially when assembly speed is critical given the large amount of data. In this work, we explore a different approach to the central part of the genome assembly task, which consists of untangling a large assembly graph from which a genomic sequence needs to be reconstructed. Our main motivation is to reduce human-engineered heuristics and use deep learning to develop more generalizable reconstruction techniques. Specifically, we introduce a new learning framework to train a graph convolutional network to resolve assembly graphs by finding a correct path through them. The training is supervised with a dataset generated from the resolved CHM13 human sequence, and the model is tested on assembly graphs built using real human PacBio HiFi reads. Experimental results show that a model trained on simulated graphs generated solely from a single chromosome is able to resolve all other chromosomes. Moreover, the model outperforms hand-crafted heuristics from a state-of-the-art de novo assembler on the same graphs. Chromosomes reconstructed with graph networks are more accurate at the nucleotide level and show a lower number of contigs, a higher reconstructed genome fraction, and higher NG50/NGA50 assessment metrics.

Author: Krešimir Pripužić
30 June 2022, 11:00 - 12:00