Investigation of Gene Regulation Using Data Mining Techniques (Veri Madenciliği Teknikleri Kullanılarak Gen Regülasyonunun İncelenmesi)
Şahingil, Mehmet Cihan
In this thesis, the promoter regions of different genes are examined using genetic data from three species: Homo sapiens, Drosophila melanogaster and Saccharomyces cerevisiae. The main aim is to use various data mining techniques to investigate how the nucleotide sequences in gene promoter regions relate to gene functionality and gene classes. With this aim, the work falls into three main topics: the complexity of gene promoter regions; the classification of genes according to the nucleotide sequences in their promoters; and the relationship between promoter nucleotide sequences and the amino acid sequences of the proteins produced by the corresponding genes. The data shared by all three investigations are the nucleotide sequences in the gene promoters, so the first step is to investigate the complexity of these promoter regions. For this purpose, entropy analysis and methods based on frequency-domain transformations of the nucleotide sequences are used. With these methods, the promoter regions were found to have a high level of complexity. Comparing the three species under the same methods, Homo sapiens has the lowest complexity level, although the complexity levels determined for the other two species are close to that of Homo sapiens.
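The abstract names entropy analysis and frequency-domain transforms without fixing the exact variants. As a hedged illustration only, the Python sketch below assumes two common choices: Shannon entropy of the overlapping k-mer distribution, and the power spectrum of the Voss binary indicator mapping (one 0/1 sequence per nucleotide A, C, G, T) computed with a naive discrete Fourier transform. The function names are hypothetical, and the thesis may use different definitions.

```python
import cmath
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def voss_power_spectrum(seq: str) -> list[float]:
    """Summed power spectrum of the four Voss indicator sequences.

    For each base b in ACGT, u_b[m] = 1 if seq[m] == b else 0, and
    S[f] = sum over b of |DFT(u_b)[f]|^2 for f = 0..N-1.
    Symbols other than A, C, G, T contribute to no indicator sequence.
    The DFT here is the naive O(N^2) sum, adequate for short promoter windows.
    """
    seq = seq.upper()
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        u = [1.0 if ch == base else 0.0 for ch in seq]
        for freq in range(n):
            x = sum(u[m] * cmath.exp(-2j * math.pi * freq * m / n) for m in range(n))
            spectrum[freq] += abs(x) ** 2
    return spectrum
```

Over the alphabet {A, C, G, T} the k-mer entropy is at most 2k bits, so `kmer_entropy(seq, k) / (2 * k)` yields a value between 0 and 1 that can be compared across species. A spectrum without dominant peaks (apart from the zero-frequency term) is a further sign of low regularity; for long sequences an FFT would replace the naive sum.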
The second main subject of the thesis is determining gene classes by examining the nucleotide sequences in the promoter regions of Homo sapiens, Drosophila melanogaster and Saccharomyces cerevisiae genes. For Homo sapiens, the classification studies showed that protein-coding and non-protein-coding genes can be distinguished with high performance from the nucleotide sequences within the promoter regions alone. The promoter regions used in this study span from 50 nucleotides before to 50 nucleotides after the gene start nucleotide. When the same methods were applied to Drosophila melanogaster and Saccharomyces cerevisiae, however, the classification performance was lower than for Homo sapiens.
The third and final subject of the thesis is the relationship between the nucleotide sequences in the promoter regions of each species' genes and the proteins encoded by those genes. The three species were examined independently and their results were then compared. For all three species, the performance of a one-to-one mapping between the promoter nucleotide sequence and the amino acid sequence of the protein produced by the gene was low. Performance was satisfactory, however, when the task was changed from one-to-one mapping to presenting a set of proteins intended to contain the protein sequence corresponding to a given promoter region. In this respect, the strongest relationship between the promoter nucleotide sequence and the amino acid sequence produced by the corresponding gene was found for Homo sapiens.
Saccharomyces cerevisiae takes second place in this ranking and Drosophila melanogaster third.
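The difference between a one-to-one mapping and presenting a candidate protein set can be viewed as top-1 versus top-k retrieval. The abstract does not say how a promoter is scored against a protein, so the Python sketch below takes a score table as a hypothetical input and only shows how the two evaluation views differ; `top_k_candidates` and `hit_rate` are illustrative names, not the thesis's method.

```python
def top_k_candidates(scores: dict[str, float], k: int) -> list[str]:
    """IDs of the k highest-scoring proteins (ties broken by ID for determinism)."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]


def hit_rate(score_table: dict[str, dict[str, float]],
             truth: dict[str, str], k: int) -> float:
    """Fraction of promoters whose true protein is in their top-k candidate set.

    k = 1 corresponds to a one-to-one mapping; k > 1 to returning a protein set.
    score_table maps promoter ID -> {protein ID: similarity score} (assumed input).
    """
    hits = sum(truth[promoter] in top_k_candidates(candidates, k)
               for promoter, candidates in score_table.items())
    return hits / len(score_table)
```

On a toy table where one of two promoters has its true protein ranked first and the other has it ranked second, `hit_rate` returns 0.5 for k = 1 and 1.0 for k = 2, mirroring how a candidate set can succeed where a one-to-one mapping fails.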
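For the classification study on protein-coding and non-protein-coding genes, the abstract fixes the promoter window (50 nucleotides before to 50 nucleotides after the gene start nucleotide) but not the features or the classifier. The Python sketch below is therefore a stand-in under stated assumptions: the window includes the start nucleotide (101 nt), minus-strand windows are reverse-complemented, sequences are represented by k-mer frequency vectors, and a simple nearest-centroid rule assigns the class. All function names are hypothetical, and the data mining methods actually used in the thesis may differ.

```python
import math
from collections import Counter
from itertools import product

FLANK = 50  # nucleotides on each side of the gene start nucleotide

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def promoter_window(chrom: str, start: int, strand: str = "+", flank: int = FLANK) -> str:
    """Return the flank + 1 + flank nucleotides centred on 0-based index `start`.

    Assumption: the gene start nucleotide itself is included (101 nt for flank=50).
    For '-' strand genes the same slice is reverse-complemented so that the
    window reads 5'->3' on the gene's strand, upstream part first.
    """
    lo, hi = start - flank, start + flank + 1
    if lo < 0 or hi > len(chrom):
        raise ValueError("promoter window extends past the sequence ends")
    window = chrom[lo:hi].upper()
    if strand == "-":
        window = window.translate(_COMPLEMENT)[::-1]
    return window


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Relative frequencies of the 4**k ACGT k-mers, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    raw = [counts["".join(p)] for p in product("ACGT", repeat=k)]
    total = sum(raw) or 1  # avoid division by zero for very short inputs
    return [c / total for c in raw]


def train_centroids(samples: list[tuple[str, str]], k: int = 3) -> dict[str, list[float]]:
    """Mean k-mer vector per class label (nearest-centroid training)."""
    sums: dict[str, list[float]] = {}
    sizes: Counter = Counter()
    for seq, label in samples:
        vec = kmer_vector(seq, k)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], seq: str, k: int = 3) -> str:
    """Assign seq to the class whose centroid is nearest in Euclidean distance."""
    vec = kmer_vector(seq, k)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))
```

In use, one would build `(promoter_window(chrom, start, strand), label)` pairs from a genome annotation, call `train_centroids` on a training split and `predict` on held-out windows. Nearest-centroid is chosen here only because it fits in a few lines of standard-library Python.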