
dc.contributor.advisor: Christodoulou, Chris (en)
dc.contributor.author: Ioannou, Kypros (en)
dc.coverage.spatial: Cyprus (en)
dc.creator: Ioannou, Kypros (en)
dc.date.accessioned: 2022-03-28T11:31:58Z
dc.date.available: 2022-03-28T11:31:58Z
dc.date.issued: 2021-12
dc.identifier.uri: http://gnosis.library.ucy.ac.cy/handle/7/65119 (en)
dc.description.abstract: In this dissertation we attempt to solve the email classification problem with a novel method that trains a Convolutional Neural Network (CNN) using a second-order optimisation method, Hessian-Free Optimisation (HFO). To the best of our knowledge, no other method in the literature uses Hessian-Free Optimisation with a CNN to solve the email classification problem. We use a CNN with HFO to distinguish between spam emails and ham (legitimate) emails. Word embedding is applied to the data to convert them to a numerical form that the neural network model can process. The word embedding we use is Word2Vec, and we achieve very satisfactory results. Furthermore, we use five-fold cross-validation to verify the model’s accuracy, and we use six different datasets in total. We compare the model with other authors’ works and with a classic CNN trained with Gradient Descent (GD), which we also implement in this dissertation. We measure the efficacy of each model by calculating the Accuracy and the Spam and Ham Recall. Accuracy was used only for the comparison with the CNN with GD, since the aforementioned authors provided only Spam and Ham Recall measurements. We use the entire dataset for training when we compare this model with other authors’ work. For the first dataset, we achieve an accuracy of 99.199%, with 97.39% Spam Recall and 99.94% Ham Recall. For the second dataset, the accuracy is 99.227%, with 96.98% Spam Recall and 100% Ham Recall. For the third dataset, the accuracy is 99.848%, with 99.59% Spam Recall and 99.94% Ham Recall. For the fourth dataset, the accuracy is 99.333%, with 99.58% Spam Recall and 98.59% Ham Recall. For the fifth dataset, the accuracy is 99.061%, with 98.69% Spam Recall and a perfect score (100%) for Ham Recall. Finally, for the sixth dataset, the accuracy is 98.997%, with 98.93% Spam Recall and 99.19% Ham Recall.
Lastly, we performed cross-validation, and the average validation accuracy for each dataset was: Dataset 1, 99.078%; Dataset 2, 99.158%; Dataset 3, 99.772%; Dataset 4, 99.240%; Dataset 5, 98.762%; and Dataset 6, 98.846%. The average Spam and Ham Recall for each dataset were similar to those reported in the previous paragraph, and we achieved perfect Ham Recall scores on the second and fifth datasets. All other Spam and Ham Recalls from our implementation were between 96.72% and 99.88%. We also applied cross-validation to the CNN with Gradient Descent, but the highest accuracy achieved was 76%, in Dataset 4, and the lowest was in Dataset 5. The first three datasets have 0% Spam Recall, and the last three datasets have 0% Ham Recall. CNNs with Hessian-Free Optimisation not only achieve better accuracy and Spam/Ham Recall in every dataset, but also converge faster than the Gradient Descent model. Measured with our best model, the HFO model converges 2.5 times faster than the Gradient Descent model on the same dataset. (en)
dc.language.iso: eng (en)
dc.publisher: Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
dc.rights: CC0 1.0 Universal (*)
dc.rights: info:eu-repo/semantics/openAccess (en)
dc.rights: Open Access (en)
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ (*)
dc.title: Spam email classification using convolutional neural networks with Hessian-Free optimisation (en)
dc.type: info:eu-repo/semantics/masterThesis (en)
dc.contributor.committeemember: Pattichis, Costas (en)
dc.contributor.committeemember: Chrysanthou, Yiorgos (en)
dc.contributor.department: Τμήμα Πληροφορικής / Department of Computer Science
dc.subject.uncontrolledterm: CNN (en)
dc.subject.uncontrolledterm: HFO (en)
dc.subject.uncontrolledterm: SPAM EMAILS (en)
dc.subject.uncontrolledterm: HESSIAN-FREE (en)
dc.author.faculty: Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department: Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype: Master Thesis (en)
dc.contributor.orcid: Christodoulou, Chris [0000-0001-9398-5256]



Except where otherwise noted, this item's license is described as CC0 1.0 Universal