Spam email classification using convolutional neural networks with Hessian-Free optimisation

Ioannou, Kypros

dc.contributor.advisor	Christodoulou, Chris	en
dc.contributor.author	Ioannou, Kypros	en
dc.coverage.spatial	Cyprus	en
dc.creator	Ioannou, Kypros	en
dc.date.accessioned	2022-03-28T11:31:58Z
dc.date.available	2022-03-28T11:31:58Z
dc.date.issued	2021-12
dc.identifier.uri	http://gnosis.library.ucy.ac.cy/handle/7/65119	en
dc.description.abstract	In this dissertation we address the email classification problem with a novel method that trains a Convolutional Neural Network (CNN) with a second-order optimisation method. To the best of our knowledge, no other method in the literature uses Hessian-Free Optimisation (HFO) with a CNN to solve the email classification problem. We use a CNN with HFO to distinguish spam emails from ham (legitimate) emails. Word embedding is applied to the data to convert them to a numerical form that the neural network can process. The word embedding we use is Word2Vec, with which we achieve very satisfactory results. Furthermore, we use five-fold cross-validation to verify the model’s accuracy, on six different datasets in total. We compare the model with other authors’ works and with a classic CNN trained with Gradient Descent (GD), which we also implement in this dissertation. We measure the efficacy of each model by calculating Accuracy and Spam/Ham Recall. Accuracy is used only for the comparison with the CNN with GD, since the other authors report only Ham and Spam Recall. When comparing our model with other authors’ work, we use the entire dataset for training. For the first dataset we achieve an accuracy of 99.199%, with 97.39% Spam Recall and 99.94% Ham Recall. For the second dataset, accuracy is 99.227%, with 96.98% Spam Recall and 100% Ham Recall. For the third dataset, accuracy is 99.848%, with 99.59% Spam Recall and 99.94% Ham Recall. For the fourth dataset, accuracy is 99.333%, with 99.58% Spam Recall and 98.59% Ham Recall. For the fifth dataset, accuracy is 99.061%, with 98.69% Spam Recall and a perfect score (100%) for Ham Recall. Finally, for the sixth dataset, accuracy is 98.997%, with 98.93% Spam Recall and 99.19% Ham Recall.
Lastly, we performed cross-validation; the average validation accuracy for each dataset was: Dataset 1, 99.078%; Dataset 2, 99.158%; Dataset 3, 99.772%; Dataset 4, 99.240%; Dataset 5, 98.762%; and Dataset 6, 98.846%. The average Spam and Ham Recall for each dataset were similar to those reported in the previous paragraph, with perfect Ham Recall scores in the second and fifth datasets. All other Spam and Ham Recalls from our implementation were between 96.72% and 99.88%. We also applied cross-validation to the CNN with Gradient Descent, but the highest accuracy achieved was 76%, in Dataset 4, and the lowest was in Dataset 5. With that model, the first three datasets have 0% Spam Recall and the last three datasets have 0% Ham Recall. The CNN with Hessian-Free Optimisation not only achieves better accuracy and Ham/Spam Recall on every dataset, but also converges faster than the Gradient Descent model. Measured with our best model, the HFO model converges 2.5 times faster than the Gradient Descent model on the same dataset.	en
dc.language.iso	eng	en
dc.publisher	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
dc.rights	CC0 1.0 Universal	*
dc.rights	info:eu-repo/semantics/openAccess	en
dc.rights	Open Access	en
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/	*
dc.title	Spam email classification using convolutional neural networks with Hessian-Free optimisation	en
dc.type	info:eu-repo/semantics/masterThesis	en
dc.contributor.committeemember	Pattichis, Costas	en
dc.contributor.committeemember	Chrysanthou, Yiorgos	en
dc.contributor.department	Τμήμα Πληροφορικής / Department of Computer Science
dc.subject.uncontrolledterm	CNN	en
dc.subject.uncontrolledterm	HFO	en
dc.subject.uncontrolledterm	SPAM EMAILS	en
dc.subject.uncontrolledterm	HESSIAN-FREE	en
dc.author.faculty	Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department	Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype	Master Thesis	en
dc.contributor.orcid	Christodoulou, Chris [0000-0001-9398-5256]
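
The abstract reports three metrics per dataset: Accuracy, Spam Recall and Ham Recall. As a minimal sketch, assuming the standard definitions (recall for a class is the fraction of that class's true members that were predicted correctly), they could be computed as follows. The function name `spam_ham_metrics` and the label encoding (1 for spam, 0 for ham) are illustrative assumptions and are not taken from the thesis.

```python
import math


def spam_ham_metrics(y_true, y_pred, spam_label=1, ham_label=0):
    """Return (accuracy, spam_recall, ham_recall) as fractions in [0, 1].

    Hypothetical helper: the label encoding (1 = spam, 0 = ham) is an
    assumption made for this sketch, not the thesis's own convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))

    # Accuracy: share of all emails whose predicted label is correct.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # Recall per class: correctly predicted members / all true members.
    spam_total = sum(t == spam_label for t, _ in pairs)
    ham_total = sum(t == ham_label for t, _ in pairs)
    spam_hits = sum(t == spam_label and p == spam_label for t, p in pairs)
    ham_hits = sum(t == ham_label and p == ham_label for t, p in pairs)

    # Recall is undefined when a class has no true members.
    spam_recall = spam_hits / spam_total if spam_total else math.nan
    ham_recall = ham_hits / ham_total if ham_total else math.nan
    return accuracy, spam_recall, ham_recall


# Toy data: 3 spam and 5 ham emails, with one error in each class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
accuracy, spam_recall, ham_recall = spam_ham_metrics(y_true, y_pred)
```

Under these definitions, a classifier that labels every email as ham scores 0 Spam Recall and 1.0 Ham Recall. That is one way a 0% recall figure like those reported for the Gradient Descent model can arise, although the abstract does not state the cause.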

Files in this item

Name: Kypros_Ioannou.pdf
Size: 3.645Mb
Format: PDF
Description: Master Thesis

Name: license_rdf
Size: 1.063Kb
Format: application/rdf+xml

This item appears in the following Collection(s)

Τμήμα Πληροφορικής / Department of Computer Science [106]

Except where otherwise noted, this item's license is described as CC0 1.0 Universal