Towards an automated analysis of the quality of source code comments

Haddad, Mireille J.

dc.contributor.author	Haddad, Mireille J.
dc.date.accessioned	2022-02-07T09:55:24Z
dc.date.available	2022-02-07T09:55:24Z
dc.date.issued	2017
dc.identifier.citation	Haddad, M. J. (2017). Towards an automated analysis of the quality of source code comments (Master's thesis, Notre Dame University-Louaize, Zouk Mosbeh, Lebanon). Retrieved from http://ir.ndu.edu.lb/123456789/1460
dc.identifier.uri	http://ir.ndu.edu.lb/123456789/1460
dc.description	M.S. -- Faculty of Natural and Applied Sciences, Department of Computer Science, Notre Dame University, Louaize, 2017; "A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science"; Includes bibliographical references (leaves 61-64).
dc.description.abstract	Maintenance is the most costly phase of the software life cycle. The maintenance cost of a program is estimated to exceed 80% of its total life-cycle costs (Erlikh, 2000). Since most maintenance time is devoted to understanding the program itself, program comprehension becomes essential. A large fraction of that time is often spent reading code to work out which functionality of the program it implements. Insufficiently documented source code can be challenging for developers to understand and maintain, whereas clear and concise documentation helps developers inspect and understand their programs. Unfortunately, one of the major problems developers face during maintenance is that documentation is often unavailable or unhelpful. This thesis presents a heuristic approach for automatically analyzing and assessing source-code comments, in which the code is parsed with the parser generator ANTLR. The approach measures the semantic similarity between a comment's content and the identifier name of the entity it documents. An algorithm was developed that splits identifiers into their component terms and computes the similarity percentage between the useful content of the comment and the identifier. The approach classifies comments into the following categories: scary noise, noise, normal with minor similarity, probably meaningful, empty, and TODO. A study was carried out to evaluate how well the proposed approach assesses source-code comments; in this study, the source code of the Eclipse open-source Integrated Development Environment (IDE) was parsed. The results showed that more than 50% of the comments fall into the empty category and that these are spread over 62% of the project's files. Only 18% of the comments were of high quality, and around 20% of the files contain noise comments. Most class and interface identifiers have comments, while more than 50% of the methods lack comments.	en_US
dc.format.extent	xii, 80 leaves : illustrations
dc.language.iso	en	en_US
dc.publisher	Notre Dame University-Louaize	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject.lcsh	Software maintenance
dc.subject.lcsh	Source code (Computer science)
dc.title	Towards an automated analysis of the quality of source code comments	en_US
dc.type	Thesis	en_US
dc.rights.license	This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 United States License. (CC BY-NC 3.0 US)
dc.contributor.supervisor	Akiki, Pierre A., Ph.D.	en_US
dc.contributor.department	Notre Dame University-Louaize. Department of Computer Sciences	en_US
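The identifier-splitting and similarity steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which is built on an ANTLR parser; the helper names (split_identifier, comment_terms, assess_comment), the stop-word list, and the choice of similarity measure (the share of a comment's useful words that are also identifier terms) are all hypothetical assumptions. The abstract does not give the thresholds that separate scary noise, noise, normal with minor similarity, and probably meaningful comments, so the sketch stops at the percentage and detects only the empty and TODO cases.

```python
import re

# Hypothetical stop-word list: the abstract does not say which words count as "useful" comment content.
STOP_WORDS = {"a", "an", "the", "of", "to", "for", "and", "or", "is", "are",
              "this", "that", "in", "on", "with", "by", "it", "be"}

# Hypothetical pattern for camelCase/PascalCase parts, keeping acronyms such as "XML" together.
_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_identifier(name: str) -> list[str]:
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase terms."""
    terms: list[str] = []
    for chunk in re.split(r"[_$0-9]+", name):  # underscores, '$' and digits act as separators
        terms.extend(_PART.findall(chunk))
    return [t.lower() for t in terms]


def comment_terms(comment: str) -> list[str]:
    """Return the 'useful' words of a comment: Javadoc tags and stop words removed,
    and any identifiers mentioned in the comment split into their component terms."""
    text = re.sub(r"@\w+", " ", comment)  # drop tags such as @param, @return
    words: list[str] = []
    for token in re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", text):
        words.extend(split_identifier(token))
    return [w for w in words if w not in STOP_WORDS]


def assess_comment(comment: str, identifier: str) -> tuple[str, float]:
    """Label a comment 'TODO' or 'empty'; otherwise return ('scored', percentage),
    where percentage is the share of useful comment words that are identifier terms."""
    if re.search(r"\bTODO\b", comment):
        return ("TODO", 0.0)
    words = comment_terms(comment)
    if not words:  # no comment, or only delimiters/stop words
        return ("empty", 0.0)
    id_terms = set(split_identifier(identifier))
    overlap = sum(1 for w in words if w in id_terms)
    return ("scored", round(100.0 * overlap / len(words), 1))


if __name__ == "__main__":
    print(split_identifier("parseXMLFile"))
    print(assess_comment("/** Get the user name. */", "getUserName"))
    print(assess_comment("/** Looks up the display name shown in the account settings page. */",
                         "getUserName"))
```

Under these assumptions, a comment such as "Get the user name." on getUserName scores 100%, since it merely restates the identifier (the kind of comment the abstract calls noise), whereas a comment that adds information about the entity scores low. A real tool would also need stemming (so that "gets" matches "get") and the thesis's own category thresholds.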