
I am broadly interested in Software Engineering, Programming Languages, Systems, Security, and Data Mining. I have devoted substantial effort to code mining, analysis, and comprehension, aiming to provide practical techniques and tools that enhance software reliability, increase development productivity, reduce maintenance cost, and improve user experience.

Context-Aware Software Mining and Analysis

A general theme of my work is mining and analysis for software engineering, such as detection of code clones, code query processing, detection of bugs, search for bug fixes, and search for better testing & debugging techniques.

The search is carried out on various contextual data sources in addition to the program code itself, such as code change histories, program bug databases, test suites, developer activities, user feedback, and socio-technical information about the complex interactions between people and technologies in both software development processes and real-world usage scenarios.

To extract information from these data sources and to search and analyze it efficiently, various technologies are employed, such as static and dynamic program analysis, software engineering methodologies, data mining, information retrieval, natural language processing, and distributed computing techniques.

Publications

Scalable Code Clone Detection

Our studies and others' have found that, on average, more than 20% of the code in large programs is cloned, which often leads to higher maintenance cost and subtle software defects. The goal of our research is to detect various kinds of code clones scalably and accurately, track their evolution and migration among large programs, and manage them properly to facilitate program understanding and reengineering. Many applications, such as code refactoring, bug detection, and plagiarism detection, can build on code clone detection and analysis.

  • DECKARD: A Code Clone and Clone-Related Bug Detection Tool
  • Understanding the Genetic Makeup of Linux Device Drivers, by Peter Senna TSCHUDIN, Laurent REVEILLERE, Lingxiao JIANG, David LO, Julia LAWALL, and Gilles MULLER. In the proceedings of the 7th Workshop on Programming Languages and Operating Systems (PLOS '13), Farmington, Pennsylvania, USA, 2013. [on ACM DL, pdf]
  • Active Refinement of Clone Anomaly Reports, by Lucia, David LO, Lingxiao JIANG, and Aditya Budi. In the proceedings of the 34th International Conference on Software Engineering (ICSE '12), Zurich, Switzerland, 2012. [on IEEE Xplore and ACM DL, pdf]
  • Automatic Mining of Functionally Equivalent Code Fragments via Random Testing, by Lingxiao JIANG and Zhendong SU. In the proceedings of the 18th International Symposium on Software Testing and Analysis (ISSTA '09), Chicago, Illinois, USA, 2009. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]
  • Scalable Detection of Semantic Clones, by Mark GABEL, Lingxiao JIANG, and Zhendong SU. In the proceedings of the 30th International Conference on Software Engineering (ICSE '08), Leipzig, Germany, 2008. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]
  • Context-Based Detection of Clone-Related Bugs, by Lingxiao JIANG, Zhendong SU, and Edwin CHIU. In the proceedings of the 6th joint meeting of the 11th European Software Engineering Conference and the 15th ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '07), Dubrovnik, Croatia, 2007. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]
  • DECKARD: Scalable and Accurate Tree-based Detection of Code Clones, by Lingxiao JIANG, Ghassan MISHERGHI, Zhendong SU, and Stephane GLONDU. In the proceedings of the 29th International Conference on Software Engineering (ICSE '07), Minneapolis, Minnesota, USA, 2007. [pdf, ps, slides.pdf, on IEEE Xplore and ACM DL]

Queries & Analysis of Software Data (Such as Code, Repositories, Bug Databases, Documents, User/Developer Interactions)

  • Got Issues? Who Cares About It? A Large Scale Investigation of Issue Trackers from GitHub, by Tegawende F. BISSYANDE, David LO, Lingxiao JIANG, Laurent REVEILLERE, Jacques KLEIN, and Yves Le TRAON. In the proceedings of the IEEE 24th International Symposium on Software Reliability Engineering (ISSRE '13), Pasadena, California, USA, 2013. [on IEEE Xplore, pdf]
  • Popularity, Interoperability, and Impact of Programming Languages in 100,000 Open Source Projects, by Tegawende F. BISSYANDE, Ferdian THUNG, David LO, Lingxiao JIANG, and Laurent REVEILLERE. In the proceedings of the 37th Annual International Computer Software & Applications Conference (COMPSAC '13), Kyoto, Japan, 2013. [on IEEE Xplore, pdf]
  • Orion: A Software Project Search Engine with Integrated Diverse Software Artifacts, by Tegawende F. BISSYANDE, Ferdian THUNG, David LO, Lingxiao JIANG, and Laurent REVEILLERE. In the proceedings of the 18th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS '13), Singapore, 2013. [on IEEE Xplore, pdf]
  • Understanding Widespread Changes: A Taxonomic Study, by Shaowei WANG, David LO, and Lingxiao JIANG. In the proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR '13), Genova, Italy, 2013. [on IEEE Xplore, pdf]
  • Network Structure of Social Coding in GitHub, by Ferdian THUNG, Tegawende F. BISSYANDE, David LO, and Lingxiao JIANG. In the proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR '13), Genova, Italy, 2013. [on IEEE Xplore, pdf]
  • An Empirical Study on Developer Interactions in StackOverflow, by Shaowei WANG, David LO, and Lingxiao JIANG. In the proceedings of the 28th ACM Symposium on Applied Computing (SAC '13), Coimbra, Portugal, 2013. [on ACM DL, pdf]
  • Diffusion of Software Features: An Exploratory Study, by Ferdian THUNG, David LO, and Lingxiao JIANG. In the proceedings of the 19th Asia-Pacific Software Engineering Conference (APSEC '12), Hong Kong, 2012. [on IEEE Xplore, pdf]
  • Detecting Similar Applications With Collaborative Tagging, by Ferdian THUNG, David LO, and Lingxiao JIANG. In the proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM '12), Riva del Garda, Trento, Italy, 2012. [pdf, on IEEE Xplore]
  • Inferring Semantically Related Software Terms and Their Taxonomy By Leveraging Collaborative Tagging, by Shaowei WANG, David LO, and Lingxiao JIANG. In the proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM '12), Riva del Garda, Trento, Italy, 2012. [pdf, on IEEE Xplore]
  • Automated Detection of Likely Design Flaws in Layered Architectures, by Aditya BUDI, Lucia, David LO, Lingxiao JIANG, and Shaowei WANG. In the proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE '11), Miami Beach, USA, 2011. [pdf, demo on YouTube]
  • Code Search via Topic-Enriched Dependence Graph Matching, by Shaowei WANG, David LO, and Lingxiao JIANG. In the proceedings of the 18th Working Conference on Reverse Engineering (WCRE '11), Limerick, Ireland, 2011. [pdf, on IEEE Xplore]
  • Concern Localization Using Information Retrieval: An Empirical Study on Linux Kernel, by Shaowei WANG, David LO, Zhenchang XING, and Lingxiao JIANG. In the proceedings of the 18th Working Conference on Reverse Engineering (WCRE '11), Limerick, Ireland, 2011. [pdf, on IEEE Xplore]

Automated Testing

  • An Empirical Study of Adoption of Software Testing in Open Source Projects, by Pavneet Singh KOCHHAR, Tegawende F. BISSYANDE, David LO, and Lingxiao JIANG. In the proceedings of the 13th International Conference on Quality Software (QSIC '13), Nanjing, China, 2013. [on IEEE Xplore, pdf]. A preliminary version of this paper appeared as Adoption of Software Testing in Open Source Projects---A Preliminary Study on 50,000 Projects, by Pavneet Singh KOCHHAR, Tegawende F. BISSYANDE, David LO, and Lingxiao JIANG. In the proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR '13), Genova, Italy, 2013. [pdf]
  • kbe-Anonymity: Test Data Anonymization for Evolving Programs, by Lucia, David LO, Lingxiao JIANG, and Aditya BUDI. In the proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE '12), Essen, Germany, 2012. [on ACM DL, pdf]
  • kb-Anonymity: A Model for Anonymized Behavior-Preserving Test and Debugging Data, by Aditya BUDI, David LO, Lingxiao JIANG, and Lucia. In the proceedings of the 32nd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI '11), San Jose, California, USA, 2011. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]
  • Profile-Guided Program Simplification for Effective Testing and Analysis, by Lingxiao JIANG and Zhendong SU. In the proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '08), Atlanta, Georgia, USA, 2008. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]

Automated Debugging

  • Automatic Recovery of Root Causes from Bug-Fixing Changes, by Ferdian THUNG, David LO, and Lingxiao JIANG. In the proceedings of the 20th Working Conference on Reverse Engineering (WCRE '13), Koblenz, Germany, 2013. [on IEEE Xplore, pdf]
  • Extended Comprehensive Study of Association Measures for Fault Localization, by Lucia, David LO, Lingxiao JIANG, Ferdian THUNG, and Aditya BUDI. In Journal of Software: Evolution and Process, 2013. [To appear in print; online version available on Wiley]. This is an extended version of the conference paper Comprehensive Evaluation of Association Measures for Fault Localization, by Lucia, David LO, Lingxiao JIANG, and Aditya BUDI. In the proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM '10), Timisoara, Romania, 2010. [pdf, dataset, on IEEE Xplore]
  • Empirical Evaluation of Bug Linking, by Tegawende F. BISSYANDE, Ferdian THUNG, Shaowei WANG, David LO, Lingxiao JIANG, and Laurent REVEILLERE. In the proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR '13), Genova, Italy, 2013. [on IEEE Xplore, pdf]
  • An Empirical Study of Bugs in Machine Learning Systems, by Ferdian THUNG, Shaowei WANG, David LO, and Lingxiao JIANG. In the proceedings of the 23rd IEEE International Symposium on Software Reliability Engineering (ISSRE '12), Dallas, Texas, USA, 2012. [on IEEE Xplore, pdf]
  • Automatic Defect Categorization, by Ferdian THUNG, David LO, and Lingxiao JIANG. In the proceedings of the 19th Working Conference on Reverse Engineering (WCRE '12), Kingston, Ontario, Canada, 2012. [pdf, on IEEE Xplore]
  • When Would This Bug Get Reported? By Ferdian THUNG, David LO, Lingxiao JIANG, Lucia, Foyzur RAHMAN, and Prem DEVANBU. In the proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM '12), Riva del Garda, Trento, Italy, 2012. [pdf, on IEEE Xplore]
  • Interactive Fault Localization By Leveraging Simple User Feedbacks, by Liang GONG, David LO, Lingxiao JIANG, and Hongyu ZHANG. In the proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM '12), Riva del Garda, Trento, Italy, 2012. [pdf, on IEEE Xplore]
  • Diversity Maximization Speedup for Fault Localization, by Liang GONG, David LO, Lingxiao JIANG, and Hongyu ZHANG. In the proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE '12), Essen, Germany, 2012. [pdf, on ACM DL]
  • Are Faults Localizable? By Lucia, Ferdian THUNG, David LO, and Lingxiao JIANG. In the proceedings of the 9th Working Conference on Mining Software Repositories (MSR '12), Zurich, Switzerland, 2012. [pdf, on IEEE Xplore]
  • Search-Based Fault Localization, by Shaowei WANG, David LO, Lingxiao JIANG, Lucia, and Hoong Chuin LAU. In the proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE '11), Lawrence, Kansas, USA, 2011. [pdf, slides.pdf, on IEEE Xplore]
  • Context-Aware Statistical Debugging: From Bug Predictors to Faulty Control Flow Paths, by Lingxiao JIANG and Zhendong SU. In the proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE '07), Atlanta, Georgia, USA, 2007. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]

Optimization and Quality Assurance

  • The Case for Mobile Forensics of Private Data Leaks: Towards Large-Scale User-Oriented Privacy Protection, by Joseph CHAN Joo Keng, TAN Kiat Wee, Lingxiao JIANG, and Rajesh Krishna BALAN. In the proceedings of the 4th Asia-Pacific Workshop on Systems (APSYS '13), Singapore, 2013. [on ACM DL, pdf]
  • To What Extent Could We Detect Field Defects? An Empirical Study of False Negatives in Static Bug Finding Tools, by Ferdian THUNG, Lucia, David LO, Lingxiao JIANG, Foyzur RAHMAN, and Prem DEVANBU. In the proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE '12), Essen, Germany, 2012. [pdf, on ACM DL]
  • Real-time Trip Information Service For A Large Taxi Fleet, by Rajesh Krishna BALAN, Khoa Xuan NGUYEN, and Lingxiao JIANG. In the proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys '11), Washington, DC, USA, 2011. [PDF from ACM DL Author-ize service, on ACM DL, pdf]
  • Static Validation of C Preprocessor Macros, by Andreas SAEBJOERNSEN, Lingxiao JIANG, Daniel QUINLAN, and Zhendong SU. In the proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE '09), Auckland, New Zealand, 2009. [pdf, on IEEE Xplore and ACM DL]
  • Osprey: A Practical Type System for Validating Dimensional Unit Correctness of C Programs, by Lingxiao JIANG and Zhendong SU. In the proceedings of the 28th International Conference on Software Engineering (ICSE '06), Shanghai, China, 2006. [PDF from ACM DL Author-ize service, on ACM DL, pdf, slides.pdf]