Agenda

[Note: the location for the CSER meeting is Willow III and IV on the lower level of the Hilton Suites Toronto/Markham.]

7:30-8:30 Board Meeting

8:00-8:30 Continental Breakfast

8:30 Welcome and Introduction

8:30-9:50 Keynote Session 1
Session Chair: Jeremy Bradbury

9:50-10:30 Presentation Session 1 - Models
Session Chair: Kostas Kontogiannis

10:30-11:00 Break

11:00-12:20 Presentation Session 2 - The Web
Session Chair: Steve Easterbrook

12:20-13:20 Lunch

13:20-14:30 Keynote Session 2
Session Chair: Chanchal Roy

14:30-15:30 Presentation Session 3 - Defects
Session Chair: Abram Hindle

15:30-16:00 Break

16:00-17:40 Presentation Session 4 - Search and More
Session Chair: Hausi Muller

17:40-17:50 Wrap-up

18:00 Poster Session and Reception
Poster Session Organizers: Kevin Jalbert and David Kelk


Presentation Abstracts

On the Relationship Between Earth System Models and the Labs That Build Them
Steve Easterbrook, University of Toronto
Abstract: In this talk I will discuss a number of observations from a comparative study of four major climate modeling centres.

The study focussed on the organizational structures and working practices at each centre with respect to earth system model development, and how these affect the history and current qualities of their models. While the centres share a number of similarities, including a growing role for software specialists and greater use of open source tools for managing code and the testing process, there are marked differences in how the different centres are funded, in their organizational structure and in how they allocate resources. These differences are reflected in the program code in a number of ways, including the nature of the coupling between model components, the portability of the code, and (potentially) the quality of the program code.
While all these modelling centres continually seek to refine their software development practices and the software quality of their models, they all struggle to manage the growth (in terms of size and complexity) of the models. Our study suggests that improvements to the software engineering practices at the centres have to take account of differing organizational constraints at each centre. Hence, there is unlikely to be a single set of best practices that works everywhere. Indeed, improvements in modelling practices usually come from local, grass-roots initiatives, in which new tools and techniques are adapted to suit the context at a particular centre. We suggest therefore that there is a need for a stronger shared culture of describing current model development practices and sharing lessons learnt, to facilitate local adoption and adaptation.

A Comparative Study of the Performance of IR Models on Duplicate Bug Detection
Nilam Kaushik and Ladan Tahvildari, University of Waterloo
Abstract: Open source projects incorporate bug triagers to help with the task of bug report assignment to developers. One of the tasks of a triager is to identify whether an incoming bug report is a duplicate of a pre-existing report. In order to detect duplicate bug reports, a triager either relies on his memory and experience or on the searching functionality of the bug repository. Both these approaches can be time consuming for the triager and may also lead to misidentification of duplicates. In this presentation, we compare the performance of 8 IR models, leveraging heuristics such as stack frames and the operational environment information from the free-form text in the bug reports. We perform experiments on bug reports from Eclipse and Firefox and achieve a recall rate of 60% and 57% respectively with the optimal set of parameters. We find that a Log-Entropy based model outperforms all the other models. Based on the findings from the two case studies, we propose an online framework to simulate duplicate bug report detection using a year's worth of bug report data from the Eclipse Platform project.
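
As a rough illustration of the best-performing model, the sketch below (our own simplification in Python, not the authors' implementation; names such as log_entropy_weights are purely illustrative) builds log-entropy weighted term vectors and ranks existing reports against a new one by cosine similarity.

import math
from collections import Counter

def log_entropy_weights(docs):
    """Log-entropy weighted term vectors for a list of tokenized documents.

    Local weight:  log(1 + tf_ij)
    Global weight: 1 + sum_j (p_ij * log p_ij) / log(n), with p_ij = tf_ij / gf_i
    """
    n = len(docs)
    tf = [Counter(doc) for doc in docs]    # term frequency per document
    gf = Counter()                         # global frequency of each term
    for counts in tf:
        gf.update(counts)

    entropy = {}                           # global (entropy) weight per term
    for term, total in gf.items():
        h = sum((counts[term] / total) * math.log(counts[term] / total)
                for counts in tf if term in counts)
        entropy[term] = 1.0 + h / math.log(n) if n > 1 else 1.0

    return [{t: math.log(1 + c) * entropy[t] for t, c in counts.items()}
            for counts in tf]

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Rank existing reports by similarity to a new report to surface duplicate candidates.
reports = [["crash", "on", "startup", "npe"],
           ["npe", "crash", "when", "starting"],
           ["button", "misaligned", "in", "dialog"]]
vectors = log_entropy_weights(reports)
new_report = vectors[0]
ranking = sorted(range(1, len(reports)), key=lambda i: -cosine(new_report, vectors[i]))
print(ranking)    # most similar existing reports first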

SmarterContext: Managing Dynamic Context to Smarten-up User-centric Web Applications
Norha M. Villegas and Hausi A. Muller, University of Victoria
Abstract: Most web applications deliver personalized features by making decisions on behalf of the user. Thus, the user's web experience is still a fragmented process due to a lack of user-centric web integration. In contrast, smarter web applications will empower the user to control the integration of web resources according to personal concerns. Moreover, as the user's interests and web resources continuously evolve, web infrastructures supporting smarter applications require dynamic and efficient mechanisms to represent, gather, provide, and reason about the context information that is relevant to the user. In this talk we will present SmarterContext, our innovative approach to dynamic context management that exploits feedback loops and semantic web technologies to optimize context-awareness in user-centric web applications. Using a smarter e-commerce scenario, we will illustrate how SmarterContext aims at optimizing context-aware user-centric shopping experiences, by empowering the user to manage her context information in a transparent way.

An Empirical Study on Web Service Evolution
Marios Fokaefs, University of Alberta
Abstract: The service-oriented architecture paradigm prescribes the development of systems through the composition of services, i.e., network-accessible components, specified by (and invoked through) their WSDL interface descriptions. Systems thus developed need to be aware of changes in, and evolve with, their constituent services. Therefore, accurate recognition of changes in the WSDL specification of a service is an essential functionality in the context of the software life cycle of service-oriented systems.
In this work, we present the results of an empirical study on WSDL evolution analysis. In the first part, we empirically study whether VTracker, our algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service. Second, we analyze the changes that occurred between the subsequent versions of various web-services and discuss their potential effects on the maintainability of service systems relying on them.
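
VTracker itself performs fine-grained tree differencing over XML; as a much coarser, hedged illustration of what WSDL evolution analysis inspects, the Python sketch below (our own simplification, with hypothetical file names) merely diffs the operation names exposed by two WSDL versions.

import xml.etree.ElementTree as ET

def operations(wsdl_path):
    """Collect the names of all named operation elements in a WSDL document."""
    tree = ET.parse(wsdl_path)
    return {el.get("name") for el in tree.iter()
            if el.tag.endswith("}operation") and el.get("name")}

old_ops = operations("service_v1.wsdl")    # hypothetical file names
new_ops = operations("service_v2.wsdl")
print("added operations:  ", sorted(new_ops - old_ops))
print("removed operations:", sorted(old_ops - new_ops))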

Web Service Assurance: Notion and the Issues
Atousa Pahlevan and Hausi A. Muller, University of Victoria
Abstract: Web service technology is at the basis of deploying collaborative business processes. Web Services security standards and protocols aim to provide secure communication and conversation between service providers and consumers. Still, for a client calling a Web Service it is difficult to be sure that a particular service instance holds, at execution time, some specific non-functional properties. In this talk we introduce the notion of certified Web service assurance, describing how service consumers can specify the set of security properties that a service should hold. Also, we illustrate a mechanism to re-check non-functional properties when the execution context changes. To this end, we introduce the concept of context-aware certificate, and describe a dynamic, context-aware service certification environment.

YaKit: A Locality Based Messaging System
Przemek Lach, University of Victoria
Abstract: We present a new approach to building localized, context-driven social networking applications that allow people to communicate, interact, collaborate, and socialize in a truly innovative manner. In particular, the goal is to provide mechanisms to form communities of people who do not necessarily know each other but are in close proximity to each other.

An Entropy Evaluation Approach for Triaging Field Crashes: A Case Study of Mozilla Firefox
Foutse Khomh, Brian Chan, Ying Zou, Ahmed E. Hassan, Queen's University
Abstract: A crash is an unexpected termination of an application during normal execution. Crash reports record stack traces and run-time information once a crash occurs. A group of similar crash reports represents a crash-type. The triaging of crash-types is critical to shorten the development and maintenance process. The crash triaging process decides the priority of crash-types to be fixed. The decision typically depends on many factors, such as the impact of the crash-type (i.e., its severity), its frequency of occurrence, and the effort required to implement a fix for the crash-type. In this talk, I will present a new triaging method based on the concept of entropy region graphs. An entropy region graph captures the distribution of the occurrences of crash-types among the users of a system. I will also present the results of an empirical study on crash reports and bugs, collected from 10 beta releases of Firefox 4. The results show that our proposed triaging technique enables a better classification of crash-types than the current triaging used by Firefox teams. Developers and managers could use such a technique to prioritize crash-types during triage, to estimate developer workloads, and to decide which crash-type fixes should be included in the next release.
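
The entropy region graphs themselves are not reproduced here, but the underlying intuition can be sketched (our own illustrative formulation in Python, not the authors' exact construction) as the normalized Shannon entropy of a crash-type's occurrences across users.

import math
from collections import Counter

def normalized_entropy(crash_user_ids):
    """Normalized Shannon entropy of one crash-type's occurrences across users.

    Close to 0: the crash hits a few users repeatedly.
    Close to 1: the crash is spread evenly across many users.
    """
    counts = Counter(crash_user_ids)
    total = sum(counts.values())
    n = len(counts)
    if n <= 1:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n)

# A crash-type spread over many users vs. one concentrated on a single user.
print(normalized_entropy(["u1", "u2", "u3", "u4"]))    # 1.0
print(normalized_entropy(["u1", "u1", "u1", "u2"]))    # noticeably lower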

ARC: Automatic Repair of Concurrency Bugs
David Kelk, Kevin Jalbert and Jeremy Bradbury, University of Ontario Institute of Technology
Abstract: Concurrent software bugs appear intermittently due to the non-deterministic nature of how threads might be scheduled to run. A concurrency bug can be fixed by using known concurrency mechanisms to ensure proper thread scheduling. ARC takes advantage of this fact by evolving a buggy program using known concurrency mutation operators in an attempt to ensure proper program execution. An evolved program is evaluated using ConTest by repeatedly exploring thread interleavings to provide a certain level of confidence that the majority of schedules have been explored. A fitness function evaluates the functional correctness of the evolved program. The evaluation feedback from ConTest is used to heuristically select the most appropriate mutation operator to apply next. A second phase of evolution is applied after an appropriate level of functional fitness is achieved, to optimize the non-functional fitness. This phase minimizes the usage of synchronization mechanisms to ensure timely execution of the evolved program while retaining the functional correctness.
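
The real tool drives ConTest and Java-level concurrency mutation operators; the Python sketch below (all function names are hypothetical placeholders) only illustrates the two-phase evolve-and-evaluate loop described above.

import random

def evolve(program, mutation_ops, functional_fitness, synchronization_cost,
           max_generations=100):
    """Two-phase evolutionary repair loop in the spirit of the approach above.

    Phase 1 mutates the program until its functional fitness (e.g. the fraction
    of explored interleavings that pass the test suite) is acceptable; phase 2
    then reduces synchronization overhead while preserving that correctness.
    The callbacks stand in for tooling such as ConTest.
    """
    best, best_fit = program, functional_fitness(program)

    # Phase 1: restore functional correctness under many interleavings.
    for _ in range(max_generations):
        if best_fit >= 1.0:
            break
        op = random.choice(mutation_ops)       # heuristic selection in the real tool
        candidate = op(best)
        fit = functional_fitness(candidate)
        if fit > best_fit:
            best, best_fit = candidate, fit

    # Phase 2: minimize synchronization while keeping functional fitness.
    best_cost = synchronization_cost(best)
    for _ in range(max_generations):
        candidate = random.choice(mutation_ops)(best)
        if functional_fitness(candidate) >= best_fit:
            cost = synchronization_cost(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best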

High Impact Defects: A Study on Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan, Queen's University
Abstract: The relationship between various software-related phenomena (e.g., code complexity) and post-release software defects has been thoroughly examined. However, to date these predictions have seen limited adoption in practice. The most commonly cited reason is that the prediction identifies too much code to review without distinguishing the impact of these defects. Our aim is to address this drawback by focusing on high-impact defects for customers and practitioners. Customers are highly impacted by defects that break pre-existing functionality (breakage defects), whereas practitioners are caught off-guard by defects in files that had relatively few pre-release changes (surprise defects). The large commercial software system that we study already had an established concept of breakages as the highest-impact defects; however, the concept of surprises is novel and not as well established. We find that surprise defects are related to incomplete requirements and that the common assumption that a fix is caused by a previous change does not hold in this project. We then fit prediction models that are effective at identifying files containing breakages and surprises. The number of pre-release defects and file size are good indicators of breakages, whereas the number of co-changed files and the amount of time between the latest pre-release change and the release date are good indicators of surprises.
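
As an illustration only, a file-level prediction model of this kind might be fit as follows; the features and labels below are made up, and the study's actual modelling technique and data are not reproduced here.

from sklearn.linear_model import LogisticRegression

# One row per file: [number of pre-release defects, file size in kLOC].
X = [[0, 1.2], [5, 10.4], [1, 0.8], [8, 22.0], [2, 3.1], [0, 0.5]]
y = [0, 1, 0, 1, 0, 0]    # 1 = file contained a breakage defect post-release

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[6, 15.0]])[0][1])    # predicted breakage probability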

Source Code Search: The Difficulties in Achieving Practical Impact Despite Academic Success
Jamie Starke, University of Victoria
Abstract: When performing research, choosing and understanding the context of that research is vitally important. While selecting a specific context might bring success in one area, it can lead to difficulties in another. This talk will present earlier work that could be considered a success in academia but less successful in terms of practical, industrial impact, and will discuss how the original choice of context ultimately led to these results.

Opportunities in Source Code Clone Search and Detection
Iman Keivanloo and Juergen Rilling, Concordia University
Abstract: Code clone search is an emerging family of clone detection research that aims at finding clone pairs matching an input code fragment at run-time. For these techniques to meet actual real-world requirements, they have to be scalable, provide a short response time, and allow for scalable incremental corpus updates, while detecting type-1, type-2, and type-3 clones. We discuss the importance of understanding the statistical characteristics of source code as a preliminary step. Then, we show how this preliminary step can help us achieve these requirements in both theory and practice. Finally, we show, through a concrete example, how sharing data and tools online can ease research in the software analysis domain.

QoS-CARE: A Framework for Reliable QoS Contract Preservation through Self-Reconfiguration
Gabriel Tamura, Inria Lille Nord Europe, France and University of Victoria
Abstract: The ever increasing pervasiveness of ubiquitous computing devices in society demands highly dynamic capabilities in software to satisfy context-dependent requirements. In recent years, the engineering of self-adaptive software has achieved significant advances in supporting these capabilities. However, self-adaptation is still constrained by statically produced adaptation plans and a lack of standardized measures that limits comparative analysis of adaptation properties. In this talk I will present a framework that enables component-based applications to preserve QoS contracts using a formal model for self-reconfiguration. We evaluate our framework through a set of experiments performed on FraSCAti, an SCA-compliant implementation, and show its practical feasibility and applicability.

Analyzing the Collaborative Software Process on Jazz
Fabio Rocha, Eleni Stroulia and Nikolaos Tsantalis, University of Alberta
Abstract: In this talk, we present our work integrating into RTC a sophisticated analysis of the collaborative development process offered by Jazz, in order to provide a better and deeper understanding of the artifacts, the people, and their relations. More specifically, we offer three main services: software evolution analysis at the design level, individual-contribution and social-network analysis, and analysis of natural-language artifacts.

Towards a Training Oriented Adaptive Decision Support System
Farhana Zulkernine, Pat Martin, S. Soltani, and Wendy Powley, Queen's University
S. Mankovskii and M. Addleman, CA Technologies
Abstract: Decisions are made based on knowledge. We present a framework that dynamically extracts knowledge from various correlated data sources containing systems-related data and from the problem-solving strategies of expert Mainframe DB2 Database Administrators (DBAs). The framework then uses the knowledge to train the new generation of DBAs by guiding them through the various stages of solving DB2 problems on the Mainframe system. The research combines text and data mining techniques for knowledge extraction, a rule-based system for knowledge representation and problem categorization, and a case-based system for decision support. The framework provides an interactive interface to accept user preferences, which are used to adapt the rule and case bases. Rules are extracted from various log data sources and monitoring data in the data warehouses. The rule-based system guides the user through the initial trivial steps of problem investigation and categorization, and helps collect more information about the problem. Finally, the case-based system is searched for a suitable solution.

Poster Abstracts

Architecture of Debian Packages
Raymond Nguyen, University of Waterloo
Abstract: Debian is a Linux-based operating system which contains a PMS (Package Management System) to help administer its 29,000+ software packages. Each package provides a particular service or application. In our work, we ask: how can these packages be organized to better understand them and their usage? Although some packages stand alone, they generally depend upon other packages to carry out their function. These dependencies are specified in Debian by metadata that can help us identify the structure of applications. To make more sense of how the packages interact with each other, we examine these packages and their dependencies to classify them into architectural patterns. For example, packages that stand alone are considered to have the "singleton" style, while large complicated configurations are considered to have the "meta-package" style. With the rapid growth of Debian, we hope this approach provides a clearer picture of the overall structure of the packages, as well as their local configurations.
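
A toy sketch of the underlying mechanics (our own simplified heuristics, not the study's actual classification rules) parses Depends fields from Debian control metadata, builds a dependency graph, and assigns rough styles to each package.

import re
from collections import defaultdict

# Simplified control records; real metadata also carries version constraints
# and alternatives, which are ignored here.
control_records = """\
Package: hello
Depends: libc6

Package: libc6
Depends:

Package: metafoo
Depends: hello, libc6
""".split("\n\n")

depends = {}
for record in control_records:
    name = re.search(r"^Package:\s*(\S+)", record, re.M).group(1)
    deps = re.search(r"^Depends:\s*(.*)$", record, re.M).group(1)
    depends[name] = [d.strip() for d in deps.split(",") if d.strip()]

reverse = defaultdict(set)                    # who depends on each package
for pkg, deps in depends.items():
    for d in deps:
        reverse[d].add(pkg)

for pkg in depends:
    if not depends[pkg] and not reverse[pkg]:
        style = "singleton"                   # stands alone
    elif len(depends[pkg]) >= 2:
        style = "meta-package-like"           # aggregates several other packages
    else:
        style = "other"
    print(pkg, style)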

SmarterContext: On the Optimization of the User's Shopping Experience
Norha M. Villegas and Hausi A. Muller, University of Victoria
Juan C. Munoz, Icesi University, Colombia
Abstract: As users' interests and web entities continuously evolve, smart user-centric web applications must keep track of changing context information to deliver services and content accordingly. This poster presents a smarter e-commerce scenario where SmarterContext, our approach to dynamic context management, exploits feedback loops and semantic web technologies to optimize context-aware user-centric shopping experiences.

YaKit: A Locality Based Messaging System using the iCon Overlay
Ron Desmarais, Przemek Lach and Hausi A. Muller, University of Victoria
Abstract: With the emergence of smart devices, along with cloud computing's ability to dynamically provision resources and services as needed, new computing paradigms provide a rich environment for researchers. To investigate this, YaKit and iCon were developed as a new type of social application. They are a first attempt at taking advantage of this wealth of knowledge and capability to provide a smarter web experience for users. This poster will present YaKit, a new social tool that takes advantage of both locality and time to facilitate communication, along with iCon, a back-end service that manages cloud-based resource provisioning of YaKit components. Data structures and algorithms are discussed, along with the overall idea for the deployment architecture.

Web Service Assurance: Notion and the Issues
Atousa Pahlevan and Hausi A. Muller, University of Victoria
Abstract: Web service technology is at the basis of deploying collaborative business processes. Web Services security standards and protocols aim to provide secure communication and conversation between service providers and consumers. Still, for a client calling a Web Service it is difficult to be sure that a particular service instance holds, at execution time, some specific non-functional properties. In this poster we introduce the notion of certified Web service assurance, describing how service consumers can specify the set of security properties that a service should hold. Also, we illustrate a mechanism to re-check non-functional properties when the execution context changes. To this end, we introduce the concept of context-aware certificate, and describe a dynamic, context-aware service certification environment.

Eclipticon: Eclipse Plugin for Concurrency Testing
Kevin Jalbert, Cody LeBlanc, Jeremy S. Bradbury, Ramiro Liscano, University of Ontario Institute of Technology
Christopher Forbes, Concordia University
Abstract: Testing concurrent software is a challenging endeavor due to the non-deterministic nature of threaded software. There are two main techniques for properly testing concurrent software. The first, model checking, is sometimes impractical due to resource constraints. The alternative technique is to perform thread exploration by inserting random thread delays and yields into the software. The delays make it possible to explore more of the thread interleaving space through repeated execution of a test suite. Eclipticon is a plugin built for the Eclipse IDE. The plugin gives users two levels of control over the placement of thread delays and yields. The coarse control level allows files to be targeted, with a percentage of the applicable locations being randomly instrumented. The fine control level allows specific areas within files to be targeted. The two levels of control, when combined with a testing approach, allow users to perform targeted testing for concurrency bugs in problematic areas.
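
The plugin instruments Java, but the effect of injected delays can be illustrated with a short (hypothetical, non-Eclipticon) Python sketch: a sleep placed inside a check-then-act region widens the race window, so repeated runs of the same test expose a lost update that would otherwise surface only rarely.

import random
import threading
import time

counter = 0

def increment(noise):
    global counter
    tmp = counter
    if noise and random.random() < 0.5:
        time.sleep(0.001)          # injected delay at an "applicable location"
    counter = tmp + 1

def run_test(noise):
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(noise,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter == 2            # expected result if the increments do not race

failures = sum(not run_test(noise=True) for _ in range(200))
print(f"{failures}/200 runs exposed the lost update with injected delays")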

ARC: Automatic Repair of Java Concurrency Bugs
Kevin Jalbert, David Kelk and Jeremy S. Bradbury, University of Ontario Institute of Technology
Abstract: Concurrent software bugs appear intermittently due to the non-deterministic nature of how threads might be scheduled to run. A concurrency bug can be fixed by using known concurrency mechanisms to ensure proper thread scheduling. ARC takes advantage of this fact by evolving a buggy program using known concurrency mutation operators in an attempt to ensure proper program execution. An evolved program is evaluated using ConTest by repeatedly exploring thread interleavings to provide a certain level of confidence that the majority of schedules have been explored. A fitness function evaluates the functional correctness of the evolved program. The evaluation feedback from ConTest is used to heuristically select the most appropriate mutation operator to apply next. A second phase of evolution is applied after an appropriate level of functional fitness is achieved, to optimize the non-functional fitness. This phase minimizes the usage of synchronization mechanisms to ensure timely execution of the evolved program while retaining the functional correctness.

Integration of Component-Based Frameworks with Sensor Web Languages
John K. Jacoub and Ramiro Liscano, University of Ontario Institute of Technology
Abstract: The objective is to facilitate software development by leveraging explicit specifications of sensor systems. To achieve this goal, a translator was developed that creates JavaBean components and their corresponding connections from a SensorML document.

A Text Mining Approach for Test Case Selection
Stephen W. Thomas, Hadi Hemmati, Ahmed E. Hassan, and Dorothea Blostein, Queen's University
Abstract: Test suites typically grow over time. In large-scale systems, this results in very large test suites, which cannot be fully executed within project time and budget constraints. For these systems, developers need to execute a subset of the test cases while still detecting as many faults as possible. This problem is called test case selection. Recently, researchers have proposed automated approaches for test case selection. Many of these approaches require the run-time behavior or specification models of each test case. However, this information usually is not readily available and obtaining it can be cost prohibitive, especially in legacy systems with thousands of test cases. In these situations, the only available information is the source code text of the test cases themselves, and we desire an approach that is based only on this limited information. In this work, we compare two text-based test case selection approaches: 1) using the full-length text of the test case and 2) using an abstracted version of the text, created by a text mining approach called topic modeling. In both cases, we use a distance function to measure differences between test cases. We select test cases by maximizing the average distances among selected test cases. We apply both approaches to five real-world systems and find that the abstraction-based approach outperforms the full text-based approach in both fault detection rate and cost savings.
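
As a hedged sketch of the selection step (our own simplification: Jaccard distance over token sets stands in for the full-text and topic-based distances compared in the study), test cases can be picked greedily so that each new pick maximizes the average distance to those already selected.

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two token sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b) if a | b else 0.0

def select_tests(test_tokens, budget):
    """Greedily select `budget` tests that are maximally spread out."""
    selected = [0]                                     # seed with the first test
    while len(selected) < budget:
        best, best_avg = None, -1.0
        for i in range(len(test_tokens)):
            if i in selected:
                continue
            avg = sum(jaccard_distance(test_tokens[i], test_tokens[j])
                      for j in selected) / len(selected)
            if avg > best_avg:
                best, best_avg = i, avg
        selected.append(best)
    return selected

tests = [["login", "valid", "password"],
         ["login", "invalid", "password"],
         ["upload", "large", "file"],
         ["upload", "empty", "file"]]
print(select_tests(tests, budget=2))    # picks dissimilar tests, e.g. [0, 2]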

A Framework for Analyzing Software System Log Files
Mei Nagappan, Queen's University
Abstract: The ability to perform large-scale data analytics is fueling the collection of a variety of data streams from complex production systems, such as the provenance of data, performance statistics, and user statistics, in order to analyze them for specific purposes. Log files are a typical example of such a data stream. Logs are often a record of the execution activities in production systems, where the data recorded is intentionally chosen by the developer as useful information. Analyzing log files is an important activity in software engineering, and now in cloud engineering. The data from the log files can be analyzed to guide product-related decisions. For example, logs can be used to find ordinary and security-related problems, define operational profiles, and even pro-actively prevent issues (in support of fault tolerance).
It is the premise of this work that current log analysis methods are too ad hoc and do not scale well enough to be effective in the domain of large logs (such as those we might expect in a computational cloud system). Complex systems have complex and voluminous logs. In this work we investigate, identify, and develop the components needed for an adaptable end-to-end framework for the analysis of logs. The framework needs to take into consideration that different users look for different kinds of information in the same log files. Adaptable techniques and algorithms are required for efficient and accurate log data collection, log abstraction, and log transformation. The techniques or algorithms used in each component of the framework will vary according to the application, its logging mechanisms, and the information that the stakeholder needs to make decisions.
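
One component of such a framework, log abstraction, can be illustrated with a minimal (non-framework) Python sketch that collapses dynamic values so that lines produced by the same logging statement map to a single event template; the regexes are purely illustrative.

import re
from collections import Counter

ABSTRACTIONS = [
    (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "<IP>"),     # IPv4 addresses
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),               # hex identifiers
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # remaining numbers
]

def abstract(line):
    """Replace dynamic fields in a log line with placeholder tokens."""
    for pattern, token in ABSTRACTIONS:
        line = pattern.sub(token, line)
    return line

log = [
    "connection from 10.0.0.7 port 50312 accepted",
    "connection from 10.0.0.9 port 50514 accepted",
    "worker 12 crashed at 0x7f3a2c",
]
templates = Counter(abstract(line) for line in log)
print(templates.most_common())
# [('connection from <IP> port <NUM> accepted', 2), ('worker <NUM> crashed at <HEX>', 1)]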

A Heuristic Technique for Building Resource-Oriented User-Service Interaction
Dwaipayan Sinha, Bipin Upadhyaya, Ying Zou, Queen's University
Abstract: End-users use different applications provided by different Web sites to get a job done. Enabling end-user service composition allows them to combine all those applications in an easy-to-use fashion that can be distributed to other end-users. Available tools for achieving such a goal still require professional experience in the field. A tool intended for end-users, who are not programmers, should be simple.
End-users perceive services and resources only through the HTML Web pages viewable in browsers. To achieve service composition, an end-user must be capable of identifying the resources, their components, the methods associated with each resource, the parameters used in the methods, and the sequence of execution of the involved resources. However, a proper method for automatically identifying resources from HTML Web sites has not yet been devised.
Different Web sites offer the same information and services using different technologies, such as SOAP-based Web services, RESTful services, HTTP-based APIs, Ajax, HTML, and XML, but all the implementation techniques are hidden from the end-user behind the HTML representation of Web pages. Therefore, we present a novel approach to identify resource types and instances of each type from the HTML pages. We apply static analysis techniques with some heuristics to extract resources and the relations among the identified resources. We represent a Web site as a uniform hierarchical structure of resources. Using similar heuristics, we also identify the state transitions of resources under user interactions by recording and analyzing the HTTP methods called, and their corresponding parameters, during each interaction of the user with the browser from the displayed Web pages.
We have conducted a case study on an e-commerce Web site developed by IBM, and applied our method to identify resources and state transitions. The results from the case study indicate that the combined information of resources identified from Web pages and the resource state transitions effectively helps end-users in service composition.

An Empirical Study on Web Service Evolution
Marios Fokaefs, University of Alberta
Abstract: The service-oriented architecture paradigm prescribes the development of systems through the composition of services, i.e., network-accessible components, specified by (and invoked through) their WSDL interface descriptions. Systems thus developed need to be aware of changes in, and evolve with, their constituent services. Therefore, accurate recognition of changes in the WSDL specification of a service is an essential functionality in the context of the software life cycle of service-oriented systems.
In this work, we present the results of an empirical study on WSDL evolution analysis. In the first part, we empirically study whether VTracker, our algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service. Second, we analyze the changes that occurred between the subsequent versions of various web-services and discuss their potential effects on the maintainability of service systems relying on them.

Team Reconnaissance in a Software Understanding Landscape
Sean Stevenson, Margaret-Anne (Peggy) Storey, University of Victoria
Abstract: Software reconnaissance to aid program comprehension and feature location has been the focus of research efforts that have led to tools being produced to support such a technique in IDEs. Through these tools a developer can gain a much better understanding of the source code in front of them and greatly reduce the learning curve of working on a new project. The problem, however, is that these tools allow each developer to explore and increase their comprehension of the code alone, separate from their team, who may all be doing the same thing. Team reconnaissance is a collaborative approach to using software reconnaissance in program comprehension. The topic requires more research to bridge the gap between program comprehension and computer-supported collaborative learning to allow teams to access the software understanding landscape. We will explore how the tool Diver (Dynamic Interactive Views For Reverse Engineering) can be used to offer team support and collaborative features as a software reconnaissance tool.