I. INTRODUCTION
Mobile devices, such as smartphones and music players, have recently begun to incorporate diverse and powerful sensors, including accelerometers, magnetic field sensors, light sensors, proximity sensors, gyroscopes, pressure sensors, rotation vector sensors, gravity sensors, and orientation sensors. Because of the small size of these "smart" mobile devices, their substantial computing power, their ability to send and receive data, and their nearly ubiquitous use in our society, these devices open up exciting new areas for data mining research and data mining applications. In this paper, we explore the use of one of these sensors, the accelerometer, in order to identify the activity that a user is performing, a task that we refer to as activity recognition.
Android phones, as well as virtually all new smartphones and smart music players, contain tri-axial accelerometers that measure acceleration in all three spatial dimensions. These accelerometers can also detect the orientation of the device (helped by the fact that they can detect the direction of the Earth's gravity), which can provide useful information for activity recognition. Accelerometers were initially included in these devices to support advanced game play [1] and to enable automatic screen rotation [2], but they clearly have many other applications. In fact, many useful applications can be built if accelerometers can be used to recognize a user's activity. For example, we can automatically monitor a user's activity level and generate daily, weekly, and monthly activity reports, which can be automatically emailed to the user. These reports would indicate an overall activity level, which could be used to gauge whether the user is getting an adequate amount of exercise and to estimate the number of daily calories expended. These reports could be used to encourage healthy practices and might alert some users to how sedentary they or their children actually are. The activity information can also be used to automatically customize the behavior of the mobile phone. For example, music could automatically be selected to match the activity, or calls could be sent directly to voicemail when the user is exercising.
In order to address the activity recognition task using supervised learning, we first collected accelerometer data from 10 users as they performed activities such as "phone detached", "idle", "walking", "running", and "jumping". Here, "idle" refers to either sitting or standing, i.e., no exercise. We then aggregated this raw time-series accelerometer data into examples (or episodes), where each example is labeled with the activity that occurred while the data was being collected. We then built predictive models for activity recognition using classification algorithms. Accelerometer-based activity recognition is not a new topic. Bao & Intille [5] developed an activity recognition system to identify twenty activities using bi-axial accelerometers placed in five locations on the user's body. Additional studies have similarly focused on how a variety of accelerometer-based devices can be used to identify a range of user activities. Other work has focused on the applications that can be built on top of accelerometer-based activity recognition, including identifying a user's activity level and predicting their energy consumption [8], detecting a fall and the user's movements after the fall [15], and monitoring user activity levels in order to promote health and fitness [16].
Our work differs from most prior work in that we use a commercial mass-marketed device rather than a research-only device, we use a single device conveniently kept in the user's pocket rather than multiple devices distributed across the body, and we require no additional actions by the user. Our work makes several additional contributions. One contribution is the data that we have collected and continue to collect, which we plan to make public in the future. This data can serve as a resource to other researchers, since we were unable to find such publicly available data ourselves. We also demonstrate how raw time-series accelerometer data can be transformed into examples that can be used by conventional classification algorithms. We demonstrate that it is possible to perform activity recognition with commonly available (nearly ubiquitous) equipment and yet achieve highly accurate results. Finally, we believe that our work will help bring attention to the opportunities available for mining wireless sensor data and will stimulate additional work in this area.
The remainder of this paper is structured as follows. Related work is described in Section 2. Section 3 describes the process for addressing the activity recognition task, including data collection, data preprocessing, and data transformation. Section 4 describes our experiments and results. Section 5 summarizes our conclusions and discusses areas for future research.
II. RELATED WORK
Activity recognition has recently gained attention as a research topic because of the increasing availability of accelerometers in consumer products, such as smartphones, and because of the many potential applications. Some of the earliest work in accelerometer-based activity recognition focused on the use of multiple accelerometers placed on several parts of the user's body. In one of the earliest studies of this topic, Bao & Intille [5] used five biaxial accelerometers worn on the user's right hip, dominant wrist, non-dominant upper arm, dominant ankle, and non-dominant thigh in order to collect data from 20 users. Using decision tables, instance-based learning, and naive Bayes classifiers, they created models to recognize twenty daily activities. Their results indicated that the accelerometer placed on the thigh was the most powerful for distinguishing between activities. This finding supports our decision to have our test subjects carry the phone in the most convenient location: their pants pocket.
Some studies have also focused on combining multiple types of sensors, in addition to accelerometers, for activity recognition. Maurer et al. [18] used "eWatch" devices placed on the belt, shirt pocket, trouser pocket, backpack, and neck to recognize six activities similar to those we consider in our study. Each "eWatch" consists of a biaxial accelerometer and a light sensor. Decision tree, k-nearest-neighbor, naive Bayes, and Bayes net classifiers with five-fold cross validation were used for learning. Choudhury et al. [19] used a multimodal sensor device consisting of seven different types of sensors (tri-axial accelerometer, microphone, visible light phototransistor, barometer, visible+IR light sensor, humidity/temperature reader, and compass) to recognize activities such as walking, sitting, standing, ascending stairs, descending stairs, riding an elevator up or down, and brushing one's teeth. Cho et al. [20] used a single tri-axial accelerometer, along with an embedded image sensor worn at the user's waist, to identify nine activities. Although these multi-sensor approaches do indicate the great potential of mobile sensor data as more types of sensors are incorporated into devices, our approach shows that only one type of sensor (an accelerometer) is needed to recognize most daily activities. Thus our method offers a straightforward and easily implementable approach to this task.
A few studies, like ours, did use an actual commercial mobile device to collect data for activity recognition. Such systems offer an advantage over other accelerometer-based systems because they are unobtrusive and do not require any additional equipment for data collection and accurate recognition. Miluzzo et al. [21] explored the use of various sensors (such as a microphone, accelerometer, GPS, and camera) available on commercial smartphones for activity recognition and mobile social networking applications. To address the activity recognition task, they collected accelerometer data from ten users to build an activity recognition model for walking, running, sitting, and standing. This model had particular difficulty distinguishing between sitting and standing, a task that our models accomplish easily. Yang [22] developed an activity recognition system using the Nokia N95 phone to distinguish among sitting, standing, walking, running, driving, and bicycling. This work also explored the use of an activity recognition model to construct physical activity diaries for users. Although the study achieved relatively high predictive accuracy, stair climbing was not considered and the system was trained and tested using data from only four users. Brezmes et al. [6] also used the Nokia N95 phone to develop a real-time system for recognizing 10 user activities. In their system, an activity recognition model is trained for each user, meaning that there is no universal model that can be applied to new users for whom no training data exists. Our models do not have this limitation.
III. THE ACTIVITY RECOGNITION TASK
In this section, we describe the activity recognition task and the process for performing this task. In Section 3.1 we describe our protocol for collecting the raw accelerometer data, in Section 3.2 we describe how we preprocess and transform the raw data into examples, and in Section 3.3 we describe the activities that will be predicted/identified.
In order to collect data for our supervised learning task, we needed users to carry an Android-based smartphone while performing designated activities. We enlisted the help of 10 volunteer subjects, who carried the Android phone in the front pocket of their pants and were asked to perform the activities "phone detached", "idle", "walking", "running", and "jumping" for specific periods of time.
The data collection was controlled by an application we created that executed on the phone. Through a simple graphical user interface, this application permitted us to record the user's name, start and stop the data collection, and label the activity being performed. The application also permitted us to control which sensor data (e.g., accelerometer, gyroscope) was collected and how frequently it was collected. In all cases we collected accelerometer data every 20 milliseconds, giving us 50 samples per second.
Standard classification algorithms cannot be applied directly to raw time-series accelerometer data. Instead, we must first transform the raw time-series data into examples [17]. We used a sample window of 512 samples, chosen based on the recommendations of prior papers: it was deemed sufficient for activity recognition, and 512 is a power of two (2^9), which is ideal for the Fast Fourier Transform algorithm. The window slides forward 100 samples at a time, so consecutive windows overlap.
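To make the transformation concrete, the following is a minimal NumPy sketch of the windowing step under the parameters above; the function name and the [n_samples, 3] array layout are our own assumptions, not taken from the paper.

```python
import numpy as np

def sliding_windows(signal, window_size=512, step=100):
    """Segment a raw tri-axial recording (shape [n_samples, 3]) into
    overlapping windows of window_size samples, advancing step samples
    at a time. At 50 Hz, 512 samples cover roughly 10.2 seconds."""
    windows = [signal[start:start + window_size]
               for start in range(0, len(signal) - window_size + 1, step)]
    return np.stack(windows)  # shape: [n_windows, window_size, 3]
```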
The following features were chosen based on our own discussion and the recommendations of previous accelerometer-based activity recognition research (sketches of how they might be computed appear throughout the rest of this section, beginning just after this list):

- Fundamental Frequencies: The fundamental frequencies of the signal in the window.
- Maximum Amplitude: The maximum value of the signal in the window.
- Minimum Amplitude: The minimum value of the signal in the window.
- Intensity: The intensity of the signal in the window.
- Fundamental Frequencies Magnitude: The magnitudes of the fundamental frequencies of the signal in the window.
- Step Num: The number of steps detected in the window.
- Position Relation: The placement direction (orientation) of the smartphone.
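The amplitude-based features can be computed directly from the samples in a window. Below is a minimal NumPy sketch; note that the text does not define "intensity" precisely, so the root mean square of the acceleration magnitude is used here as one plausible choice.

```python
import numpy as np

def amplitude_features(window):
    """window: array of shape [512, 3] holding the x, y, z samples.
    Returns the maximum amplitude, minimum amplitude, and intensity.
    Intensity is computed as the RMS of the acceleration magnitude,
    an assumption on our part."""
    magnitude = np.linalg.norm(window, axis=1)    # per-sample |a|
    intensity = np.sqrt(np.mean(magnitude ** 2))  # RMS (assumed definition)
    return magnitude.max(), magnitude.min(), intensity
```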
How can we obtain the fundamental frequencies of the signal? They were found from the Fourier Transform of the signal over the sample window; the final feature value is the average of the three dominant frequencies of the signal. Figure 1 shows the frequency spectrum acquired.
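For concreteness, here is one way the two frequency features might be computed with NumPy's FFT, following the description above (averaging the three dominant peaks); removing the DC component before the transform is our own addition.

```python
import numpy as np

def frequency_features(window, fs=50.0, n_peaks=3):
    """Return the average of the n_peaks dominant frequencies and the
    average of their magnitudes for a 1-D signal window."""
    centered = window - np.mean(window)            # remove the DC offset
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    top = np.argsort(spectrum[1:])[-n_peaks:] + 1  # skip the DC bin
    return freqs[top].mean(), spectrum[top].mean()
```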
The "Step Num" feature requires further explanation. Different people walk (or run) at different speeds; for example, a woman's running speed may be comparable to a man's fast walking speed, so speed alone cannot reliably distinguish walking from running. The step count provides a way: no matter whether one moves quickly or slowly, the number of steps taken while walking in a given window is always far lower than while running. Therefore, we can use the step count to distinguish walking from running. Figure 2 shows the step signal (each peak corresponds to one step), with walking data on the left and running data on the right; the walking step count is clearly far lower than the running step count.
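A simple peak-counting routine in the spirit of Figure 2 is sketched below; the threshold and minimum peak spacing are illustrative values chosen by us, not parameters reported in the paper.

```python
def count_steps(magnitude, threshold=11.0, min_gap=15):
    """Count steps as local maxima of the acceleration magnitude that
    exceed a threshold (in m/s^2, just above gravity); requiring min_gap
    samples between peaks suppresses double counting. Both values are
    illustrative, not taken from the paper."""
    steps, last_peak = 0, -min_gap
    for i in range(1, len(magnitude) - 1):
        is_peak = magnitude[i - 1] < magnitude[i] >= magnitude[i + 1]
        if is_peak and magnitude[i] > threshold and i - last_peak >= min_gap:
            steps += 1
            last_peak = i
    return steps
```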
The "Position Relation" feature also requires further explanation. Whether the smartphone is detached or idle, the acceleration data shows only small fluctuations, so how can we distinguish the two states? The position relation provides a way: a detached phone typically lies flat (e.g., on the ground or a table), whereas an idle phone is typically vertical or oblique against the user's body. Therefore, we can use the smartphone's orientation to distinguish the phone-detached and idle states. Figure 3 shows the position relation (the blue, green, and red curves represent the x, y, and z axes, respectively), with phone-detached data on the left and idle data on the right; the difference in orientation between the two states is obvious.
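One simple way to realize the position relation feature is to look at which axis carries most of gravity, on average, within a window; the axis conventions below follow the standard Android sensor coordinate system, and reducing the feature to a dominant-axis index is our own simplification.

```python
import numpy as np

def position_relation(window):
    """A phone lying flat (detached) sees gravity mainly on its z axis,
    while a phone upright in a pocket (idle) sees it mainly on y or x.
    Returns the index of the dominant axis: 0 = x, 1 = y, 2 = z."""
    mean_per_axis = np.abs(window.mean(axis=0))  # average gravity per axis
    return int(np.argmax(mean_per_axis))
```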
In this study, we consider five activities: phone detached, idle, walking, running, and jumping. We selected these activities because they are performed regularly by many people in their daily routines. The activities also involve motions that often occur for substantial periods of time, making them easier to recognize. Furthermore, most of these activities involve repetitive motions, which we believe should also make them easier to recognize. When we record data for each of these activities, we record acceleration along all three axes.
Figure 4 plots the seven features for a typical user performing the five activities in sequence (each activity lasts 30 seconds, 150 seconds in all). It is clear that the static activities (phone detached, idle) and the dynamic activities (walking, running, jumping) are separated by the maximum and minimum amplitude, intensity, fundamental frequency magnitude, and step number features.
To distinguish among the dynamic activities, one can examine the intensity and fundamental frequency magnitude features, which clearly separate walking, running, and jumping. To distinguish among the static activities, one can examine the position relation feature, which clearly separates the phone-detached and idle states.
During the classification stage, we label unknown patterns of data based on knowledge acquired from known (labeled) patterns. For the activity recognition application, we implemented a k-nearest-neighbor classifier. The k-nearest-neighbor algorithm is a simple algorithm that matches an unknown pattern to known patterns based on the Euclidean distance between their feature vectors; the known pattern with the lowest Euclidean distance from the unknown pattern is taken as the best match.
When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbors" of the unknown tuple. "Closeness" is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say, $X_1 = (x_{11}, x_{12}, \ldots, x_{1n})$ and $X_2 = (x_{21}, x_{22}, \ldots, x_{2n})$, is

$$\mathrm{dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} \qquad (1)$$
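As an illustration, here is a minimal NumPy sketch of k-nearest-neighbor classification under the Euclidean distance of Eq. (1); the function name and the majority-vote tie handling are our own choices.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k):
    """Classify each row of test_X by majority vote among its k nearest
    rows of train_X (all NumPy arrays) under Euclidean distance."""
    preds = []
    for x in test_X:
        dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))  # Eq. (1)
        nearest = train_y[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])            # majority vote
    return np.array(preds)
```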
Typically, we normalize the values of each attribute before using Eq. (1). This helps prevent attributes with initially large ranges (e.g., intensity) from outweighing attributes with initially smaller ranges (e.g., position relation). Min-max normalization, for example, can be used to transform a value $v$ of a numeric attribute $A$ to $v'$ in the range $[0, 1]$ by computing

$$v' = \frac{v - \min_A}{\max_A - \min_A}$$

where $\min_A$ and $\max_A$ are the minimum and maximum values of attribute $A$.
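A minimal sketch of column-wise min-max normalization, assuming the examples are stored as rows of a NumPy feature matrix; the guard against constant columns is our own addition. In practice, the training minima and maxima should also be reused to scale the test data.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column of X to [0, 1] with min-max
    normalization; constant columns map to 0 to avoid division by zero."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return (X - mins) / span
```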
"How can I determine a good value for k, the number of neighbors?" This can be determined experimentally. Starting with k = 1, we use a test set to estimate the error rate of the classifier. This process is repeated, each time incrementing k to allow one more neighbor, and the k value that gives the minimum error rate is selected. For our experiments, the best value was k = 2.
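The search for k can be scripted directly; the sketch below reuses the knn_predict function above, evaluates each candidate k on a held-out set, and returns the one with the lowest error rate. The k_max bound is an illustrative choice of ours.

```python
import numpy as np

def select_k(train_X, train_y, val_X, val_y, k_max=10):
    """Try k = 1 .. k_max and return the k with the lowest error rate
    on the held-out (val_X, val_y) set."""
    errors = {k: float(np.mean(knn_predict(train_X, train_y, val_X, k) != val_y))
              for k in range(1, k_max + 1)}
    return min(errors, key=errors.get)
```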
IV. EXPERIMENTS
In this section, we describe our experiments and then present and discuss our results for the activity recognition task.
Our experiments first required us to collect labeled raw accelerometer data and then transform that data into examples, as described in Section 3. The resulting examples contain 7 features and cover 10 users. This forms the data set that is subsequently used for training and testing.
All data collection took place on a playground. First, one person gathered continuous reference data in which each activity was performed for 30 seconds, 150 seconds in all for the five activities; this served as the training data. Ten other subjects then gathered continuous test data, performing each activity for 1 minute, 5 minutes in all for the five activities; this served as the test data.
Once the raw data was gathered, we first preprocessed it, which included removing repeated samples and linearizing and smoothing the data. After the data set was prepared, we used the k-nearest-neighbor algorithm to classify the data. Our experiments show that setting k to 2 works best.
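The exact preprocessing filters are not specified in the text, so the sketch below shows one plausible realization of the steps named above: dropping consecutive duplicate samples and smoothing each axis with a simple moving average.

```python
import numpy as np

def preprocess(signal, smooth_width=5):
    """signal: array of shape [n_samples, 3]. Removes consecutive
    duplicate samples, then applies a moving-average smoother per axis.
    The filter choice is an assumption, not taken from the text."""
    keep = np.ones(len(signal), dtype=bool)
    keep[1:] = np.any(np.diff(signal, axis=0) != 0, axis=1)  # de-duplicate
    deduped = signal[keep]
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = [np.convolve(deduped[:, a], kernel, mode="same")
                for a in range(deduped.shape[1])]
    return np.column_stack(smoothed)
```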
Figure 5 shows the raw training data together with its known activity labels. Each example is labeled with one of the five standardized states, from "phone detached" to "jumping", encoded as the values 1 through 5. These training data serve as the reference data against which the acquired test data are later compared.
The summary results for our activity recognition experiments are presented in Table 1. Using the training data and the k-nearest-neighbor algorithm, we predicted the classification of each test example, then compared the predicted results with the actual results to obtain the accuracy of the classification method.
The activities in the table are described in detail below:

- Phone Detached: The user is not currently holding the phone.
- Idle: The user is holding the phone in an idle state (sitting or standing).
- Walking: The user is walking.
- Running: The user is running.
- Jumping: The user is jumping.
A '✓' indicates that the activity could be identified accurately, and a '✗' indicates that it could not. For example, for subject 'S1', four activities ('Phone Detached', 'Idle', 'Walking', and 'Jumping') were identified accurately, but 'Running' was not: when this subject was running, the experimental results indicated that he was jumping.
Based on the results, we can reliably identify the phone detached, idle, walking, and running states in most cases. However, the algorithm cannot always distinguish running from jumping; we plan to implement a correction for this in future work.
V. CONCLUSIONS
We described how a smartphone can be used to perform activity recognition simply by keeping it in one's pocket. The method we used produces accurate classification results, except when distinguishing between running and jumping.
We plan to improve our activity recognition in several ways. The straightforward improvements involve: 1) learning to recognize additional activities, such as ascending and descending stairs; 2) obtaining training data from more users, with the expectation that this will improve our results; 3) generating additional and more sophisticated features when aggregating the raw time-series data; and 4) evaluating the impact of carrying the smartphone in different locations, such as on a belt loop.
The work described in this paper is part of a larger effort to mine sensor data from wireless devices. We plan to continue this project, applying accelerometer data to other tasks besides activity recognition and collecting and mining other sensor data, especially GPS data. We believe that mobile sensor data provides tremendous opportunities for data mining, and we intend to leverage our Android-based data collection and data mining platform to the fullest extent.