Enhancing action recognition of construction workers using data-driven scene parsing

Jun Yang

doi:10.3846/jcem.2018.6133

Abstract

Vision-based action recognition of construction workers has attracted increasing attention for its diverse applications. Although previous studies have achieved state-of-the-art performance using spatial-temporal features, considerable challenges remain on cluttered and dynamic construction sites. Considering that workers' actions are closely related to various construction entities, this paper proposes a novel system for enhancing action recognition using semantic information. A data-driven scene parsing method, named label transfer, is adopted to recognize construction entities in the entire scene. A probabilistic model of actions with context is established. Worker actions are first classified using dense trajectories, and the classification is then improved by construction object recognition. The experimental results on a comprehensive dataset show that the proposed system outperforms the baseline algorithm by 10.5%. The paper provides a new solution for integrating semantic information globally, rather than through conventional object detection, which can only depict local context. The proposed system is especially suitable for construction sites, where semantic information is rich from local objects to global surroundings. Compared with other methods that use object detection to integrate context information, it is easy to implement, requires no tedious training or parameter tuning, and scales with the number of recognizable objects.
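The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of one generic way that motion-based action scores could be combined with scene context, assuming a naive-Bayes style fusion in which objects are conditionally independent given the action. The function name `rescore_actions`, the action and object labels, and all probability values are hypothetical and are not taken from the paper.

```python
def rescore_actions(action_probs, object_given_action, detected):
    """Fuse motion-based action scores with scene context (illustrative only).

    Assumed model: P(action | motion, objects) is proportional to
    P(action | motion) * product over objects of P(object state | action).
    """
    posterior = []
    for prior, obj_probs in zip(action_probs, object_given_action):
        likelihood = 1.0
        for p, present in zip(obj_probs, detected):
            # Use P(object present | action) or its complement if absent.
            likelihood *= p if present else (1.0 - p)
        posterior.append(prior * likelihood)
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("context rules out every action")
    return [value / total for value in posterior]


# Hypothetical inputs: two actions, two object classes.
actions = ["bricklaying", "transporting"]
motion_scores = [0.45, 0.55]          # e.g. from a trajectory-based classifier
# Rows follow `actions`; columns are ["brick wall", "wheelbarrow"].
p_obj_given_action = [[0.9, 0.2],
                      [0.3, 0.8]]
scene_objects = [True, False]         # scene parsing found a wall, no wheelbarrow

posterior = rescore_actions(motion_scores, p_obj_given_action, scene_objects)
best_action = actions[posterior.index(max(posterior))]
print(best_action, [round(p, 3) for p in posterior])
```

With these made-up numbers, the motion scores alone slightly favor "transporting", while the context term shifts the decision to "bricklaying". This illustrates the general idea of refining an initial classification with recognized scene objects, not the paper's actual formulation.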

Keywords: worker, action recognition, scene parsing, computer vision, context

How to Cite

Yang, J. (2018). Enhancing action recognition of construction workers using data-driven scene parsing. Journal of Civil Engineering and Management, 24(7), 568-580. https://doi.org/10.3846/jcem.2018.6133

Published in Issue

Nov 19, 2018

This work is licensed under a Creative Commons Attribution 4.0 International License.
