Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools

Biro, Attila; Tunde Janosi-Rancz, Katalin; Szilagyi, Laszlo; Ignacio Cuesta-Vargas, Antonio; Martin-Martin, Jaime; Miklos Szilagyi, Sandor

dc.contributor.author	Biro, Attila
dc.contributor.author	Tunde Janosi-Rancz, Katalin
dc.contributor.author	Szilagyi, Laszlo
dc.contributor.author	Ignacio Cuesta-Vargas, Antonio
dc.contributor.author	Martin-Martin, Jaime
dc.contributor.author	Miklos Szilagyi, Sandor
dc.contributor.authoraffiliation	[Biro, Attila] George Emil Palade Univ Med Pharm Sci & Technol T, Dept Elect Engn & Informat Technol, Str Nicolae Iorga 1, Targu Mures 540088, Romania
dc.contributor.authoraffiliation	[Miklos Szilagyi, Sandor] George Emil Palade Univ Med Pharm Sci & Technol T, Dept Elect Engn & Informat Technol, Str Nicolae Iorga 1, Targu Mures 540088, Romania
dc.contributor.authoraffiliation	[Biro, Attila] Univ Malaga, Dept Physiotherapy, Malaga 29071, Spain
dc.contributor.authoraffiliation	[Ignacio Cuesta-Vargas, Antonio] Univ Malaga, Dept Physiotherapy, Malaga 29071, Spain
dc.contributor.authoraffiliation	[Biro, Attila] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation	[Ignacio Cuesta-Vargas, Antonio] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation	[Martin-Martin, Jaime] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation	[Tunde Janosi-Rancz, Katalin] Sapientia Hungarian Univ Transylvania, Computat Intelligence Res Grp, Targu Mures 540485, Romania
dc.contributor.authoraffiliation	[Szilagyi, Laszlo] Sapientia Hungarian Univ Transylvania, Computat Intelligence Res Grp, Targu Mures 540485, Romania
dc.contributor.authoraffiliation	[Szilagyi, Laszlo] Obuda Univ, Physiol Controls Res Ctr, H-1034 Budapest, Hungary
dc.contributor.authoraffiliation	[Ignacio Cuesta-Vargas, Antonio] Queensland Univ Technol, Fac Hlth Sci, Sch Clin Sci, Brisbane, Qld 4000, Australia
dc.contributor.authoraffiliation	[Martin-Martin, Jaime] Univ Malaga, Fac Med, Dept Human Anat Legal Med & Hist Sci, Legal & Forens Med Area, Malaga 29071, Spain
dc.contributor.funder	ITware, Hungary
dc.contributor.funder	Sapientia Foundation-Institute for Scientific Research
dc.date.accessioned	2023-05-03T13:47:15Z
dc.date.available	2023-05-03T13:47:15Z
dc.date.issued	2022-06-01
dc.description.abstract	Real-time multilingual phrase detection during online video presentations, used to support instant remote diagnostics, requires near-real-time visual (textual) object detection and preprocessing for further analysis. Connecting remote specialists and sharing specific ideas is most effective when they can use their native languages. The main objective of this paper is to analyze DEtection TRansformer (DETR) models, architectures, and hyperparameters, and to propose recommendations and specific procedures, based on simplified methods, that achieve reasonable accuracy for real-time textual object detection and further analysis. Real-time video conference translation built on artificial intelligence (AI) has a relevant impact on the health sector, especially on clinical practice, through better video consultation (VC) and remote diagnosis. The COVID-19 pandemic has increased the importance of this development. The main challenge is the variety of languages and dialects spoken by the specialists involved, which usually requires human translators as proxies; these can be replaced by AI-enabled technological pipelines. The sensitivity of visual textual element localization depends directly on the complexity, quality, and variety of the collected training data sets. In this research, we investigated the DETR model in several variations. The research highlights the differences between the most prominent real-time object detectors (YOLO4, DETR, and Detectron2) and brings AI-based novelty to collaborative solutions combined with OCR. The performance of the procedures was evaluated in two research phases: Phase 1 used a training set of 248 records and 55 validated data instances covering 7 application categories, and Phase 2 used a training set of 512 records and 110 validated data instances covering 10 application categories. Both phases used the same 3 object categories for annotation.
The achieved scores exceed the expected values for visual text detection, giving high detection accuracy on textual data, with mean average precision (mAP) ranging from 0.4 to 0.65.
dc.identifier.doi	10.3390/app12125977
dc.identifier.essn	2076-3417
dc.identifier.unpaywallURL	https://www.mdpi.com/2076-3417/12/12/5977/pdf?version=1655103978
dc.identifier.uri	http://hdl.handle.net/10668/20796
dc.identifier.wosID	816310800001
dc.issue.number	12
dc.journal.title	Applied sciences-basel
dc.journal.titleabbreviation	Appl. sci.-basel
dc.language.iso	en
dc.organization	Instituto de Investigación Biomédica de Málaga-IBIMA
dc.publisher	Mdpi
dc.rights	Attribution 4.0 International
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	object visual detection
dc.subject	DETR
dc.subject	multilingual OCR
dc.subject	real-time translation
dc.subject	remote diagnostics
dc.subject	YOLO4
dc.subject	Detectron2
dc.subject	realtime text detection
dc.subject	assessment
dc.title	Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools
dc.type	research article
dc.type.hasVersion	VoR
dc.volume.number	12
dc.wostype	Article
dspace.entity.type	Publication
