Publication:
Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools

dc.contributor.authorBiro, Attila
dc.contributor.authorTunde Janosi-Rancz, Katalin
dc.contributor.authorSzilagyi, Laszlo
dc.contributor.authorIgnacio Cuesta-Vargas, Antonio
dc.contributor.authorMartin-Martin, Jaime
dc.contributor.authorMiklos Szilagyi, Sandor
dc.contributor.authoraffiliation[Biro, Attila] George Emil Palade Univ Med Pharm Sci & Technol T, Dept Elect Engn & Informat Technol, Str Nicolae Iorga 1, Targu Mures 540088, Romania
dc.contributor.authoraffiliation[Miklos Szilagyi, Sandor] George Emil Palade Univ Med Pharm Sci & Technol T, Dept Elect Engn & Informat Technol, Str Nicolae Iorga 1, Targu Mures 540088, Romania
dc.contributor.authoraffiliation[Biro, Attila] Univ Malaga, Dept Physiotherapy, Malaga 29071, Spain
dc.contributor.authoraffiliation[Ignacio Cuesta-Vargas, Antonio] Univ Malaga, Dept Physiotherapy, Malaga 29071, Spain
dc.contributor.authoraffiliation[Biro, Attila] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation[Ignacio Cuesta-Vargas, Antonio] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation[Martin-Martin, Jaime] Biomed Res Inst Malaga IBIMA, Malaga 29590, Spain
dc.contributor.authoraffiliation[Tunde Janosi-Rancz, Katalin] Sapientia Hungarian Univ Transylvania, Computat Intelligence Res Grp, Targu Mures 540485, Romania
dc.contributor.authoraffiliation[Szilagyi, Laszlo] Sapientia Hungarian Univ Transylvania, Computat Intelligence Res Grp, Targu Mures 540485, Romania
dc.contributor.authoraffiliation[Szilagyi, Laszlo] Obuda Univ, Physiol Controls Res Ctr, H-1034 Budapest, Hungary
dc.contributor.authoraffiliation[Ignacio Cuesta-Vargas, Antonio] Queensland Univ Technol, Fac Hlth Sci, Sch Clin Sci, Brisbane, Qld 4000, Australia
dc.contributor.authoraffiliation[Martin-Martin, Jaime] Univ Malaga, Fac Med, Dept Human Anat Legal Med & Hist Sci, Legal & Forens Med Area, Malaga 29071, Spain
dc.contributor.funderITware, Hungary
dc.contributor.funderSapientia Foundation-Institute for Scientific Research
dc.date.accessioned2023-05-03T13:47:15Z
dc.date.available2023-05-03T13:47:15Z
dc.date.issued2022-06-01
dc.description.abstractReal-time multilingual phrase detection from and during online video presentations, in support of instant remote diagnostics, requires near real-time visual (textual) object detection and preprocessing for further analysis. Connecting remote specialists and sharing specific ideas is most effective in the native language. The main objective of this paper is to analyze DEtection TRansformer (DETR) models, architectures, and hyperparameters, and to propose recommendations and specific procedures with simplified methods that achieve reasonable accuracy in real-time textual object detection for further analysis. The development of real-time video conference translation based on artificial-intelligence-supported solutions has a relevant impact on the health sector, especially on clinical practice, through better video consultation (VC) and remote diagnosis. The importance of this development was heightened by the COVID-19 pandemic. The challenge of this topic lies in the variety of languages and dialects spoken by the specialists involved, which usually requires human translators as intermediaries, a role that AI-enabled technological pipelines can take over. The sensitivity of visual textual element localization is directly connected to the complexity, quality, and variety of the collected training data sets. In this research, we investigated the DETR model with several variations. The research highlights the differences between the most prominent real-time object detectors, YOLO4, DETR, and Detectron2, and brings AI-based novelty to collaborative solutions combined with OCR. The performance of the procedures was evaluated in two research phases (Phase 1/Phase 2), using training data sets of 248/512 records, validation sets of 55/110 data instances, 7/10 application categories, and 3/3 object categories, with the same object categories used for annotation.
The achieved scores exceed the expected values for visual text detection, giving high detection accuracy for textual data, with mean average precision ranging from 0.4 to 0.65.
dc.identifier.doi10.3390/app12125977
dc.identifier.essn2076-3417
dc.identifier.unpaywallURLhttps://www.mdpi.com/2076-3417/12/12/5977/pdf?version=1655103978
dc.identifier.urihttp://hdl.handle.net/10668/20796
dc.identifier.wosID816310800001
dc.issue.number12
dc.journal.titleApplied sciences-basel
dc.journal.titleabbreviationAppl. sci.-basel
dc.language.isoen
dc.organizationInstituto de Investigación Biomédica de Málaga-IBIMA
dc.publisherMdpi
dc.rightsAttribution 4.0 International
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectobject visual detection
dc.subjectDETR
dc.subjectmultilingual OCR
dc.subjectreal-time translation
dc.subjectremote diagnostics
dc.subjectYOLO4
dc.subjectDetectron2
dc.subjectrealtime text detection
dc.subjectassessment
dc.titleVisual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools
dc.typeresearch article
dc.type.hasVersionVoR
dc.volume.number12
dc.wostypeArticle
dspace.entity.typePublication
