How to extract knowledge of Qualitative Data from Big Textual Data

Volume 9, Issue 1, February 2024     |     PP. 18-53      |     Pub. Date: November 16, 2021
DOI: 10.54647/computer52243


Jouis Christophe, Centre d’Analyse et de Mathématiques Sociales - CAMS
Orús-Lacort Mercedes, Online teachers at College Mathematics

In this article, we analyze how to obtain pertinent information, in the form of graphically represented qualitative data, from unstructured big textual data. Unstructured data is information that either has no pre-defined data model or is not organized in a pre-defined manner; it accounts for 80-90% of all information. Accumulating large amounts of information is of little use if a particular piece of information cannot be found. Current methods prove expensive, and their results are too often inappropriate. The goal of the research described here is to present an approach for automating the detection and extraction of meaning from unstructured data using its normalized part: the Web of Data and Linked Open Data (LOD). In structured indexes, classification systems, thesauri, conceptual structures and semantic networks, however, relationships are too often vague. For instance, typically only the general relation “Is-a” is used, which is too vague. One possible approach to this problem is to organize the relationships in a typology based on their logical properties. We propose an original method, Contextual Exploration, which is implemented in the EC3 software. EC3 requires neither syntactic analysis, nor statistical analysis, nor a "general" ontology. It uses only small ontologies, called "linguistic ontologies", which depend on knowledge of the language.
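To make the idea of a typology based on logical properties concrete, here is a minimal Python sketch. It is not taken from EC3; the function name `logical_properties` and the toy relations are assumptions chosen for illustration. It tests a finite relation for reflexivity, symmetry, antisymmetry and transitivity, and shows that two readings often merged under “Is-a” (inclusion between classes and membership of an individual in a class) do not share the same properties.

```python
def logical_properties(pairs, domain):
    """Return the basic logical properties of a finite binary relation.

    pairs  -- iterable of (x, y) tuples, read as "x is related to y"
    domain -- the set of elements over which reflexivity is checked
    """
    rel = set(pairs)
    return {
        "reflexive": all((x, x) in rel for x in domain),
        "symmetric": all((y, x) in rel for (x, y) in rel),
        "antisymmetric": all(x == y or (y, x) not in rel for (x, y) in rel),
        "transitive": all(
            (x, w) in rel
            for (x, y) in rel
            for (z, w) in rel
            if y == z
        ),
    }


# Hypothetical toy data: inclusion between classes.
inclusion = {("sparrow", "bird"), ("bird", "animal"), ("sparrow", "animal")}
inclusion_props = logical_properties(inclusion, {"sparrow", "bird", "animal"})

# Hypothetical toy data: membership. Tweety is a sparrow and the sparrow is
# a species, but Tweety is not a species.
membership = {("tweety", "sparrow"), ("sparrow", "species")}
membership_props = logical_properties(membership, {"tweety", "sparrow", "species"})

print(inclusion_props["transitive"])   # True
print(membership_props["transitive"])  # False
```

In this toy data, both relations would commonly be labelled “Is-a”, yet only the first is transitive, so an inference that chains two links is valid for one and invalid for the other.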

Big Textual Data, Extraction of Pertinent Information, Data Mining for Information Retrieval, Big Data Analysis, Web Semantics, Computational linguistics, Ontologies, Text Mining, Web Mining, (Big, Linked, Smart) Data, Semantic relations, Contextual Exploration, Qualitative Research, Relationships, Hierarchies, Concept, Semantics, Logical Properties, EC3.

Cite this paper
Jouis Christophe, Orús-Lacort Mercedes, How to extract knowledge of Qualitative Data from Big Textual Data, SCIREA Journal of Computer. Volume 9, Issue 1, February 2024 | PP. 18-53. DOI: 10.54647/computer52243

