Diesner J, Carley KM (2010) Mapping socio-technical networks of Sudan from open-source, large-scale text data. 29th Annual Conference of the Sudan Studies Association (SSA), West Lafayette, IN, May 2010.
Data on socio-cultural networks enable the analysis of the properties and dynamics of complex, real-world systems. Collecting network data through surveys is prohibitively expensive when a large number of individuals must be considered, and is impossible when the system’s entities are inaccessible to researchers. In such cases, extracting network data from text corpora can provide an alternative data collection method. We have collected a corpus of about 45,000 publicly available text documents about Sudan from a variety of sources, such as news agencies and reports written by subject-matter experts. We demonstrate a computer-supported methodology for distilling socio-cultural networks from these data and present the results of performing network analysis on the extracted networks. For example, we show and analyze the network of tribal connections in various regions of Sudan. Validating data and results on inaccessible, large-scale networks is difficult, if not impossible. We report on our approach to having the data validated by a renowned subject-matter expert on Sudan and to making corresponding changes to the network data and the extraction method. We also compare the extracted data to data provided by the subject-matter expert and highlight the main commonalities and differences. The extraction, management, and analysis of the network data were performed using freely available software products from the CASOS center at Carnegie Mellon.