1 code implementation • 3 May 2023 • Benjamin Shade, Eduardo G. Altmann
We also found numerically that the Jensen--Shannon divergence and embedding-based approaches were robust to changes in $h$, while the Jaccard distance was not.
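To make the two kinds of measure named above concrete, here is a minimal sketch of the Jaccard distance (computed on vocabularies only) and the Jensen-Shannon divergence (computed on word frequencies) for two tokenized texts. The function names, the whitespace tokenization in the usage note, and the choice of base-2 logarithms are assumptions for illustration. They are not taken from the paper's code, and the role of $h$ is not modeled here.

```python
from collections import Counter
from math import log2


def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| on the sets of distinct tokens (frequencies ignored)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here: two empty texts are identical
    return 1.0 - len(set_a & set_b) / len(union)


def jensen_shannon_divergence(tokens_a, tokens_b):
    """Jensen-Shannon divergence (base 2, equal weights) between word-frequency distributions."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both texts must contain at least one token")
    jsd = 0.0
    for word in set(counts_a) | set(counts_b):
        p = counts_a[word] / n_a  # Counter returns 0 for unseen words
        q = counts_b[word] / n_b
        m = 0.5 * (p + q)         # mixture distribution
        if p > 0:
            jsd += 0.5 * p * log2(p / m)
        if q > 0:
            jsd += 0.5 * q * log2(q / m)
    return jsd
```

For example, `jensen_shannon_divergence("a b b c".split(), "a b c c".split())` compares how often each word occurs, whereas `jaccard_distance` on the same inputs returns 0 because both texts share the same vocabulary.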
1 code implementation • 30 Jun 2021 • Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks.
1 code implementation • 25 Jun 2020 • Eduardo G. Altmann
Analyses of urban scaling laws assume that observations in different cities are independent of the existence of nearby cities.
Physics and Society
no code implementations • 22 Jun 2020 • Giampaolo Cristadoro, Mirko Degli Esposti, Eduardo G. Altmann
Genetic sequences are known to possess non-trivial composition together with symmetries in the frequencies of their components.
1 code implementation • 27 Apr 2020 • Hongjia H. Chen, Tristram J. Alexander, Diego F. M. Oliveira, Eduardo G. Altmann
In this paper we quantify the statistical properties and dynamics of the frequency of hashtag use on Twitter.
Physics and Society • Social and Information Networks
1 code implementation • 4 Aug 2017 • Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann
By adapting existing community-detection methods -- using a stochastic block model (SBM) with non-parametric priors -- we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents).
no code implementations • 11 Nov 2016 • Eduardo G. Altmann, Laercio Dias, Martin Gerlach
This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences.
no code implementations • 1 Oct 2015 • Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann
Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences.
no code implementations • 11 Feb 2015 • Eduardo G. Altmann, Martin Gerlach
Zipf's law is just one out of many universal laws proposed to describe statistical regularities in language.
no code implementations • 17 Jun 2014 • Martin Gerlach, Eduardo G. Altmann
In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies.
no code implementations • 17 Jun 2014 • Fakhteh Ghanbarnejad, Martin Gerlach, Jose M. Miotto, Eduardo G. Altmann
Combining data analysis with simulations of simple models (e.g., the Bass dynamics on complex networks), we identify signatures of endogenous and exogenous factors in the S-curves of adoption.
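As a rough illustration of how an S-curve of adoption can arise from exogenous and endogenous factors, the sketch below integrates the classic well-mixed (mean-field) Bass model, $dF/dt = (p + qF)(1 - F)$, where $F$ is the adopted fraction. This is a simplification and not the network version studied in the paper. The function names and parameter values are assumptions chosen for illustration, with $p$ standing for the exogenous (external influence) rate and $q$ for the endogenous (imitation) rate.

```python
from math import exp


def bass_closed_form(t, p, q):
    """Closed-form adopted fraction F(t) of the mean-field Bass model with F(0) = 0."""
    decay = exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_euler(p, q, t_max, dt=0.001):
    """Forward-Euler integration of dF/dt = (p + q*F) * (1 - F), starting from F = 0."""
    steps = int(round(t_max / dt))
    f = 0.0
    trajectory = [f]
    for _ in range(steps):
        f += dt * (p + q * f) * (1.0 - f)
        trajectory.append(f)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: weak external driving, strong imitation.
    curve = bass_euler(p=0.01, q=0.5, t_max=20.0)
    print(curve[-1], bass_closed_form(20.0, 0.01, 0.5))
```

With $q \gg p$ the curve is strongly S-shaped (slow start, rapid middle, saturation); with $q = 0$ it reduces to a purely exogenous exponential approach to saturation.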
no code implementations • 6 Dec 2012 • Martin Gerlach, Eduardo G. Altmann
We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes.