Development of a data generator for multivariate numerical data with arbitrary correlations and distributions
Issue Date
2021-01-01
Metadata
Abstract
Artificial or simulated data are particularly relevant in tests and benchmarks for machine learning methods, in teaching (for exercises), and when setting up analysis workflows. They are also useful when real data cannot be used for data-protection reasons, or when particular distributions or effects need to be present in the data in order to test specific machine learning methods. This paper presents a generator for multivariate numerical data with arbitrary marginal distributions and, as far as possible, arbitrary correlations. The data generator is implemented in the open-source statistics software R. It can also be used for categorical variables if data are generated separately for each level of a categorical variable. In addition, outliers can be integrated. The use of the data generator is demonstrated with a concrete example.
Citation
2021, Intelligent Data Analysis, 25(4), pp. 789-807. DOI: 10.3233/IDA-205253.
Affiliation
HZI, Helmholtz-Zentrum für Infektionsforschung GmbH, Inhoffenstr. 7, 38124 Braunschweig, Germany.
Publisher
IOS Press
Journal
Intelligent Data Analysis
Type
Article
Language
English
ISSN
1088-467X
EISSN
1571-4128
Sponsors
Bundesministerium für Wirtschaft und Energie
DOI
10.3233/IDA-205253
The following license files are associated with this item:
- Creative Commons
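The abstract states the goal of the generator (arbitrary marginal distributions plus, as far as possible, arbitrary correlations) but not its mechanism. One common technique for this task is the Gaussian copula (also called NORTA): draw correlated standard normals, map them to uniforms with the normal CDF, then push each column through the inverse CDF of its target marginal. The sketch below assumes that technique and uses plain Python with only the standard library. It is an illustration, not the R implementation described in the paper, and the function names (`cholesky`, `gaussian_copula_sample`, `pearson`) and the example marginals are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Illustrative Gaussian-copula sketch (an assumed technique, not the paper's R code).


def cholesky(corr):
    """Return lower-triangular L with L @ L^T == corr; corr must be positive definite."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = corr[i][i] - s
                if d <= 0:
                    raise ValueError("correlation matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def gaussian_copula_sample(n, corr, inverse_cdfs, seed=None):
    """Draw n rows; column i follows the marginal given by inverse_cdfs[i].

    The columns are coupled through a Gaussian copula whose latent
    (normal-scale) correlation matrix is `corr`.
    """
    rng = random.Random(seed)
    L = cholesky(corr)
    std_normal = NormalDist()
    eps = 1e-12  # keep uniforms strictly inside (0, 1) for the inverse CDFs
    d = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals: x = L z, so cov(x) = L L^T = corr
        x = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # Normal CDF gives uniforms, which the target inverse CDFs turn into data
        u = [min(max(std_normal.cdf(xi), eps), 1.0 - eps) for xi in x]
        rows.append([inv(ui) for inv, ui in zip(inverse_cdfs, u)])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage: an exponential and a uniform variable with latent correlation 0.7
corr = [[1.0, 0.7], [0.7, 1.0]]
marginals = [
    lambda u: -math.log(1.0 - u) / 2.0,  # Exponential(rate = 2)
    lambda u: 10.0 + 5.0 * u,            # Uniform(10, 15)
]
data = gaussian_copula_sample(5000, corr, marginals, seed=1)
r = pearson([row[0] for row in data], [row[1] for row in data])
print(f"output Pearson correlation: {r:.3f}")
```

Note that the matrix passed in is the correlation of the latent normals. The Pearson correlation of the generated columns is generally different (for a Gaussian copula it is never larger in magnitude than the latent value), and some combinations of marginals and target correlations cannot be reached at all. This is one reason why a generator of this kind can honour arbitrary correlations only "as far as possible". Hitting a specific output correlation would require adjusting the latent matrix, for example by a numerical search.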