For any project-related inquiries, please write an e-mail to hello@euplex.org

When using the dataset, please cite

Hurka, S., Haag, M. and Kaplaner, C. (2022) Policy complexity in the European Union, 1993-today: introducing the EUPLEX dataset. Journal of European Public Policy. (see publications)

Dataset codebook

The EUPLEX dataset consists of EU legislative procedures, so all variables relate to a procedure. Some variables describe the procedure itself, such as its reference number or its title. Within a procedure, various events can occur, such as the adoption of a proposal or a vote. These events may have documents attached to them; for example, the 'adoption of proposal by Commission' event may contain the actual proposal document. Variable name prefixes distinguish event-related from document-related variables.
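
As an illustration of these naming rules, the Python sketch below assigns a column to the procedure, event or document level by its prefix: e_ for event variables and doc_ for document variables, as described in the sections that follow. The helper `variable_level` is hypothetical and not part of the dataset; the identifier columns `event` and `doc`, which carry no prefix, are listed explicitly.

```python
# Hypothetical helper: map an EUPLEX column name to the level it describes,
# following the prefix conventions in this codebook.
EVENT_ID_COLUMNS = {"event"}  # event identifier, which has no e_ prefix
DOC_ID_COLUMNS = {"doc"}      # document identifier, which has no doc_ prefix


def variable_level(column: str) -> str:
    """Return 'event', 'document' or 'procedure' for a column name."""
    if column in EVENT_ID_COLUMNS or column.startswith("e_"):
        return "event"
    if column in DOC_ID_COLUMNS or column.startswith("doc_"):
        return "document"
    # All remaining columns (procedure_id, title, eurlex_search, ...) are procedure-level.
    return "procedure"


for name in ["procedure_id", "event", "e_legal_date", "doc_lix", "eurlex_search"]:
    print(name, "->", variable_level(name))
```

Note that eurlex_search starts with "e" but not with "e_", so it is treated as procedure-level.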

Data structure

Events

All event-related variables use the prefix e_. Every event has a legal date, which is stored in the e_legal_date variable.
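
According to the variable table, e_legal_date is a YYYY-MM-DD string, while the Stata version stores it as an integer counting days since 1960-01-01. A minimal Python sketch for converting between the two representations, e.g. when combining both versions of the data; the function names are illustrative only.

```python
from datetime import date, timedelta

# Day 0 of Stata's daily date format.
STATA_EPOCH = date(1960, 1, 1)


def stata_to_iso(days: int) -> str:
    """Convert a Stata daily date (days since 1960-01-01) to YYYY-MM-DD."""
    return (STATA_EPOCH + timedelta(days=days)).isoformat()


def iso_to_stata(value: str) -> int:
    """Convert a YYYY-MM-DD string to days since 1960-01-01."""
    return (date.fromisoformat(value) - STATA_EPOCH).days


print(iso_to_stata("1993-01-01"))  # 12054
print(stata_to_iso(12054))         # 1993-01-01
```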

Currently, only the following events are included in the dataset:

  • proposal: adoption of the proposal by the submitting institution, usually the Commission (ADP_byCOM); marks the start of the legislative procedure
  • final: adoption or publication date of the final law (PUB_OJ, SIGN_byEP_CONSIL or ADP_FRM_byCONSIL, in that order, depending on data availability)
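
For the final event, the codebook lists three candidate events in order of preference. A minimal Python sketch of that fallback, assuming that "data availability" means the event is present in the procedure notice with a legal date, and that the event codes and their dates have already been collected into a dictionary (a hypothetical intermediate structure, not a dataset variable). The function name and the example dates are illustrative.

```python
from typing import Mapping, Optional, Tuple

# Event codes that can define the 'final' event, in the priority order given above.
FINAL_EVENT_PRIORITY = ("PUB_OJ", "SIGN_byEP_CONSIL", "ADP_FRM_byCONSIL")


def pick_final_event(
    legal_dates: Mapping[str, Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (event code, legal date) for the first final-stage event that has a date."""
    for code in FINAL_EVENT_PRIORITY:
        legal_date = legal_dates.get(code)
        if legal_date:  # skip codes that are absent or have no recorded date
            return code, legal_date
    return None  # none of the final-stage events is available


# Illustrative input: no PUB_OJ event recorded, so the signature event is used.
print(pick_final_event({"SIGN_byEP_CONSIL": "2020-06-18", "ADP_FRM_byCONSIL": "2020-06-10"}))
```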

Documents

Documents are always attached to an event. All document-related variables carry the doc_ prefix. To reduce data size, each event row is matched only with its own document; i.e., a row where event==proposal has a doc entry for the proposal but none for the final law. The following documents are included in the dataset:

Variables

| Variable | Name | Type | Description |
|---|---|---|---|
| Procedure ID | procedure_id | String | Procedure ID as used in EUR-Lex URLs |
| Procedure reference | procedure_reference | String | Complete procedure reference |
| Procedure notice CELLAR URI | uri__cellar | String | CELLAR URI of the procedure notice |
| Legislative procedure type | procedure_type | String | Type of legislative procedure used |
| Proposal adopted | proposal_adopted | Logical | Was the proposal adopted? (based on the ‘DOSSIER_ADOPTED-PROPOSAL’ tag in the procedure notice) |
| Proposal pending | proposal_pending | Logical | Is the proposal still pending? (based on the ‘DOSSIER_PENDING-PROPOSAL’ tag in the procedure notice) |
| EUROVOC domain(s) | eurovoc_… | Logical | Set of logical indicators, one per EUROVOC domain, marking whether the procedure is tagged with a EUROVOC identifier belonging to that domain |
| Procedure title | title | String | Title of the procedure |

Events

| Variable | Name | Type | Description |
|---|---|---|---|
| Event name/ID | event | String | Event identifier using ‘nice’ names. ‘proposal’ corresponds to ‘ADP_byCOM’ events in the procedure notice; ‘final’ corresponds to `PUB_OJ`, `SIGN_byEP_CONSIL` or `ADP_FRM_byCONSIL` events (in that order, depending on data availability) |
| Legal date | e_legal_date | String (Date) / Integer | Date of the event as registered in the `EVENT_LEGAL_DATE` tag of the procedure notice (YYYY-MM-DD); in the Stata version, the number of days since 1960-01-01 |
| Event-document CELEX URIs | e_doc_celexs | String | CELEX URI(s) of the main document attached to the event (used for merging documents to events) |
| Multiple main documents | e_multi_main_docs | Logical | Does the event have multiple main documents? If TRUE, the CELEX URI without a bracket or ‘R’ postfix is preferred for matching |
| Responsible institution corporate body | e_resp_inst__corp_body | String | ‘Corporate body’ name of the institution responsible for the event. For proposals, this is usually the abbreviation of the responsible Commission DG |

Documents

| Variable | Name | Type | Description |
|---|---|---|---|
| Document name/ID | doc | String | Document identifier using ‘nice’ names, usually corresponding to an ‘event’ name |
| Document CELEX URI | doc_uri_celex | String | CELEX URI of the document (used for matching events to documents) |
| Document URIs | doc_uris | String (JSON) | All URIs of the document, in JSON format |
| Legal instrument | doc_leg_instr | String | Legal instrument of the text as noted in the ‘RESOURCE-TYPE’ identifier of the document notice |
| Instrument subtype | doc_leg_instr_subtype | String | Subtype of the legislative instrument (Legislation, Recast, Codification), taken from the title of the text |
| Implementing act | doc_leg_instr_implementing | Logical | Is the text an implementing act? Taken from the ‘RESOURCE-TYPE’ identifier of the document notice |
| Amending act | doc_amending | Logical | Is the text an amending act? Based on the document title (see online appendix for additional information) |
| ‘and’ amending act | doc_and_amending | Logical | Is the text an ‘and’ amending act? Based on the document title (see online appendix for additional information) |
| Adapting act | doc_adatping | Logical | Is the text an adapting act? Based on the document title |
| Repealing act | doc_repealing | Logical | Is the text a repealing act? Based on the document title |
| ‘and’ repealing act | doc_and_repealing | Logical | Is the text an ‘and’ repealing act? Based on the document title |
| Document title | doc_title | String | Title of the text |

Policy complexity

| Variable | Name | Type | Description |
|---|---|---|---|
| Structural size | doc_struct_size | Integer | Number of structural elements in the text |
| Number of articles | doc_articles | Integer | Number of articles in the document |
| Average element depth | doc_avg_depth | Float | Average element depth of the text (see main text for explanation) |
| Average article depth | doc_avg_article_depth | Float | Average depth of an article in the text |
| Word entropy | doc_word_entropy | Float | Word entropy of the text (without lemmatization) |
| Word entropy (lemmatized) | doc_word_entropy_l | Float | Word entropy using lemmatized unigram tokens |
| Lix score | doc_lix | Float | Lix readability score |
| SMOG | doc_smog | Float | SMOG index for the text; texts with fewer than 30 sentences are assigned 0 |
| Dale-Chall formula | doc_dale_chall | Float | Dale-Chall formula score for the text |
| Coleman-Liau index | doc_coleman_liau_index | Float | Coleman-Liau index for the text |
| FORCAST | doc_forcast | Float | FORCAST formula index for the text |
| Flesch-Kincaid grade level | doc_flesch_grade_level | Float | Flesch-Kincaid grade level for the text |
| Flesch reading ease | doc_flesch_reading_ease | Float | Flesch reading ease score for the text |
| Internal references (interdependence) | doc_ref_int_enacting | Integer | Number of internal references in the enacting text |
| Relative internal references (interdependence) | doc_ref_int_enacting_rel | Float | Internal references per article: doc_ref_int_enacting / doc_articles |
| External references (embeddedness) | doc_ref_ext_enacting | Integer | Number of external references in the enacting text |
| Relative external references (embeddedness) | doc_ref_ext_enacting_rel | Float | External references per article: doc_ref_ext_enacting / doc_articles |
| Word count (w/o annex) | doc_words_noannex | Integer | Word count of the text excluding the annex (i.e., counting citations, recitals and enacting terms). Based on the blank English-language spaCy 3.0.1 ‘Tokenizer’ component, with some corrections for EU-specific legal identifiers |
| Complete complexity indicators | doc_complete_complexity | Logical | Are all complexity indicators available (non-missing) for the document? |

Technical details

| Variable | Name | Type | Description |
|---|---|---|---|
| Bad EUR-Lex formatting indicator | doc__bad_formatting | Logical | Indicates whether the formatting of the document text provided by EUR-Lex prevents a precise analysis |
| Bad EUR-Lex formatting reason | doc__bad_formatting_reason | String | Reason for the bad formatting classification |
| Text source format | doc_format | String | Format of the document |
| Text parsing failed | doc__euplexcy_failed | Logical | Did parsing of the document text fail? |
| EUR-Lex search | eurlex_search | Logical | Indicates whether the procedure is included in the EUR-Lex search index. See this tweet for more information. |

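
Events and documents are linked through CELEX numbers: e_doc_celexs on the event side and doc_uri_celex on the document side. When an event has multiple main documents (e_multi_main_docs is TRUE), the CELEX number without a bracket or ‘R’ postfix is preferred. The Python sketch below applies that rule. It assumes e_doc_celexs has already been split into a list, and the regular expression for a "postfix", the helper names and the example CELEX numbers are illustrative assumptions rather than the EUPLEX implementation.

```python
import re
from typing import Dict, List, Optional

# Assumed shape of the postfixes to avoid: a trailing 'R' and/or bracketed
# number(s), e.g. the 'R(02)' in '32016R0679R(02)'.
_POSTFIX = re.compile(r"(R|\(\d+\))+$")


def preferred_celex(celexs: List[str]) -> Optional[str]:
    """Choose the CELEX number used to match an event to its document."""
    for celex in celexs:
        if not _POSTFIX.search(celex):
            return celex  # first CELEX number without a postfix wins
    return celexs[0] if celexs else None  # otherwise fall back to the first one


def match_document(event_celexs: List[str], docs_by_celex: Dict[str, dict]) -> Optional[dict]:
    """Look up the document row whose doc_uri_celex equals the preferred CELEX number."""
    key = preferred_celex(event_celexs)
    return docs_by_celex.get(key) if key is not None else None


docs = {"32016R0679": {"doc": "final", "doc_uri_celex": "32016R0679"}}
print(match_document(["32016R0679R(02)", "32016R0679"], docs))
```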
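
doc_word_entropy and doc_word_entropy_l are described only as word entropy over (lemmatized) unigram tokens. Assuming the usual Shannon entropy of the token frequency distribution, a minimal Python sketch looks like this. The logarithm base (2 here), lowercasing and punctuation handling are assumptions, so values need not reproduce those in the dataset; for the lemmatized variant, lemmas would be passed instead of word forms.

```python
import math
from collections import Counter
from typing import Iterable


def word_entropy(tokens: Iterable[str]) -> float:
    """Shannon entropy (bits) of the unigram frequency distribution of `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty text: define entropy as 0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(word_entropy("the member states shall ensure that the member states".split()))
```

Higher values indicate a more varied vocabulary; a text repeating a single token has an entropy of 0.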
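
Several of the readability variables follow widely used formulas. Working from pre-computed counts (words, sentences, syllables, words longer than six letters, and words of three or more syllables), the textbook versions of four of them can be sketched in Python as below. The codebook does not document how EUPLEX splits sentences or counts syllables, so these functions need not match the dataset values; only the SMOG cut-off at 30 sentences is taken from the codebook.

```python
import math


def lix(words: int, sentences: int, long_words: int) -> float:
    """Lix: words per sentence plus the percentage of words longer than six letters."""
    return words / sentences + 100 * long_words / words


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease (higher means easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog(sentences: int, polysyllables: int) -> float:
    """SMOG index; following the codebook, texts with fewer than 30 sentences get 0."""
    if sentences < 30:
        return 0.0
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


# Illustrative counts for a short text of 100 words in 5 sentences.
print(lix(100, 5, 20))  # 40.0
print(flesch_kincaid_grade(100, 5, 150))
print(flesch_reading_ease(100, 5, 150))
```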