For any project-related inquiries, please write an e-mail to hello@euplex.org.
When using the dataset, please cite:
Hurka, S., Haag, M. and Kaplaner, C. (2022) Policy complexity in the European Union, 1993-today: introducing the EUPLEX dataset. Journal of European Public Policy. (see publications)
The EUPLEX dataset consists of EU legislative procedures. The variables therefore relate to the procedure. Procedures have a number of variables relating directly to them, such as their reference number or their name. Within a procedure, various events, such as the adoption of a proposal or a vote, can occur. These events may have documents attached to them, e.g. the 'adoption of proposal by Commission' event may contain the actual proposal document. Variable naming rules are used to differentiate between event- and document-related variables.
All event-related variables use the prefix `e_`. All events have a legal date assigned to them, which is stored in the `e_legal_date` variable.
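As a small illustration of working with `e_legal_date`: the variable table in this codebook lists it as a YYYY-MM-DD string or, in the STATA version, as the number of days since 1960-01-01. The sketch below converts either form to a Python `date`. The helper name `parse_legal_date` is hypothetical and not part of the dataset or of any EUPLEX tooling.

```python
from datetime import date, timedelta

# STATA counts daily dates as days elapsed since 1960-01-01.
STATA_EPOCH = date(1960, 1, 1)


def parse_legal_date(value):
    """Convert an e_legal_date value to a datetime.date.

    Hypothetical helper: accepts either an ISO string (YYYY-MM-DD)
    or an integer number of days since 1960-01-01 (STATA encoding).
    Returns None for missing values.
    """
    if value is None or value == "":
        return None
    if isinstance(value, int):
        return STATA_EPOCH + timedelta(days=value)
    return date.fromisoformat(value)
```

Both encodings of the same day should yield the same `date` object, which makes it possible to compare legal dates across the different file formats.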
Currently only the following events are included in the dataset:

- `proposal`: adoption of proposal by submitting institution, usually the Commission (`ADP_byCOM`), start of the legislative procedure
- `final`: adoption / publication date of final law (`PUB_OJ`, `SIGN_byEP_CONSIL`, `ADP_FRM_byCONSIL` in that order, depending on data availability)

Documents are always attached to an event. All document-related variables carry the `doc_` prefix. To reduce data size, `event`s of one type are matched with their corresponding `document` only. I.e., a row where `event==proposal` has a corresponding `doc` entry for `proposal` but none for `final`. The following documents are included in the dataset:
Variable | Name | Type | Description |
---|---|---|---|
Procedure ID | procedure_id | String | Procedure ID as used in EUR-Lex urls |
Procedure reference | procedure_reference | String | Complete procedure reference |
Procedure notice CELLAR uri | uri__cellar | String | CELLAR uri of the procedure notice |
Legislative procedure type | procedure_type | String | Type of legislative procedure used |
Proposal adopted | proposal_adopted | Logical | Was the proposal adopted? (information based on ‘DOSSIER_ADOPTED-PROPOSAL’ tag in procedure notice) |
Proposal pending | proposal_pending | Logical | Is the proposal still pending? (information based on ‘DOSSIER_PENDING-PROPOSAL’ tag in procedure notice) |
EUROVOC domain(s) | eurovoc_… | Logical | Set of logical indicators marking whether a procedure is tagged with a EUROVOC identifier belonging to the specific EUROVOC domain |
Procedure title | title | String | Title of the procedure |
Events | |||
Event Name/ID | event | String | Event identifier using ‘nice’ names. ‘proposal’ corresponds to ‘ADP_byCOM’ events in procedure notice. ‘final’ corresponds to `PUB_OJ`, `SIGN_byEP_CONSIL`, `ADP_FRM_byCONSIL` (in that order, depending on data availability) events in procedure notice. |
Legal Date | e_legal_date | String (Date) / Integer | The date of an event as registered in the `EVENT_LEGAL_DATE` tag of the procedure notice (YYYY-MM-DD) / STATA: Number of days since 1960-01-01 |
Event-document CELEX uris | e_doc_celexs | String | The CELEX uri(s) of the main document attached to an event (used for merging documents with events) |
Multiple main documents | e_multi_main_docs | Logical | Does the event have multiple main documents? If TRUE, the CELEX uri without a bracket or ‘R’ postfix is preferred for matching |
Responsible institution corporate body | e_resp_inst__corp_body | String | The ‘corporate body’ name of the responsible institution for a specific event. For proposal, this is usually the abbreviation for the responsible Commission DG. |
Documents | |||
Document Name/ID | doc | String | Document identifier using ‘nice’ names, usually corresponding to an ‘event’ name. |
Document CELEX uri | doc_uri_celex | String | The CELEX uri of the document (used for matching events to documents) |
Document uris | doc_uris | String (JSON) | All document URIs of the document in JSON format |
Legal instrument | doc_leg_instr | String | Legal instrument of the text as noted in the ‘RESOURCE-TYPE’ identifier of a document notice |
Instrument subtype | doc_leg_instr_subtype | String | The subtype of legislative instrument of the text (Legislation, Recast, Codification), taken from the title of a text |
Implementing act | doc_leg_instr_implementing | Logical | Is the text an implementing act? Taken from the ‘RESOURCE-TYPE’ identifier of a document notice |
Amending act | doc_amending | Logical | Is the text an amending act? Based on document title (see online appendix for additional information) |
‘and’ amending act | doc_and_amending | Logical | Is the text an ‘and’ amending act? Based on document title (see online appendix for additional information) |
Adapting act | doc_adatping | Logical | Is the text an adapting act? Based on document title |
Repealing act | doc_repealing | Logical | Is the text a repealing act? Based on document title |
‘and’ repealing act | doc_and_repealing | Logical | Is the text an ‘and’ repealing act? Based on document title |
Document title | doc_title | String | Title of the text |
Policy complexity | |||
Structural size | doc_struct_size | Integer | Number of structural elements in text |
Number of articles | doc_articles | Integer | Number of articles in the document |
Average element depth | doc_avg_depth | Float | Average element depth of a text (see main text for explanation) |
Average article depth | doc_avg_article_depth | Float | Average depth of an article in the text |
Word entropy | doc_word_entropy | Float | Word entropy |
Word entropy (lemmatized) | doc_word_entropy_l | Float | Word entropy using lemmatized unigram tokens |
Lix score | doc_lix | Float | Lix readability score |
SMOG | doc_smog | Float | SMOG index for the text. Texts with fewer than 30 sentences are measured as 0 |
Dale-Chall formula | doc_dale_chall | Float | Dale-Chall formula score for the text |
Coleman-Liau index | doc_coleman_liau_index | Float | Coleman-Liau index for the text |
FORCAST | doc_forcast | Float | FORCAST formula index for the text |
Flesch-Kincaid Readability Score | doc_flesch_grade_level | Float | Flesch-Kincaid grade level for the text |
Flesch-Kincaid Reading Ease | doc_flesch_reading_ease | Float | Flesch-Kincaid reading ease for the text |
Internal references (interdependence) | doc_ref_int_enacting | Integer | Number of internal references in the enacting text |
Relative internal references (interdependence) | doc_ref_int_enacting_rel | Float | doc_ref_int_enacting / doc_articles |
External references (embeddedness) | doc_ref_ext_enacting | Integer | Number of external references in the enacting text |
Relative external references (embeddedness) | doc_ref_ext_enacting_rel | Float | doc_ref_ext_enacting / doc_articles |
Word count (w/o annex) | doc_words_noannex | Integer | Word count for the text excluding the annex text (i.e. citations, recitals, enacting terms). Based on blank English language spaCy version 3.0.1 ‘Tokenizer’ component with some corrections for EU-specific legal identifiers |
Complete complexity indicators | doc_complete_complexity | Logical | Are all complexity indicators available (non-missing) for the given document? |
Technical details | |||
Bad EUR-Lex formatting indicator | doc__bad_formatting | Logical | Indicates whether the document text formatting as provided by EUR-Lex does not allow for a precise analysis |
Bad EUR-Lex formatting reason | doc__bad_formatting_reason | String | Reason for the bad formatting classification |
Text source format | doc_format | String | Format of the document |
Text parsing failed | doc__euplexcy_failed | Logical | Did the parsing fail? |
EUR-Lex search | eurlex_search | Logical | Indicates whether or not the procedure is included in the EUR-Lex search index. See this tweet for more information. |
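
To illustrate how the row structure and the relative reference indicators described above fit together, here is a minimal sketch in plain Python. It assumes the data has already been loaded as a list of dictionaries keyed by the variable names in the table. The sample rows and the helper names `select_event` and `relative_references` are invented for illustration and do not come from the dataset or from any EUPLEX tooling.

```python
# Hypothetical rows, invented for illustration only.
# Real rows carry all the variables listed in the table above.
rows = [
    {"procedure_id": "A", "event": "proposal", "doc": "proposal",
     "doc_articles": 20, "doc_ref_int_enacting": 30, "doc_ref_ext_enacting": 10},
    {"procedure_id": "A", "event": "final", "doc": "final",
     "doc_articles": 25, "doc_ref_int_enacting": 50, "doc_ref_ext_enacting": 5},
    {"procedure_id": "B", "event": "proposal", "doc": "proposal",
     "doc_articles": None, "doc_ref_int_enacting": None, "doc_ref_ext_enacting": None},
]


def select_event(rows, event_name):
    """Return the rows for one event type, e.g. 'proposal' or 'final'."""
    return [row for row in rows if row["event"] == event_name]


def relative_references(row, column):
    """Divide a reference count by doc_articles, as in the *_rel variables.

    Returns None when either value is missing or doc_articles is zero.
    """
    articles = row.get("doc_articles")
    count = row.get(column)
    if not articles or count is None:
        return None
    return count / articles


proposals = select_event(rows, "proposal")
for row in proposals:
    rel_int = relative_references(row, "doc_ref_int_enacting")
    print(row["procedure_id"], rel_int)
```

Because each row pairs an event with its own document only, filtering on `event` is enough to obtain the matching `doc_` values. No further join is needed in this sketch.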