The National Terminology Database for Irish

Link between téarma.ie and the New Corpus for Ireland

What is the New Corpus for Ireland?

The New Corpus for Ireland is a large collection of Irish-language texts containing approximately 30 million words. It covers a wide range of texts, including works of fiction, informative texts, news reports and official documents. The corpus is designed for linguistic research: for example, finding examples of words used in context or investigating word frequency.

The New Corpus for Ireland was developed by Foras na Gaeilge as part of the New English-Irish Dictionary project. The corpus is also available to the public on a dedicated website, <corpas.focloir.ie>, which was built by Lexical Computing Limited in collaboration with Fiontar & Scoil na Gaeilge, DCU. The corpus website is linked to the téarma.ie terminology database, so you can search the corpus for usage examples of an Irish term by clicking the icon next to it. (This applies only to Irish terms. Irish terms without an icon cannot be found in the corpus.)

Access to the New Corpus for Ireland

When you first click the icon, a pop-up window will ask for a name and password. This is because you must register before you can access the corpus: texts included in a corpus keep all the legal protection offered by copyright law, so access to the New Corpus for Ireland is restricted to registered users. If you are not yet registered, cancel the pop-up window and you will be shown instructions on how to register. Registration involves filling in an online form and then waiting for a confirmation email from Foras na Gaeilge, which should normally arrive within one working day. You will be able to use the corpus after that.

If you are already registered, type your name and password in the pop-up window. You only need to do this once; the website will not ask for them again while you remain logged in.

How to use the corpus

When you reach the corpus from téarma.ie, you will see a page with three kinds of results:

  1. Concordance: You will see a list of examples in which the word or term you searched for is used in a sentence. These examples were selected from the corpus automatically. The home page lists approximately ten examples and you can get more by clicking the ‘more’ link.
  2. Collocations: The box on the right-hand side lists the ten words that most frequently co-occur with the word you searched for. For example, if you search for doras (door), you will get words such as oscail (open), dúnta (closed) and plab (slam). Again, you can see more of these words by clicking the ‘more’ link. This list of collocates was extracted from the corpus automatically.
  3. Statistics: At the bottom of the page, you will see statistical data about how the word you searched for is used, such as its distribution across genres and dialects. For example, if you search for fata (one of the words for ‘potato’), you will see that it is used almost exclusively in the Connacht dialect. These statistics were also extracted from the corpus automatically. You can see more statistics by clicking the ‘more’ link.

If you want, you can use the corpus independently of téarma.ie by going to <corpas.focloir.ie>, logging in and typing a word or term in the search box on the home page.

Advanced searches

If you want to perform more complicated searches on the corpus, you can use the options in the menu on the left-hand side. Here is a summary of the options available.

Concordance: This is where you can search for and list sentences from the corpus based on the words that occur within them. This search is more powerful than the one on the home page; for example, if you select ‘lemma’ in the drop-down box, you can search for all forms of a word: type fuinneog (window) and you will get sentences where any inflected or mutated form of the word occurs: fuinneoige, bhfuinneog and so on. You can also sort and filter the results in several ways.

Word List: This is where you can extract various word lists from the corpus, such as a list of the most frequently occurring words in Irish.

Word Sketch: This section shows which words are most frequently used together with the word you are looking for. The results are presented in several lists according to the grammatical relation between the two words. For example, if you search for a verb, you will get one list of its direct objects, another of its subjects, and so on. Remember that this information was extracted from the corpus automatically, so it may not always be accurate.

Thesaurus: This section lets you type a word and get a list of other words with similar patterns of usage. For example, if you search for the adjective folláin (healthy, wholesome), you will get a list that includes sláintiúil (healthy), sábháilte (safe) and others. In effect, these are words that look like synonyms because they are used in similar ways. Again, however, this information was extracted from the corpus automatically, and the words returned are not necessarily synonyms.

Sketch-Diff: This tool investigates the difference between two words, based on the other words that occur with them. If you enter two words that are close in meaning, for example leanbh (baby) and páiste (child), you will get information that may help you understand the difference between them: the words used mainly with leanbh (baby) include saolaigh (give birth) and baist (baptize), while the words used mainly with páiste (child) include múin (teach) and foghlaim (learn).

More information

The corpus website is based on Sketch Engine, a corpus query system created by Lexical Computing Limited. For more information on using the features available on the website, see the detailed guide in Sketch Engine’s own help section.