Archive for July 2008

French National Day, 14 July

What are the French Revolution and the day of the storming of the Bastille?
1880 – the year that established 14 July as France’s national holiday
14 July, National Day
Google.fr 14 July
The official Sénat.fr site: http://www.14juillet.senat.fr/
French National Day on Wikipedia

14 July 2007 parade – Polytechnique, 2008 in progress

PARIS PARADE – HELICOPTER FLYPAST

Carla Bruni Videos Comme si de rien n’était

Carla Bruni-Sarkozy’s latest album, Comme si de rien n’était, is free at http://www.carlabruni.com/
Carla Bruni – Quelqu’un m’a dit

Carla Bruni has managed to identify the web community and is a good promoter of free blogs as a way to express your own opinion.

Bashar Al ASSAD on Carla Bruni 12/07/08

Carla Bruni Amoureuse


EXCLUSIVE! Carla Bruni video: “L’Amoureuse” – wideo
Video by Kuntzel + Deygas
Photos and drawings © Florence Deygas
© 2008 Teorema under exclusive licence to naïve

TFIDF + SEO read more

Good to read if you are in the SEO business:
Understanding Inverse Document Frequency (IDF) by Dr. E. Garcia

“IDF is simply neither a pure heuristic, nor the theoretical mystery
many have made it out to be. We have a pretty good idea why
it works as well as it does.” –Stephen E. Robertson

In 1972, the late Karen Sparck Jones (August 26, 1935 – April 4, 2007) published in the Journal of Documentation the global term weighting scheme that later became known as Inverse Document Frequency (IDF): “Then where there are N documents in the collection, the weight of a term which occurs n times is f(N) – f(n) + 1.”
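To make the quoted formula concrete, here is a minimal Python sketch. The quote does not define f; the sketch assumes f(n) = m such that 2^(m-1) < n ≤ 2^m (roughly log2 n rounded up), which is how I recall the 1972 paper defining it, so treat that as an assumption. It also reads “occurs n times” as “occurs in n documents”. The function names are made up for illustration.

```python
def f(n: int) -> int:
    """Assumed definition: the integer m with 2**(m-1) < n <= 2**m."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without float error.
    return (n - 1).bit_length()


def sparck_jones_weight(total_docs: int, docs_with_term: int) -> int:
    """Weight f(N) - f(n) + 1 for a term found in n of N documents."""
    if not 1 <= docs_with_term <= total_docs:
        raise ValueError("need 1 <= n <= N")
    return f(total_docs) - f(docs_with_term) + 1


if __name__ == "__main__":
    # Rare terms get a high weight, ubiquitous terms get the minimum weight of 1.
    for n in (1, 3, 10, 1000):
        print(n, sparck_jones_weight(1000, n))
```

Under that assumption the weight behaves like log2(N/n) + 1, which is the familiar logarithmic shape of IDF.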

A Comparison of Document, Sentence, and Term Event Spaces:

The vector based information retrieval model identifies relevant documents by comparing query terms with terms from a document corpus.
The most common corpus weighting scheme is the term frequency (TF) x inverse document frequency (IDF), where TF is the number of times a term appears in a document, and IDF reflects the distribution of terms within the corpus (Salton and Buckley, 1988). Ideally, the system should assign the highest weights to terms with the most discriminative power.
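The vector model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace tokenization, IDF taken as log(N / document frequency), and cosine similarity as the comparison, none of which the excerpt specifies. The corpus and the function names are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def idf_table(docs):
    """IDF per term as log(N / df), where df counts documents containing the term."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term count times IDF; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, corpus):
    """Return (score, document index) pairs, best match first."""
    docs = [tokenize(d) for d in corpus]
    idf = idf_table(docs)
    q = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "inverse document frequency weights rare terms",
    ]
    print(rank("rare terms", corpus))
```

A term that appears in every document gets an IDF of zero here, so it contributes nothing to the ranking. That is the “discriminative power” idea in its simplest form.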

Findings:
As users continue to demand information systems that provide sub-document retrieval, the need to model language at the sub-document level becomes increasingly important. The key findings from this study are:
(1) The raw document frequencies are considerably different from the sentence and term frequencies. The lack of a direct correlation between the document and sub-document raw spaces, in particular around the areas of important terms, suggests that it would be difficult to identify a linear transformation between the document and sub-document spaces. In contrast, the raw term frequencies correlate well with the sentence frequencies.
(2) IDF, ISF and ITF are highly correlated; however, simply replacing IDF with the ISF or ITF would result in a weighting scheme where the corpus weight dominated the weights assigned to query and document terms.
(3) IDF was surprisingly stable with respect to random samples at 10% of the total corpus. The average IDF values based on only a 20% random stratified sample correlated almost perfectly to IDF values that considered frequencies in the entire corpus. This finding suggests that systems in a dynamic environment, such as the Web, need not update the global IDF values regularly (see (4)).
(4) IDF values based on different journal samples did not correlate well to the global IDF. Further work is required to understand when frequencies should consider alternative subsets of a corpus.
(5) The language used in abstracts appears to be systematically different from the language used in the body of a full-text scientific document across all three language models. This suggests that further work is required to understand how the corpus-weighting schemes that are well studied on abstracts will perform in a full-text setting.
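If you want to try the kind of check behind finding (3) on your own corpus, something like the following Python sketch would do. It is not the study's method: it draws a simple random sample rather than a stratified one, uses log(N / df) as the IDF, and compares full-corpus and sample IDF with a Pearson correlation over the terms that appear in the sample. The function names and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter


def idf_table(docs):
    """IDF per term as log(N / df) over a list of token lists."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def sample_idf_correlation(docs, fraction, seed=0):
    """Correlate IDF from a simple random sample with full-corpus IDF."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    k = max(1, round(fraction * len(docs)))
    sample = rng.sample(docs, k)
    full, part = idf_table(docs), idf_table(sample)
    # Every term in the sample also occurs in the full corpus.
    terms = sorted(part)
    return pearson([full[t] for t in terms], [part[t] for t in terms])


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["a"]]  # toy data only
    print(sample_idf_correlation(toy, fraction=0.5))
```

On a real collection you would run this over several seeds and sample sizes; a toy corpus like the one above is far too small to say anything about stability.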