When will the next (3.8 or later) set of DBPedia dumps be available? - dbpedia

Is there an announced/expected timeframe when the next (after 3.7) set of DBPedia extraction dumps will be available? (Does the project have a target interval?)

Our target release date is late March. Expect an announcement on the dbpedia-discussion mailing list in the next few days. (I'm currently working for the DBpedia project at Freie Universität Berlin.)
EDIT: Some things take longer than expected... target release date moved from late March to late April. Sorry for the delay. By the way, yes, we have a target interval, and it's six months.
Join the DBpedia Mapping Sprin(g|t) 2012! The new and improved mappings will be used for the upcoming DBpedia 3.8 release.
EDIT 2: Took longer than expected, offers a lot more data than DBpedia 3.7 - DBpedia 3.8 is here.

Is there a way to reduce the size of the Spacy installation?

I'm using Spacy in a project and noticed that my Docker images are pretty big. A bit of research led me to find out that just the Spacy installation itself (in /usr/local/lib/python3.6/site-packages/spacy) accounts for 267MB, so I was wondering if there's anything that can be done to reduce that footprint?
Out of interest, spaCy 2.2 was released yesterday (Oct 2nd, 2019).
One of the headline features of 2.2 is "Smaller disk foot-print, better language resource handling". So upgrading to spaCy 2.2 may be one way to reduce the size of a spaCy installation.
(Although this post doesn't solve your specific problem, I believe it does answer this specific question.)

Linux kernel "historical" git repository with full history

I think many developers like to investigate sources with the help of git gui blame. As explained in the commit for Linux-2.6.12-rc2 (also mirrored at GitHub), a special historical Linux repository is needed for this purpose.
Linux-2.6.12-rc2
Initial git repository build. I’m not bothering with the full history,
even though we have it. We can create a separate “historical” git
archive of that later if we want to, and in the meantime it’s about
3.2GB when imported into git — space that would just make the early
git days unnecessarily complicated, when we don’t have a lot of good
infrastructure for it.
Let it rip!
I have looked at a lot of the prepared historical repositories but I didn’t find one containing changes going back to version zero, so I gave up and am asking this question here.
Here is my setup.
I have a repository with a clone of the following remotes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
https://git.kernel.org/pub/scm/linux/kernel/git/davej/history.git
And the following grafts (info/grafts):
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178
With these grafts, I have an unbroken view of the kernel history since 0.01. The first graft glues together the very first release in Linus' repository with the corresponding release of tglx/history.git. The second graft glues together tglx/history.git and davej/history.git.
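In case the file format is unclear: each line of info/grafts names a commit, followed by the parent(s) Git should pretend that commit has. Here is a minimal Python sketch that reads the two lines above into a mapping. The name `parse_grafts` is made up for illustration, and nothing here touches a real repository.

```python
def parse_grafts(text):
    """Map each grafted commit to the list of parents the graft assigns it."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


# The two graft lines from this answer, copied verbatim.
GRAFTS = """\
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178
"""

for commit, parents in parse_grafts(GRAFTS).items():
    print(commit[:12], "gets parent(s):", ", ".join(p[:12] for p in parents))
```

As far as I know, recent Git versions treat info/grafts as deprecated and suggest `git replace --graft <commit> <parent>` for the same purpose, so the same commit/parent pairs should carry over.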
There are a few older versions missing, and the older versions have release granularity instead of patch granularity, but this is the best setup I know of.
Edit: Dave Jones pointed me to http://www.archive.org/details/git-history-of-linux, which seems to be exactly what you want.
Here is a review of available 2018 options with a focus on tag availability and date correctness.
https://archive.org/download/git-history-of-linux/full-history-linux.git.tar
Developed by Dave Jones, and made available on archive.org.
Covers early versions to 2010.
244,464 commits
Just 184 tags, covering versions in 2.6. The tags that should have been created for all versions seem to be missing.
Early commits have realistic dates, but incorrect times (11:00:00 199X -0600).
Some dates seem to be incorrect. For example, both 2.1.110 and 2.1.111 are dated Wed May 20 11:00:00 1998 -0600, although the latest file in the 2.1.111 snapshot is dated 1998-07-25 09:17.
The creation process is documented on GitHub and seems very thorough.
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/
Created by Thomas Gleixner.
Covers 2.4.0 to 2.6.12-rc2.
Contains 170 tags covering 2.5.X and 2.6.X.
63,428 commits
Dates are correct.
Contains patches converted into commits.
https://github.com/mpe/linux-fullhistory
Created by Michael Ellerman, derived from work by Yoann Padioleau, based on historical trees reconstructed by Dave Jones and Thomas Gleixner, and Linus' mainline tree.
Covers full history
Provides only 558 tags, mostly starting at 2.0.0.
790,471 commits
Same issues with dates as in Dave Jones's repo.
Uses replace objects instead of grafts.
https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/
Owned by the Linux history team.
Covers early versions to 2.6.33-rc5.
1710 tags, starting with 0.10, covering most early versions.
244,774 commits
Most historic versions are incorrectly dated Fri Nov 23 15:09:04 2007 -0500.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
Modern Linux development.
Covers 2.6.12-rc2 (2005) until today
569 tags
777,419 commits (August 2018)
Proper commits
The referenced repos no longer exist. The new one is here:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/
If you're like me and want to keep some repos separate, you can combine alternates with the grafts to do so:
# Same dir as main linux
$ git clone --bare git://git.kernel.org/pub/scm/linux/kernel/git/history/history.git
$ cd linux/.git/
$ echo ../../../history.git/objects >> objects/info/alternates
$ echo 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a >> info/grafts
$ echo 7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178 >> info/grafts
The best I've found is git://git.kernel.org/pub/scm/linux/kernel/git/davej/history.git. History tracking there starts from Linux-0.01, but many commit comments are poor, something like "Import 2.1.38pre1".
Anyway there is a lot of knowledge.
Thanks for help!

SubSonic 2.2 Not Fully Update For VB?

An answer to a recent question on extending SubSonic pointed to a googlecode directory listing the relevant ActiveRecord templates. I was very surprised to see that most of the CS_ ones are a lot newer than the VB_ equivalents. Does this mean that SS2.2 has not been fully updated for VB?
See this thread for the reasons why VB.NET isn't fully "baked" for 2.2.
I committed a patch to the templates and made sure it was included in the VB versions. There weren't many template changes for 2.2, so it should still be fine. Is there something not working for you?
I was kind of surprised that the date stamps had changed on most of the CS_ ones, and not the VB_ ones.
In my case, I'm not having much luck with WinForms data binding, although I note a recent item on the GoogleCode bug list which may be applicable. I think that one is post-2.2?

Really, though, how can gmail still be "beta"?

It's been out for almost five years.
It's got tens of millions of users.
I suspect several businesses rely on it.
How is it still "beta"? At what point will it no longer be beta? When it completely owns the e-mail market?
According to a Google spokesman:
"We have very high internal metrics
our consumer products have to meet
before coming out of beta. Our teams
continue to work to improve these
products and provide users with an
even better experience. We believe
beta has a different meaning when
applied to applications on the Web,
where people expect continual
improvements in a product. On the
Web, you don't have to wait for the
next version to be on the shelf or an
update to become available.
Improvements are rolled out as they're
developed. Rather than the packaged,
stagnant software of decades past,
we're moving to a world of regular
updates and constant feature
refinement where applications live in
the cloud."
Wikipedia defines Beta Version as:
A 'beta version' is the first version released outside the organization or community that develops the software, for the purpose of evaluation or real-world black/grey-box testing. The process of delivering a beta version to the users is called beta release. Beta level software generally includes all features, but may also include known issues and bugs of a less serious variety.
So this confirms that Google's use of the word is non-standard. I found this Slashdot article, Has Google Redefined Beta?, to be pretty interesting.
I think Google borrowed the word for their own ends and it shouldn't be taken at face value with the traditional definition of "Beta". It simply looks better to put "Beta" by your app's name instead of "We are still constantly adding features to this product".
Well it was down for 30 hours about two months ago. Looks like even after five years there are a few kinks to iron out.
Google itself was in beta for years. The founders have much higher standards for their products than other companies.
Just like C++ wasn't a standard for quite a while :)
Also, they continuously add and change features, so it is a beta.
I suspect that beta, in this case, means that they are avoiding the hassles and complications of being accused of being a monopoly. Conspiracy anybody?
It is (at least officially) in perpetual beta state.
http://en.wikipedia.org/wiki/Perpetual_beta
It hasn't been in beta since July 2009, so if you're still seeing a 'beta' logo it's because someone enabled the 'back to beta' feature. Yes, really...

Getting software version numbers right. v1.0.0.1 [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I distribute software online, and always wonder if there is a proper way to better define version numbers.
Let's assume A.B.C.D in the answers. When do you increase each of the components?
Do you use any other version number tricks such as D mod 2 == 1 means it is an in house release only?
Do you have beta releases with their own version numbers, or do you have beta releases per version number?
I'm starting to like the Year.Release[.Build] convention that some apps (e.g. Perforce) use. Basically it just says the year in which you release, and the sequence within that year. So 2008.1 would be the first version, and if you released another a month or three later, it would go to 2008.2.
The advantage of this scheme is there is no implied "magnitude" of release, where you get into arguments about whether a feature is major enough to warrant a major version increment or not.
An optional extra is to tag on the build number, but that tends to be for internal purposes only (e.g. added to the EXE/DLL so you can inspect the file and ensure the right build is there).
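A minimal Python sketch of how the Year.Release sequence could be computed. `next_release` is a hypothetical helper, not anything Perforce ships, and it assumes the sequence restarts at 1 each year and that the optional build part is simply ignored.

```python
def next_release(previous, year):
    """Return the next 'Year.Release' string after `previous` for a release made in `year`."""
    # Only the first two fields matter; an optional .Build suffix is dropped.
    prev_year, prev_seq = (int(part) for part in previous.split(".")[:2])
    if year == prev_year:
        return f"{year}.{prev_seq + 1}"
    # First release of a new year restarts the sequence (an assumption).
    return f"{year}.1"


print(next_release("2008.1", 2008))  # 2008.2
print(next_release("2008.2", 2009))  # 2009.1
```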
In my opinion, almost any release number scheme can be made to work more or less sanely. The system I work on uses version numbers such as 11.50.UC3, where the U indicates 32-bit Unix, and the C3 is a minor revision (fix pack) number; other letters are used for other platform types. (I'd not recommend this scheme, but it works.)
There are a few golden rules which have not so far been stated, but which are implicit in what people have discussed.
Do not release the same version twice - once version 1.0.0 is released to anyone, it can never be re-released.
Release numbers should increase monotonically. That is, the code in version 1.0.1 or 1.1.0 or 2.0.0 should always be later than version 1.0.0, 1.0.9, or 1.4.3 (respectively).
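One practical consequence of the second rule is that tooling has to compare versions numerically, field by field, rather than as text. A small Python sketch, assuming purely numeric dotted versions (`version_key` is just an illustrative name):

```python
def version_key(version):
    """Turn '1.0.10' into (1, 0, 10) so versions compare field by field."""
    return tuple(int(part) for part in version.split("."))


# Plain string comparison gets multi-digit fields wrong.
print("1.0.10" > "1.0.9")                            # False
# Comparing the numeric tuples gives the intended order.
print(version_key("1.0.10") > version_key("1.0.9"))  # True

releases = ["1.0.0", "1.0.9", "1.0.10", "1.1.0", "2.0.0"]
print(sorted(releases, key=version_key) == releases)  # True
```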
Now, in practice, people do have to release fixes for older versions while newer versions are available -- see GCC, for example:
GCC 3.4.6 was released after 4.0.0, 4.1.0 (and AFAICR 4.2.0), but it continues the functionality of GCC 3.4.x rather than adding the extra features added to GCC 4.x.
So, you have to build your version numbering scheme carefully.
One other point which I firmly believe in:
The release version number is unrelated to the CM (VCS) system version numbering, except for trivial programs. Any serious piece of software with more than one main source file will have a version number unrelated to the version of any single file.
With SVN, you could use the SVN version number - but probably wouldn't as it changes too unpredictably.
For the stuff I work with, the version number is a purely political decision.
Incidentally, I know of software that went through releases from version 1.00 through 9.53, but that then changed to 2.80. That was a gross mistake - dictated by marketing. Granted, version 4.x of the software is/was obsolete, so it didn't immediately make for confusion, but version 5.x of the software is still in use and sold, and the revisions have already reached 3.50. I'm very worried about what my code that has to work with both the 5.x (old style) and 5.x (new style) is going to do when the inevitable conflict occurs. I guess I have to hope that they will dilly-dally on changing to 5.x until the old 5.x really is dead -- but I'm not optimistic. I also use an artificial version number, such as 9.60, to represent the 3.50 code, so that I can do sane if VERSION > 900 testing, rather than having to do: if (VERSION >= 900 || (VERSION >= 280 && VERSION < 400)), where I represent version 9.00 by 900. And then there's the significant change introduced in version 3.00.xC3 -- my scheme fails to detect changes at the minor release level...grumble...grumble...
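To make the artificial-number trick concrete, here is a rough Python sketch. Only the 3.50 to 9.60 pairing comes from the description above; the table, the function names and the use of `>=` in both tests are my own assumptions.

```python
# Versions are stored as integers: 9.00 -> 900, 3.50 -> 350.
# Known from the text: the renumbered 3.50 release is treated as 9.60.
# Any further entries would be guesses, so the table has one row.
ARTIFICIAL = {350: 960}


def normalize(raw):
    """Put a renumbered release back on the old, ever-increasing scale."""
    return ARTIFICIAL.get(raw, raw)


def is_recent_raw(raw):
    """The awkward test needed when working with the raw numbers."""
    return raw >= 900 or (280 <= raw < 400)


def is_recent(raw):
    """The simpler test that the artificial number makes possible."""
    return normalize(raw) >= 900


# Both tests agree for these sample versions.
for raw in (530, 900, 953, 350):
    print(raw, is_recent_raw(raw), is_recent(raw))
```

The lookup table only works while every renumbered release is listed in it, which is exactly the maintenance burden the renumbering created.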
NB: Eric Raymond provides Software Release Practice HOWTO including the (linked) section on naming (numbering) releases.
I usually use D as a build counter (automatic increment by compiler)
I increment C every time a build is released to "public" (not every build is released)
A and B are used as major/minor version number and changed manually.
I think there are two ways to answer this question, and they are not entirely complementary.
Technical: Increment versions based on technical tasks. Example: D is build number, C is Iteration, B is a minor release, A is a major release. Defining minor and major releases is really subjective, but could be related to things like changes to the underlying architecture.
Marketing: Increment versions based on how many "new" or "useful" features are being provided to your customers. You may also tie the version numbers to an update policy: changes to A require the user to purchase an upgrade license, whereas other changes do not.
The bottom line, I think, is finding a model that works for you and your customers. I've seen some cases where even versions are public releases, and odd versions are considered beta, or dev releases. I've seen some products which ignore C and D altogether.
Then there is the example from Microsoft, where the only rational explanation for the version numbers of the .NET Framework is that Marketing was involved.
Our policy:
A - Significant (> 25%) changes or additions in functionality or interface.
B - Small changes or additions in functionality or interface.
C - Minor changes that break the interface.
D - Fixes to a build that do not change the interface.
People tend to want to make this much harder than it really needs to be. If your product has only a single long-lived branch, just name successive versions by their build number. If you've got some kind of "minor bug fixes are free, but you have to pay for major new versions" policy, then use 1.0, 1.1 ... 1.n, 2.0, 2.1... etc.
If you can't immediately figure out what the A,B,C, and D in your example are, then you obviously don't need them.
The only use I have ever made of the version number was so that a customer could tell me they're using version 2.5.1.0 or whatever.
My only rule is designed to minimize mistakes in reporting that number: all four numbers have to be 1 digit only.
1.1.2.3
is ok, but
1.0.1.23
is not. Customers are likely to report both numbers (verbally, at least) as "one-one-two-three".
Auto-incrementing build numbers often result in version numbers like
1.0.1.12537
which doesn't really help, either.
A good and non-technical scheme just uses the build date in this format:
YYYY.MM.DD.BuildNumber
Where BuildNumber is either a continuous number (changelist) or just starts over at 1 each day.
Examples: 2008.03.24.1 or 2008.03.24.14503
This is mainly for internal releases; public releases would see the version printed as 2008.03 if you don't release more often than once a month. Maintenance releases get flagged as 2008.03a, 2008.03b and so on. They should rarely go past "c", but if they do it's a good indicator you need better QA and/or testing procedures.
Version fields that are commonly seen by the user should be printed in a friendly "March 2008" format; reserve the more technical info for the About dialog or log files.
Biggest disadvantage: just compiling the same code on another day might change the version number. But you can avoid this by using the version control changelist as last number and checking against that to determine if the date needs to be changed as well.
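For what it's worth, the three renderings described above could be produced along these lines. This is only a Python sketch: the function names are made up, the maintenance letter is assumed to run a, b, c, ... (at most 26 of them), and the friendly month name depends on the process locale.

```python
from datetime import date


def internal_version(build_date, build_number):
    """YYYY.MM.DD.BuildNumber, for internal builds."""
    return f"{build_date:%Y.%m.%d}.{build_number}"


def public_version(build_date, maintenance=0):
    """2008.03 for the release, then 2008.03a, 2008.03b, ... for maintenance releases."""
    suffix = "" if maintenance == 0 else chr(ord("a") + maintenance - 1)
    return f"{build_date:%Y.%m}{suffix}"


def friendly_version(build_date):
    """Human-friendly form for places users commonly see."""
    return f"{build_date:%B %Y}"


d = date(2008, 3, 24)
print(internal_version(d, 1))      # 2008.03.24.1
print(internal_version(d, 14503))  # 2008.03.24.14503
print(public_version(d))           # 2008.03
print(public_version(d, 2))        # 2008.03b
print(friendly_version(d))         # March 2008 (in the default C locale)
```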
In the github world, it has become popular to follow Tom Preston-Werner's "semver" spec for version numbers.
From http://semver.org/ :
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes, MINOR version
when you add functionality in a backwards-compatible manner, and PATCH
version when you make backwards-compatible bug fixes. Additional
labels for pre-release and build metadata are available as extensions
to the MAJOR.MINOR.PATCH format.
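A rough Python sketch of those three rules. `bump` is a made-up helper, not part of any semver tooling, and it ignores the pre-release and build-metadata labels. Lower-order fields reset to zero when a higher one is incremented, as the spec requires.

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH string for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":  # backwards-compatible new functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.4.2", "patch"))  # 1.4.3
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "major"))  # 2.0.0
```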
I use V.R.M e.g. 2.5.1
V (version) changes are a major rewrite
R (revision) changes are significant new features or bug fixes
M (modification) changes are minor bug fixes (typos, etc)
I sometimes use an SVN commit number on the end too.
It's all really subjective at the end of the day and simply up to yourself/your team.
Just take a look at all the answers already - all very different.
Personally I use Major.Minor.*.* - where Visual Studio fills in the revision/build number automatically. This is used where I work too.
I like Year.Month.Day. So, v2009.6.8 would be the "version" of this post. It is impossible to duplicate (reasonably) and it is very clear when something is a newer release. You could also drop the decimals and make it v20090608.
In the case of a library, the version number tells you about the level of compatibility between two releases, and thus how difficult an upgrade will be.
A bug fix release needs to preserve binary, source, and serialization compatibility.
Minor releases mean different things to different projects, but usually they don't need to preserve source compatibility.
Major version numbers can break all three forms.
I wrote more about the rationale here.
For in-house development, we use the following format.
[Program #] . [Year] . [Month] . [Release # of this app within the month]
For example, if I'm releasing application # 15 today, and it's the third update this month, then my version # will be
15.2008.9.3
It's totally non-standard, but it is useful for us.
For the past six major versions, we've used M.0.m.b where M is the major version, m is the minor version, and b is the build number. So released versions included 6.0.2, 7.0.1, ..., up to 11.0.0. Don't ask why the second number is always 0; I've asked a number of times and nobody really knows. We haven't had a non-zero there since 5.5 was released in 1996.
